Data is at the core of most enterprises today. Plenty has been written about the mountains of data that can be collected, refined and analyzed, and about the need to leverage the resulting insights and analytics for a competitive edge.
Whether the goal is to price better, find the next customer or discover the next novel drug that will cure a disease, organizations have set their sights on the promise of “big data,” and IT is at the forefront. The role of the Database Administrator (DBA) and the Ops team that supports them is continuously evolving.
As developers and DBAs look for more efficient ways to reach those insights and deliver the analytical applications their business stakeholders demand, they have a myriad of tools to choose from. Let’s take a look at some of them.
In its simplest form, a database organizes and improves access to data, typically structured as rows and columns. Relational Database Management Systems (RDBMS) are the primary way enterprises organize their core data today, and depending on their importance, convenience, security and speed requirements, they may choose different products: Oracle, Microsoft SQL Server, MySQL, IBM DB2 or PostgreSQL, for example.
The Structured Query Language (SQL) is one of the most popular methods of analyzing this data to draw out insights, and the database schema the DBAs choose has a significant impact on the complexity and performance of those queries.
In other words, get the schema right and the time spent on performance tuning or writing complex joins drops significantly.
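To make that concrete, here is a minimal sketch using SQLite (via Python's built-in sqlite3 module) with a hypothetical customers/orders schema. Because the schema captures the relationship in one foreign key, a common business question needs only a single join:

```python
import sqlite3

# Hypothetical two-table schema: a well-normalized design keeps the
# "revenue per customer" question down to a single join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (1, 1, 99.50), (2, 1, 12.00), (3, 2, 45.00);
""")

# One join answers the question; a poorly chosen schema might force
# several joins or expensive post-processing.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 111.5), ('Globex', 45.0)]
```

The table and column names here are invented for illustration; the point is that schema choices directly shape query complexity.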
In the ’90s the notion of object-oriented programming gained popularity, and with it the idea that an object carries its own attributes. Developers and DBAs started to treat data in the database as objects: a house, for example, has a certain number of bedrooms, bathrooms, windows and floors, and these attributes belong to the house rather than being extraneous data or individual fields that need to be joined.
Choosing an Object Database Management System (ODBMS) is an important design consideration with significant implications for how the data is accessed and queried to get to the insights.
In comparison to an RDBMS, an ODBMS is useful when you have an object model for your development efforts (e.g. you’re building in C++ or Java), particularly if those objects are complex. For example, instead of the house described above, imagine using CAD to design and compare a collection of machines, each with many parts.
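A short sketch of what that object view looks like in code, using hypothetical Machine and Part classes: in an object database, the composite object is stored and retrieved whole, and traversing it requires no join.

```python
from dataclasses import dataclass, field

# Hypothetical object model: an ODBMS persists a Machine together with
# its Parts as one composite object, not as rows spread across tables.
@dataclass
class Part:
    name: str
    weight_kg: float

@dataclass
class Machine:
    model: str
    parts: list = field(default_factory=list)

    def total_weight(self):
        # Walking the object graph directly -- no join required.
        return sum(p.weight_kg for p in self.parts)

lathe = Machine("CNC-100", [Part("spindle", 12.5), Part("bed", 80.0)])
print(lathe.total_weight())  # 92.5
```

In a relational design, the same question would typically mean joining a machines table to a parts table and aggregating.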
An RDBMS, on the other hand, is more flexible and can be used with more development approaches. There is also a larger knowledge base and community to support you in your RDBMS efforts.
More recently, NoSQL has gained popularity as a way to store and access data. It is a fundamentally non-relational approach: data is organized for storage and retrieval in formats other than tables. NoSQL enables a simpler and, many would argue, more scalable design that makes storing and accessing those mountains of data easier. It is also a great approach for storing and accessing varied data types (e.g. rich media files, pictures and graphics). MongoDB, Redis, Cassandra and HBase are a few of the popular options here.
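To illustrate the non-tabular model, here is a hypothetical document in the style a document store such as MongoDB might hold, sketched with plain Python dictionaries and JSON (no database required). Nested attributes and media references live inside one record instead of being split across normalized tables:

```python
import json

# Hypothetical listing document: attributes and attachments are nested
# inside a single record rather than joined from separate tables.
house = {
    "address": "123 Main St",
    "bedrooms": 3,
    "bathrooms": 2,
    "media": [
        {"type": "photo", "file": "front.jpg"},
        {"type": "floorplan", "file": "plan.pdf"},
    ],
}

# Serialized as one JSON document, the whole object is read or written
# in a single operation -- one reason the model scales out simply.
doc = json.dumps(house)
restored = json.loads(doc)
print(restored["bedrooms"], len(restored["media"]))  # 3 2
```

The field names are invented for illustration; real document stores add indexing, sharding and query languages on top of this basic shape.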
It is also worth noting that MapReduce is one of the popular analytical approaches for drawing insights from NoSQL databases. In other words, while NoSQL provides the mechanism for reads, writes and storage, MapReduce offers a way to query and analyze the information.
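The MapReduce pattern itself is simple enough to sketch in a few lines. This toy word count shows the three phases; real frameworks such as Hadoop distribute the same phases across many nodes:

```python
from collections import defaultdict

# Minimal MapReduce sketch over two "documents".
docs = ["big data big insights", "data drives decisions"]

# Map: emit a (word, 1) pair for every word in every document.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: group the emitted pairs by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: aggregate each group's values.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts["data"])  # 2
```

Because map and reduce operate on independent keys, the work parallelizes naturally, which is what makes the approach a good fit for those mountains of data.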
There is also an emerging category called NewSQL. NewSQL is purpose-built for online transactions, improving the scalability of the fundamental read and write operations while preserving the SQL language for analysis. ScaleBase and MemSQL are two examples here.
Why should Ops care about DB approaches?
So, as an Ops team, what should you do about all of this, and do you even care? Well, of course you do. The sooner you partner with your developer and DBA friends to understand the approach they are taking for storing and analyzing data, the more you can help:
Vertical vs. horizontal scaling
In more traditional approaches (e.g. RDBMS) you may have virtualized your database, but you still need to ensure the underlying infrastructure is properly configured for the workload demand. And, of course, you want to make sure you don’t over-provision for a peak that only happens once a month (e.g. the end-of-month close). The ability to size up or down in real time as demand fluctuates is key.
There are also nuances in how memory is held and managed inside the database (e.g. database memory, the transaction log). Understanding these relationships versus what is presented to the hypervisor is critical, as is the ability to tune database resource management with a full understanding of the underlying compute, storage and network resources.
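A rough sketch of the right-sizing logic described above, with an invented function and thresholds: size to steady-state demand (here, a 95th-percentile estimate) plus headroom, rather than to a once-a-month spike.

```python
# Hypothetical right-sizing check for a database VM: compare provisioned
# capacity against observed demand, ignoring the rare monthly peak.
def size_recommendation(samples, provisioned, headroom=1.2):
    """samples: observed utilization (fraction of provisioned capacity).
    Returns a suggested allocation sized to steady demand plus headroom."""
    steady = sorted(samples)[int(len(samples) * 0.95)]  # ~95th percentile
    return round(provisioned * steady * headroom, 1)

# 30 days of daily peak CPU utilization on an 8-vCPU database VM,
# with a single end-of-month spike that should not drive sizing.
daily_peaks = [0.35] * 29 + [0.9]
print(size_recommendation(daily_peaks, provisioned=8))
```

Here the recommendation lands well below the 8 provisioned vCPUs, because the one spike falls outside the 95th percentile; a real tool would weigh memory, storage and network the same way.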
Horizontal scaling is more popular with newer database approaches like NoSQL and MapReduce. Here again there is a key decision of when to scale and where to place new workloads, whether those workloads are VMs or containers, and whether you are placing them across an on-prem cluster or in a public cloud. When to spin up or down, and where to place, is critical to the performance and cost tradeoff.
Making that decision based on real-time workload demand, and matching that demand to the available shared infrastructure, will keep your developers, DBAs and end users smiling.
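The when-and-where decision can be sketched as a simple policy, with invented numbers for per-node capacity and per-site cost: scale out when demand exceeds what the current nodes can serve, and place new nodes on the cheapest site with spare capacity.

```python
# Hypothetical scale-out policy: add or remove nodes based on real-time
# demand, placing new ones on the cheapest site with free capacity.
def scale_decision(qps, qps_per_node, nodes, sites):
    """sites: list of (name, cost_per_node, free_slots); cheapest wins."""
    needed = -(-qps // qps_per_node)  # ceiling division
    if needed > nodes:
        for name, cost, free in sorted(sites, key=lambda s: s[1]):
            if free > 0:
                return ("scale out", needed - nodes, name)
        return ("scale out", needed - nodes, None)  # no capacity anywhere
    if needed < nodes:
        return ("scale in", nodes - needed, None)
    return ("hold", 0, None)

# Demand has outgrown four nodes; on-prem is cheaper and has room.
print(scale_decision(qps=12000, qps_per_node=2000, nodes=4,
                     sites=[("on-prem", 1.0, 2), ("cloud", 3.0, 50)]))
```

Real schedulers fold in far more signals (memory pressure, data locality, egress cost), but the shape of the tradeoff is the same: demand drives when, and cost plus capacity drive where.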
Ultimately, you want to ensure you’re providing your counterparts with an approach that assures performance without killing your budget. No matter how they’ve decided to design the schema or application, the sooner you partner the better.