Don’t fall behind! Here’s the most important database news from the last few months.
The Revival of Great SQL Ideas
The introduction of SQL features into NoSQL systems is a recurring pattern these days. Apache Flink, for example, has just introduced the match_recognize clause. Couchbase also follows this path as it announced the introduction of SQL window function into the next version of their product. However, SQL is more than a query language with powerful features. Constraints, transactions and transaction isolation are also integral parts of SQL. In this issue, I’d like to focus on how NoSQL systems try to achieve the consistency guarantees that SQL databases provide.
To give you some background, I present you with two quotes from the article “Thank You for Your Help NoSQL, but We Got It from Here” that Rick Negrin of MemSQL published in July:
Eventually, the large scale cloud services (Google, Facebook, Yahoo, Microsoft, and others) all built their own custom systems to handle scale demand.
In this sentence, there is a very important message that is easy to miss: these companies build tailor-made systems for their very specific needs. If you have the resources to re-implement a database for your very specific needs, you will clearly build one that is better at meeting those needs than a generic off-the-shelf product…and of course you will leave out all the functionality you don’t need.
Then, those companies started to talk about their solutions.
Eventually open source systems started popping up using those same ideas, and the NoSQL movement was born.
You might know the rest of the story: the NoSQL vendors pushed their solutions as the future of databases. Tech people tend to love everything that is new, especially when it is backed by a convincing buzzword like NoSQL.
Unfortunately, many NoSQL users have missed the point: each of those systems was meant to solve one very specific problem, and only that problem. NoSQL systems are not generic databases that happen to outperform the competitors for one use case. Instead, they are “one-trick ponies” that can do only one thing. Even though they can do that one trick very well, they are not very helpful for other things.
Back to the year 2018. Today we can observe that the NoSQL vendors are teaching more tricks to their ponies—mostly tricks SQL databases have been able to do for ages. The introduction of multi-document transactions to MongoDB 4.0—even though it is rather limited—is probably the best example from the past few months. In general, NoSQL systems have had a hard time regarding consistency. The current trend is that even distributed databases are taking strict consistency more seriously again.
Google Spanner is such a system that values consistency over availability. Of course, Google started to talk about Spanner and eventually (or inevitably?) open source systems started popping up using those same ideas. Déjà vu? Well, it’s basically what I quoted above. Once again, history repeats itself.
So, what’s going wrong this time? This is the question that Daniel Abadi explores in his article “NewSQL database systems are failing to guarantee consistency, and I blame Spanner”. The article gives a brief introduction into why strong consistency is desirable, why NoSQL systems gave up on it in the first place, and how some more recent systems strive for strong consistency in a distributed environment. The bottom line is that systems that try to re-implement the ideas from Google Spanner cannot give the same guarantees because Spanner uses highly precise time sources such as atomic clocks to achieve consistency. If you use the Spanner ideas without such a time source, you cannot reach the same level of consistency and availability that Spanner does. According to Abadi, that is the mistake some systems make.
A lot has happened since SQL-92. My week-long training is the update for developers. Check it out now.
Once more the question arises: which database to use, especially when you consider that there is an agreement that strong consistency is desirable? I’d say just do it the Amazon way, and use an old-fashioned SQL database.
Yes, Amazon still uses Oracle Databases. Even though their Dynamo paper was one of the reasons NoSQL became a thing, and after they introduced several other database services like Redshift, Aurora, Neptune, and just in November Timestream and Quantum Ledger.
Nevertheless Amazon uses Oracle Database to run their business. Even though they don’t want to. In August they announced their plans to get rid of Oracle Databases by 2020, which “will be hard” (Larry Ellison, a week later). On Prime Day the migration has already caused an outage in one of its biggest warehouses, but the migration makes progress nevertheless and was the confirmed target as of late November. It seems like it takes more than a decade to build enough one-trick ponies to replace a traditional SQL databases. In the end, they will still use SQL—just not the Oracle Database.
The lesson we can learn from this story is the following: start with a generic database. In retrospect, this is what most successful companies did. SQL database are a good choice because they can do many tricks. You will need those tricks, especially when your requirements are volatile.
When you eventually get to the point where the database reaches its limits, if that ever even happens, optimize first (seek professional help!), then identify the single issue that causes the biggest pain. By now, chances are good there is a ready-to-use one-trick pony that can handle that pain point better than the generic database. Offload this pain point to the one-trick pony (seek more professional help!). If needed, repeat. Don’t give the flexibility and consistency of SQL up until you absolutely have to.
The modern and successful architecture that is commonly used today is to have an SQL database that is sometimes surrounded by some one-trick ponies to take care of a few pain points. The contribution the NoSQL movement has made to this architecture is that you don’t need to build the one-trick ponies by yourself like the first movers had to.
Cloud Offering: Trade Your Hardware for Security Updates
Like in any other business, cloud providers have creative ways to convince potential customers to use their services. I found the one from Microsoft in July particularly funny. As the end-of-life for the releases 2008 and 2008R2 of Windows and SQL Server approaches, Microsoft offers three more years of free security patches if those systems are migrated to the Azure cloud. Instead of paying for new licenses, you can choose to pay for “rented hardware” in the Azure cloud. Isn’t that tempting?
Technology and Science
Just in Time (JIT) compilation is currently a hot topic for SQL databases. The first major database that introduced this technology was Microsoft SQL Server 2014 (aka Hekaton). But Oracle also has a big bet running on its Java JIT implementation Graal, and the latest PostgreSQL release introduced JIT compilation into a free open-source database.
Although JIT compilation can improve performance considerably, vendors have to put some additional effort into maintenance in order to offer it. That’s why two scientists from the Netherlands tried to automate some parts of this process. They used a part from the SQLite source code—the Virtual Database Engine—for automatic JIT compilation.
Unrelated to that, yet another interesting paper about compiling queries was published in 2015: Functional Pearl: A SQL to C Compiler in 500 Lines of Code.
For those interested in the early days of SQL, here is an interesting read for you: The 1995 SQL Reunion: People, Projects, and Politics. These are the notes from a meeting of people involved with the development of SQL and its products in the 1970s and 1980s.
It covers a wide variety of topics, such as the fact that a day-long discussion took place to decide which keyword to use for the null value. But my favorite part is this:
Don Chamberlin: Larry called up. Larry’s company in those days was not called Oracle. [...]. He had heard about the System R prototype and he wanted to make sure that his product was fully compatible with it, right down to the error code values. We went and asked Frank, "Can we give our error codes to this guy Ellison and Frank said, "No - those are IBM Confidential."
Franco Putzolu: That was the only part that was confidential.
In the meanwhile, the SQL standard has come to define error codes—it is just that barely any vendors follow them.
New Major Releases
There have been some major releases in recent months:
- Oracle 18c (Feb, July, Oct)
Originally released in February, but only for users of the Oracle Cloud and Engineered Systems, the Oracle Database 18c eventually became available for download in July. The final 18c release followed in October in the form of the free Express Edition (XE). In the meanwhile, the closed beta for next year’s release (19c) has started.
Oracle Database 18c introduced many new JSON features, polymorphic table functions and a graph processing language (PGQL). My Article “What’s New in Oracle Database 18c” on modern-sql.com has a closer look at release 18c and reveals some interesting surprises that aren’t covered in the documentation.
- MariaDB 10.3 (Aug)
Just a year after MariaDB 10.2 introduced window functions and recursive queries, release 10.3 follows with another seven (7!) standard SQL features, system versioning being the most important of them. I’ve published my review of MariaDB 10.3 on modern-sql.com: “What’s New in MariaDB 10.3”
- SQLite 3.25.0 (Sep)
SQLite 3.25.0 introduces window functions. The implementation is pretty much on par with those of other databases. I’ll release a more detailed review on modern-sql.com soon (follow via Twitter, e-mail or RSS).
- PostgreSQL 11 (Oct)
PostgreSQL 11 was released in October. Among other features (such as JIT), it introduced extended window function with frame exclusion and the groups unit (besides rows and range). That makes PostgreSQL the first major SQL database to support these features! Again, follow modern-sql.com (Twitter, e-mail or RSS) to get my upcoming article about PostgreSQL 11.
Note that MySQL now follows an “agile” release cycle: minor release can also introduced new features. For example: Instant Add Column (8.0.12), function-based indexes (directly, in create index; 8.0.13).
Microsoft has also released a public preview of SQL Server 2019.
New Recordings and Slides
Some of my conference sessions were recorded and you can find them below. Also, have a look at my conference schedule to see when and where I’m speaking next.
From Twitter, Keeping it Brief (follow me on Twitter)
About Markus Winand
Markus Winand is an independent author, trainer and consultant on all things SQL. His book “SQL Performance Explained” has just been translated into a fifth language. It can be purchased as a print edition, and it’s also available to read for free on his use-the-index-luke.com website. Markus is currently working on his second book centered on the idea that things have changed a lot since SQL-92. It is likewise available free of charge online (modern-sql.com). Markus Winand is regularly speaking at international conferences and user groups.