Cloud native, chaos-tolerant FaunaDB adds support for SQL, GraphQL, and CQL

By George Anadiotis for Big on Data | April 9, 2019 -- 13:51 GMT (06:51 PDT) | Topic: Data Management

Today's applications want it all, and the databases powering them are forced to follow: Auto-deployment and scalability both on premises and in the cloud, multi-cloud, hybrid cloud, plus resilience, geo-distribution, and SQL.

We've seen that the list of databases with all these features is quite short. Today, however, another database is making that list: FaunaDB. FaunaDB was created by former Twitter employee No. 15 Evan Weaver to deal with the issues he experienced firsthand at Twitter. FaunaDB has been a NoSQL solution, but not anymore.

FaunaDB just switched camps overnight, becoming more interesting to a wider, diverse target audience, by adding support for GraphQL for web applications, as well as CQL for key-value access and SQL for relational workloads. ZDNet had a Q&A with Weaver to find out how and why this came to be, and what it means for FaunaDB and database users at large.

The new FaunaDB: FQL, SQL, GraphQL

The first thing we wondered about was what kind of SQL will be supported. Is it SQL-like, or plain old ANSI SQL? Weaver said that the target is a variant of ANSI SQL like other relational databases: 

"Like those databases there will likely be a few custom keywords and a few places where the specification is unclear or we need to diverge slightly. Like others in our space, we will begin with the basics and keep extending and expanding over time to meet customer use cases. We believe SQL is important to the enterprise, but applications are evolving, and so are developer skills."

By adding SQL support, FaunaDB has turned from NoSQL to SQL overnight.

How much variance there is in that variant, and whether that works for them, users will have to see for themselves. In any case, this opens the door to a vastly wider audience, unwilling or unable to learn the specifics of FaunaDB's own query language, FQL, in order to use it. But that's not all -- the new FaunaDB comes with GraphQL and CQL, too, each serving a different purpose.

GraphQL is a query language for APIs that streamlines how applications fetch data, often as an alternative to REST, and it has lots of traction. But it's not something we are used to seeing databases support natively. Weaver noted that there was an effort to ensure that GraphQL is embedded natively, compiles transparently to FQL, and offers all the same guarantees:

"With Fauna's approach to APIs, you can manipulate any underlying dataset via any API, and always preserve strong consistency, row-level access control, QoS, temporality, and the like, even if not part of the API query language itself. This is fundamental to our API philosophy and an extremely important distinction.

We can do this because these APIs run against a common Calvin powered core. Other databases with multiple interfaces typically bolt on entirely new query executors and often new storage engines, and are multi-API in name only, leading to a host of integration problems no better than running several different databases side by side."

Weaver noted that the GraphQL community is on the bleeding edge of application development and expects everything to be global and serverless; now, he said, it has instant access to a native serverless GraphQL cloud via FaunaDB.

Multi-model?

If GraphQL is a nod to the future, and SQL is a nod to the past, what's the point of adding support for CQL? CQL is the query language used by Apache Cassandra and its commercial version, DataStax Enterprise (DSE), among the poster children of NoSQL. This was not something we were expecting to see as a new feature in a competing database solution, especially announced at the same time as SQL.

But then again, FaunaDB is not the first to do this. Besides ScyllaDB, a drop-in replacement for DSE, Azure CosmosDB also supports CQL. The rationale is obviously the same: Onboarding Cassandra/DSE users. Weaver said CQL is a good key/value interface, but primarily of interest to customers already running Cassandra:

"They are tired of its operational nightmares and lack of data correctness. You can't run mission-critical workloads on Cassandra; if you are stuck in that position, you are on the hunt for a true replacement. You can move your apps, and rewire them to FaunaDB Cloud or Enterprise with lower effort than adopting FaunaDB's native interface FQL directly."

So, with FaunaDB now offering an array of query languages in addition to its own, what would Weaver expect users to primarily use going forward? Customers will use the API that best suits their applications, said Weaver, but he expects wide adoption for GraphQL. FaunaDB is also touted as a multi-model database, and that was something that we found a bit confusing. 

Multi-model support is something more and more databases are offering, and FaunaDB claims multi-model support, too.

For example, FaunaDB says it supports graph and temporal, but what exactly is meant by this was not clear to us. GraphQL support does not imply graph. As for data temporality, it can be very useful, and it's not something many databases offer. But we don't really think of it as a model per se. Weaver said that they think of multi-API as distinct from multi-model: 

"FaunaDB is both multi-model and multi-API. FQL unifies relational, document, key/value, temporal and graph access to data. You can pick the models your application needs, and now, the best standard APIs for each model as well.

FaunaDB does let users define and install their own schemas via GraphQL in the database. You don't have to know anything about FQL to use GraphQL. It's not just a stepping stone to FQL, which is more like power user mode. If you like what you have with GraphQL you can stop there. We support a subset of the graph domain already, specifically graph storage and traversal. The missing features for graph are implementing a standard graph query language, and graph analytics, which are on our roadmap.

Temporality is about change data capture for high-value data. Use cases for temporality focus on audit logging, activity feeds, mobile sync, and the like. It is not time-series, which is about aggregating low value data over time. That is an OLAP use case that we do not currently support."
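To make the distinction from time-series concrete, here is a minimal Python sketch of temporality in the sense Weaver describes: every write is kept as a version, so an application can read a record as it was at an earlier point or replay its changes for an audit log. The TemporalStore class, its method names, and the logical counter used as a timestamp are our own illustrative assumptions, not FaunaDB's API or storage design.

```python
from collections import defaultdict


class TemporalStore:
    """Toy versioned key/value store (hypothetical, not FaunaDB's API)."""

    def __init__(self):
        self._clock = 0                     # logical timestamp, not wall-clock time
        self._history = defaultdict(list)   # key -> [(timestamp, value), ...]

    def write(self, key, value):
        """Append a new version instead of overwriting; return its timestamp."""
        self._clock += 1
        self._history[key].append((self._clock, value))
        return self._clock

    def read(self, key, at=None):
        """Return the latest value, or the value as of timestamp `at`."""
        versions = self._history.get(key, [])
        for ts, value in reversed(versions):
            if at is None or ts <= at:
                return value
        return None  # the key did not exist at that point

    def changes_since(self, key, ts):
        """Return every (timestamp, value) written after `ts`, oldest first."""
        return [(t, v) for t, v in self._history.get(key, []) if t > ts]
```

A sync client or activity feed would call something like changes_since with the last timestamp it saw, while an auditor would call read with an earlier timestamp to see what a record looked like at that moment.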

Chaos tolerance, Calvin, Spanner, and Jepsen

If you saw a reference to FaunaDB's "Calvin powered core" earlier, you may be wondering what that is, and why you should care. Since we're about to embark on a bit of an under the hood tour, we may as well add Spanner, Jepsen, and chaos tolerance to the mix. The tour will be short, and by the time it's over, you may have an idea of how these things translate to database use cases.

Remember that short list of databases FaunaDB has just made? Some other entries in that list are Azure CosmosDB, Google Spanner, and a number of Spanner clones. Spanner the database is based on Spanner the protocol, while FaunaDB is based on a different protocol called Calvin. The aim of both Spanner and Calvin is to deliver external consistency, low latency global replication, and high availability.

Unlike Spanner, Calvin has just one implementation: FaunaDB. So, the properties of Calvin are central to what differentiates FaunaDB from other options, such as CockroachDB, for example. The difference, according to Weaver (who also cites Daniel Abadi, the inventor of Calvin), is in the chaos tolerant nature of Calvin:

"We chose Calvin because it is optimal for the cloud. Calvin delivers external consistency, low latency global replication, and high availability and chaos tolerance without depending on wall clocks, specialized hardware, or custom networks. 

We define chaos tolerance as tolerance to anomalies common in clouds and the public internet: clock skews, packet loss, network partitions, VM migrations, disk failures, etc. Google Spanner delivers correctness guarantees equivalent to Calvin, but it uses synchronized atomic clocks and requires end-to-end control of the network, hardware, and software to carefully bound processing latency.

However, unlike Calvin, Spanner uses two-phase commit which adds additional latency for writes. Other vendors that have tried to replicate Spanner have not succeeded at delivering anything close to the same level of performance and correctness without Google's custom operational environment or Calvin. 

As enterprises embrace portability and multi-cloud environments their data needs to be portable as well. Calvin gives us that architecture: distributed, strongly consistent, and reliable, no matter where you operate and how you choose to move your data. In FaunaDB, you can simply turn on nodes in different clouds and the data replicates itself. It just works."
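As a rough illustration of the idea behind Calvin as described in the published research, the Python sketch below agrees on a single global order of transactions first, then lets each replica apply that log deterministically, so replicas converge on the same state without comparing wall clocks. The Sequencer, Replica, and helper names are hypothetical, a single in-memory list stands in for the replicated log, and none of this reflects FaunaDB's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass(frozen=True)
class Txn:
    txn_id: int                      # position in the global log
    apply: Callable[[State], None]   # deterministic effect on a replica's state


class Sequencer:
    """Fixes the order of all transactions before any replica executes them."""

    def __init__(self) -> None:
        self.log: List[Txn] = []

    def submit(self, apply: Callable[[State], None]) -> Txn:
        txn = Txn(len(self.log), apply)
        self.log.append(txn)
        return txn


class Replica:
    """Applies the shared log in order; needs no clock or commit vote."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: State = {}
        self.applied = 0  # number of log entries already applied

    def catch_up(self, log: List[Txn]) -> None:
        for txn in log[self.applied:]:
            txn.apply(self.state)
            self.applied += 1


def deposit(account: str, amount: int) -> Callable[[State], None]:
    def apply(state: State) -> None:
        state[account] = state.get(account, 0) + amount
    return apply


def transfer(src: str, dst: str, amount: int) -> Callable[[State], None]:
    def apply(state: State) -> None:
        # Every replica sees the same prior state, so all of them
        # accept or reject this transfer identically.
        if state.get(src, 0) >= amount:
            state[src] -= amount
            state[dst] = state.get(dst, 0) + amount
    return apply


def run_demo():
    seq = Sequencer()
    seq.submit(deposit("alice", 100))
    seq.submit(transfer("alice", "bob", 70))
    seq.submit(transfer("alice", "carol", 70))  # insufficient funds everywhere

    us, eu = Replica("us"), Replica("eu")
    us.catch_up(seq.log)        # applies all three entries at once
    eu.catch_up(seq.log[:1])    # lagging replica sees only the first entry
    eu.catch_up(seq.log)        # then catches up later
    return us.state, eu.state
```

In this toy, the replica that lags and the one that does not end up with identical balances, because the outcome depends only on the log order, not on when or where each entry is applied.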

"We expect everything to be partially failing all the time. And we spent a lot of time automating in highly resilient ways the traditional operational drudgery of database administration," says FaunaDB's Evan Weaver.

Photo by Daniele Levis Pelusi on Unsplash

To back this "just works" claim, FaunaDB has made Jepsen tests central in its go-to-market strategy. Jepsen is an effort to improve the safety of distributed systems, by maintaining an open source software library for testing, as well as posts, talks and reports exploring particular systems' failure modes.

Weaver said that FaunaDB's audience, especially early adopters in their market, are very familiar with Jepsen and increasingly treat it as a critical requirement of adopting new distributed systems. As you might expect for something that deeply technical, FaunaDB is reaching out to CTOs, Enterprise Architects, and Engineering Leads making database decisions for their distributed applications, or legacy apps being re-platformed for the Cloud.

Cloud, Kubernetes, and the verdict

The above may be a bit on the complex side for people beyond the CTO and engineering lead crowd. But you don't have to be in that crowd to appreciate less nuanced topics and capabilities such as managed cloud and support for Kubernetes, as those are more widely known to translate to something everyone appreciates: Efficient operations. And FaunaDB has clients such as Nvidia and Capital One to show for it.

Weaver said FaunaDB was built for the public cloud, and support for Kubernetes will be there in a month, but customers run it on Kubernetes now with custom integration "just fine." The tradeoffs when going with databases offered by cloud vendors are clear. But why choose FaunaDB over a Spanner clone alternative? In the end, we asked Weaver point blank: 

"FaunaDB offers a higher level of transactional correctness, higher throughput, lower latency especially at global scale, temporality/security/multi-tenancy, and, despite the name, is more resilient than CockroachDB. Closing the integration gap with SQL leaves little reason to choose CockroachDB, Yugabyte, or even Google Spanner, since our serverless cloud is cheaper."

Another difference is that FaunaDB is not open source. Weaver noted that they like open source, but it has been faster to innovate and deliver an enterprise-class system with proprietary licensing right now. Thus far, he went on to add, they haven't seen many objections: 

"FaunaDB Cloud is free to try and there is a free trial download for enterprise available, so nobody is prevented from getting their feet wet or building out an initial use case. If you want a database that is resilient to chaos, keeps your data safe, and is simple to operate, you have to choose FaunaDB. Future directions will include additional APIs, analytics support, and several other exciting and novel capabilities."

It seems like a few pieces of the puzzle may not be 100 percent complete yet, and we have our doubts with respect to the multi-model aspect. Admittedly, however, FaunaDB's already interesting offering just became more interesting for more people with the addition of SQL, GraphQL, and CQL.