rqlite 6.0: the evolution of a distributed database design


rqlite is a lightweight, open-source, distributed relational database written in Go, which uses SQLite as its storage engine. v6.0.0 is out now and makes clustering more robust. It also lays the foundation for more important features.

What’s new in 6.0?

6.0 multiplexes multiple logical connections between rqlite nodes using the existing Raft inter-node TCP connection. This multiplexing existed in earlier versions, was removed, but has been re-introduced in a more sophisticated, robust manner.

This new logical connection allows Followers to find the Leader in a more robust and stateless manner, when serving queries that require the client to contact the Leader. It also uses Protocol Buffers over this logical connection, leading to more robust, extensible code.
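To make the multiplexing idea concrete, here is a minimal sketch of one common technique: the first byte sent on a new connection names the logical service that should handle the rest of the stream. This illustrates the general approach and is not a copy of the rqlite source. The header values and the names classify and classifyBytes are assumptions made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Header byte values are hypothetical, chosen only for this sketch.
const (
	headerRaft    byte = 1 // Raft consensus traffic
	headerCluster byte = 2 // rqlite-specific inter-node requests
)

// classify reads the first byte of a new connection and reports which
// logical service should handle the remainder of the stream.
func classify(conn io.Reader) (string, error) {
	var hdr [1]byte
	if _, err := io.ReadFull(conn, hdr[:]); err != nil {
		return "", fmt.Errorf("reading header byte: %w", err)
	}
	switch hdr[0] {
	case headerRaft:
		return "raft", nil
	case headerCluster:
		return "cluster", nil
	default:
		return "", fmt.Errorf("unknown header byte %d", hdr[0])
	}
}

// classifyBytes is a convenience wrapper so the sketch can run against
// in-memory data instead of a real TCP connection.
func classifyBytes(stream []byte) (string, error) {
	return classify(bytes.NewReader(stream))
}

func main() {
	streams := [][]byte{{headerRaft, 'a'}, {headerCluster, 'b'}, {9}}
	for _, stream := range streams {
		service, err := classifyBytes(stream)
		fmt.Println(service, err)
	}
}
```

Because both kinds of traffic share one listener, a node needs no extra port or firewall rule for the rqlite-specific traffic.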

The evolution of a design

Inter-node communication in rqlite has evolved over the years, and it’s interesting to see the 3 design patterns in context. By inter-node communication I mean the transmission of information specific to rqlite itself between cluster nodes, not the Raft-specific communication handled by the Hashicorp Raft code.

Most importantly, each rqlite node must know the HTTP URL of the Leader, so it can inform client code where to redirect queries. And since the Leader can change, each rqlite node must have up-to-date information regarding the Leader HTTP URL at all times. This information is not something Hashicorp Raft handles intrinsically, so it’s up to the rqlite code to deal with it.

How this is done has evolved over time.

Pattern 1: Clients must search for the Leader

Pictured above is a 3-node rqlite cluster. With rqlite 2.x if clients contacted a Follower, the Follower returned information the client could use to find the Leader, but the client might have to contact every other node in the process.

Before version 3.x, rqlite nodes returned the Raft network address of the Leader to the client, if the requested operation could only be served by the Leader (assuming the node itself wasn’t the Leader). This forced the client to check all other cluster nodes, to see which one had the relevant Raft address, and then redirect the query itself to that node’s HTTP API URL.

While this meant the rqlite code was much simpler than later versions, this approach only worked if the client knew about every node in the rqlite cluster, which was not very practical if nodes in the cluster changed or failed. There was no surefire way the client code could be informed of those changes, so clients could effectively lose touch with clusters.

Pattern 2: Nodes communicate HTTP API URLs via Raft consensus

In 3.x, 4.x, and 5.x rqlite nodes used Raft consensus to map Raft network addresses to HTTP API URLs, for every node. If a client contacted a Follower, the Follower knew (via Raft consensus) the HTTP URL of the Leader, and returned that to the client. It generally worked, but the implementation didn’t prove to be robust in the long term.

By version 3.0 it was clear that rqlite Follower nodes needed to return proper HTTP redirects. But the question then became how would Follower nodes know the HTTP URL of the Leader, at all times?

Versions 3.x, 4.x, and 5.x used the Raft consensus system itself to broadcast this information. When a node joined the cluster it would commit a change to the Raft log, updating the configuration that mapped its Raft network address to its HTTP API URL. Thanks to the consensus system every node always knew every other node’s Raft network address so, in theory, each node knew every other node’s HTTP API URL via this mapping. And therefore every node knew the Leader’s HTTP API URL, even across Leader changes.

This design worked well for a long time, but had some shortcomings. One problem was what to do if a node joined the cluster but then failed to update its HTTP API URL mapping, for whatever reason. This was very unlikely, but if it did happen the cluster could need significant manual intervention before it recovered.
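The mapping described in this section, and the failure mode of a missing entry, can be sketched as a small replicated table. This is an illustration under assumptions, not the 3.x to 5.x implementation: the type addrMap and its methods are hypothetical, and apply stands in for a committed Raft log entry being applied on each node.

```go
package main

import (
	"fmt"
)

// addrMap is the extra state each node kept in addition to the SQLite
// database: Raft network address -> HTTP API URL.
type addrMap struct {
	apiURLs map[string]string
}

func newAddrMap() *addrMap {
	return &addrMap{apiURLs: make(map[string]string)}
}

// apply records one node's mapping. In this sketch it represents a
// committed log entry being applied on every node.
func (m *addrMap) apply(raftAddr, apiURL string) {
	m.apiURLs[raftAddr] = apiURL
}

// leaderURL resolves the Leader's Raft address to its HTTP API URL.
// It fails if the Leader joined but never committed its mapping.
func (m *addrMap) leaderURL(leaderRaftAddr string) (string, error) {
	u, ok := m.apiURLs[leaderRaftAddr]
	if !ok {
		return "", fmt.Errorf("no HTTP API URL recorded for %s", leaderRaftAddr)
	}
	return u, nil
}

func main() {
	m := newAddrMap()
	m.apply("node1:4002", "http://node1:4001")
	m.apply("node2:4002", "http://node2:4001")
	// node3 joined the cluster but its mapping was never committed.

	fmt.Println(m.leaderURL("node2:4002"))
	fmt.Println(m.leaderURL("node3:4002"))
}
```

The second lookup fails on every node at once, because every node holds the same incomplete table.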

Over time other bugs were discovered which showed this approach wasn’t robust when cluster membership changed regularly. It also meant extra state, in addition to the SQLite database, was being stored via Raft. It worked well enough for years, for versions 3.x, 4.x, and 5.x, but by version 6.x a better way had been identified. So instead of fixing the bugs, the design was reworked to render the bugs moot.

Pattern 3: Followers query the Leader on demand

In 6.x if a client contacts a Follower, the Follower will query the Leader on demand for the Leader’s HTTP URL, return that to the client, and then the client will contact the Leader directly. Raft consensus is not involved.

In version 6.0, a Follower queries the Leader on demand, when it needs to know the Leader’s HTTP API URL. And critically it does this over the Raft TCP connection to the Leader, a connection that must always be present anyway. Without that connection the node is not part of the cluster, and the cluster has bigger issues. So this new design doesn’t introduce any new failure modes.

This type of design is also stateless, and even if reaching out to the Leader to learn its HTTP API URL fails, that failure happens at query time — meaning the client can be informed and the client can decide how to handle it. Most likely the client can simply retry the request, if the error is transient.

You can find further details on the design of 6.0 in the CHANGELOG.

What’s coming in future releases?

This new design and implementation makes it much cleaner to implement the following, upcoming features.

Transparent request forwarding

Today if an rqlite Follower receives a request that must be served by the Leader, it will return HTTP 301. Now, with a more sophisticated inter-node communication mechanism, future releases will allow Follower nodes to query the Leader directly on behalf of the client and return the results directly to the client. This will make using the CLI, and client library coding, simpler, and make the cluster much easier to work with, allowing rqlite to serve queries much more transparently.
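To show the work a client has to do today, here is a hedged sketch of a Go client that handles the 301 itself. Go's default http.Client re-issues a redirected POST as a GET, so this sketch turns automatic redirects off and re-sends the POST to the Location it was given. The function names, the test servers, and the response body are all made up for the example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// postFollowingRedirect POSTs body to url. If a node answers 301, the
// POST is re-sent to the Location header, for at most maxHops redirects.
func postFollowingRedirect(url, body string, maxHops int) (string, error) {
	client := &http.Client{
		// Stop the client from following redirects automatically.
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	for hop := 0; hop <= maxHops; hop++ {
		resp, err := client.Post(url, "application/json", strings.NewReader(body))
		if err != nil {
			return "", err
		}
		data, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return "", err
		}
		if resp.StatusCode == http.StatusMovedPermanently {
			url = resp.Header.Get("Location")
			if url == "" {
				return "", fmt.Errorf("redirect without Location header")
			}
			continue
		}
		if resp.StatusCode != http.StatusOK {
			return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
		}
		return string(data), nil
	}
	return "", fmt.Errorf("too many redirects")
}

// demoCluster starts two local test servers: a stand-in Leader that only
// accepts POST, and a stand-in Follower that redirects to it with a 301.
func demoCluster() (followerURL string, shutdown func()) {
	leader := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST required", http.StatusMethodNotAllowed)
			return
		}
		io.WriteString(w, "handled by leader")
	}))
	follower := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, leader.URL+r.URL.Path, http.StatusMovedPermanently)
	}))
	return follower.URL, func() { follower.Close(); leader.Close() }
}

func main() {
	followerURL, shutdown := demoCluster()
	defer shutdown()
	fmt.Println(postFollowingRedirect(followerURL+"/db/execute", `["CREATE TABLE foo (id INTEGER)"]`, 3))
}
```

With transparent forwarding, a client would no longer need a loop like this, since the Follower would return the Leader's result itself.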

Better Kubernetes support

During 6.0 development, I got some great assistance from the team at Sosivio. They found some bugs in the 5.x series, and provided best practice advice on how rqlite can be changed to work better on Kubernetes. Those changes may feature in future releases.

Next steps

Download version 6.0, and try out the more robust clustering — and look out for new features in future releases.