Introduction to RedisGears

By July 16, 2019

At RedisConf19, we announced the release of a new module called RedisGears. You may have already seen some other modules by either Redis Labs or the community at large, but Gears will defy any expectations you have. It really pushes the limits of what is possible with modules. The only caveat is that it’s still in Preview so, while you can already try it out, you will have to wait a bit more for it to get to General Availability and become officially supported.

Gears scripts

At first glance, RedisGears looks like a general-purpose scripting language that can be used to query your data in Redis. Imagine having a few hashmaps in your Redis database with user-related information such as age and first/last name.

Here is the execution breakdown for the RedisGears script:

  1. It is run on all keys that match the user:* pattern.
  2. The script then filters out all keys whose age hash field is lower than or equal to 35, so only users older than 35 remain.
  3. It then runs all remaining keys through a function that calls DEL on them (i.e., the keys are deleted).
  4. Finally, it returns both key names and key values to the client.
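A minimal sketch of a script matching the four steps above. It assumes each record handed to the pipeline is a dict with 'key' and 'value' entries, with the hash fields under 'value'. The helper names are hypothetical. GearsBuilder and execute are provided by the RedisGears runtime, so the pipeline is only built when they are available:

```python
def is_older_than_35(record):
    # Keep only users whose 'age' hash field is greater than 35.
    return int(record['value']['age']) > 35


def delete_key(record):
    # execute() is supplied by RedisGears and runs a Redis command.
    execute('DEL', record['key'])


# GearsBuilder only exists inside the RedisGears interpreter, so guard the
# pipeline to keep the helpers usable (and testable) elsewhere.
try:
    builder = GearsBuilder()
except NameError:
    builder = None

if builder is not None:
    (builder
        .filter(is_older_than_35)   # step 2: drop users aged 35 or less
        .foreach(delete_key)        # step 3: DEL every remaining key
        .run('user:*'))             # steps 1 and 4: scan user:* and return records
```

Using named functions instead of lambdas is purely a readability choice here.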

This simple example showcases how you can use a Gears script similar to how you would use the query language for any other database. But, in fact, RedisGears scripts can do much more because they are Python functions running in a full-fledged Python interpreter inside Redis, with virtually no limitations. Let me show you why that matters:

In this example, I’ve installed numpy in my server using pip so I can use it inside my scripts. This means that all the Python libraries you love, and even your own code, can now be used to process data inside Redis. How neat is that?

In this gist, you can read how to install Python packages in our RedisGears Docker container.

Gears executes full-fledged Python scripts

You might have noticed by now that one-liners inside redis-cli are not a super clear way to write RedisGears scripts. Thankfully, the RG.PYEXECUTE command is not limited to those. You can also feed it full-fledged Python source files. This also means that the script can contain normal Python functions, so you’re not forced to use lambdas if you don’t want to. Let me show a couple of ways to load a Python script. Here’s a more readable version of the previous example:

With redis-cli

Using Python (or any other language)
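For the Python route, here is a minimal sketch of a helper that reads a script from disk and sends it with RG.PYEXECUTE. It assumes a client object exposing execute_command(*args), which is what redis-py's redis.Redis offers. The helper name run_gears_script is hypothetical:

```python
def run_gears_script(client, path):
    """Send the Python source stored at `path` to RedisGears."""
    with open(path, encoding='utf-8') as source_file:
        source = source_file.read()
    # The whole file travels as a single argument of RG.PYEXECUTE.
    return client.execute_command('RG.PYEXECUTE', source)
```

With redis-py installed, this could be called as run_gears_script(redis.Redis(), 'script.py'), where script.py is a placeholder file name.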

Gears is cluster-aware

RedisGears can also understand your cluster’s topology and propagate commands accordingly. We already made implicit use of that feature in our previous examples, since the scripts would behave as intended when run in a cluster (i.e., each shard would do its part of the job and finally aggregate all the partial results if necessary).

You’ll occasionally need more fine-grained control over how your computation is executed, especially for multi-stage pipelines where you have an intermediate aggregation/re-shuffle step. For this purpose, you have at your disposal collect and repartition. These will, respectively, go from a distributed sequence of values to a materialized list inside a single node and, inversely, back to a distributed stream sharded according to a strategy of your choice.
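To make the two operations more concrete, here is a plain-Python model of what they do conceptually. This is not the RedisGears API: shards are modeled as lists, and crc32 is only a stand-in for whatever hashing a real cluster uses.

```python
from zlib import crc32


def collect(shards):
    """Gather every shard's records into one list on a single node."""
    return [record for shard in shards for record in shard]


def repartition(records, key_of, shard_count):
    """Spread records across shards according to a key-extraction function."""
    shards = [[] for _ in range(shard_count)]
    for record in records:
        # Records with the same key always land on the same shard.
        index = crc32(key_of(record).encode()) % shard_count
        shards[index].append(record)
    return shards


shards = [['ann', 'bob'], ['cid'], ['ann', 'dee']]
gathered = collect(shards)
resharded = repartition(gathered, key_of=lambda name: name, shard_count=2)
```

After the re-shuffle, both 'ann' records sit on the same shard, which is what makes a later per-key aggregation possible without further coordination.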

You can also launch a job that doesn’t require the client to stay connected and wait for a result. When you add the optional UNBLOCKING argument to RG.PYEXECUTE, you’ll immediately get a token that can be used to check the state of the computation and eventually retrieve the final result. That said, know that RedisGears scripts are not limited to one-off executions invoked from a client.
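A minimal sketch of that flow from a Python client. It assumes a client object exposing execute_command(*args), as redis-py does, and it assumes results are fetched with an RG.GETRESULTS command taking the token. Check the RedisGears documentation for the exact command in your version. Both helper names are hypothetical.

```python
def submit_gears_job(client, source):
    """Start a script without waiting for it and return the execution token."""
    return client.execute_command('RG.PYEXECUTE', source, 'UNBLOCKING')


def fetch_gears_results(client, token):
    """Ask RedisGears for the results of a previously submitted job.

    RG.GETRESULTS is an assumption about the command name.
    """
    return client.execute_command('RG.GETRESULTS', token)
```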

Gears can react to streams and keyspace events

Have you ever had the need to launch operations inside Redis in response to a keyspace event, or to quickly process new entries in a stream for a situation where spinning up client consumers seems wasteful?

RedisGears enables reactive programming at the database level. It’s like using lambda functions, but with dramatically lower latency and much less encoding/decoding overhead.

Here’s a script that records all commands run on keys that have an audited- prefix:
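A rough sketch of what such a script could look like. It assumes each keyspace record is a dict with a 'key' entry and only logs the name of the touched key. Whether the record also carries the command name depends on the RedisGears version. The helper names are hypothetical, while GearsBuilder and execute come from the RedisGears runtime.

```python
def is_audited(record):
    # Only keys carrying the audited- prefix are of interest.
    return record['key'].startswith('audited-')


def audit_entry(record):
    # Arguments for XADD: stream name, auto-generated ID, then field/value.
    return ('XADD', 'audit-logs', '*', 'key', record['key'])


# GearsBuilder only exists inside the RedisGears interpreter.
try:
    builder = GearsBuilder()
except NameError:
    builder = None

if builder is not None:
    (builder
        .filter(is_audited)
        .foreach(lambda record: execute(*audit_entry(record)))
        .register())  # register() keeps the pipeline running on new events
```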

This second script then reads the audit-logs stream and updates access counts in a sorted set called audit-counts:
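Again as a sketch rather than the exact script: this version assumes every stream entry has a 'key' field holding the audited key name, and that the stream reader exposes an entry's fields under record['value']. The record layout may differ between RedisGears versions. The helper name count_update is hypothetical.

```python
def count_update(record):
    # Arguments for ZINCRBY: sorted set, increment, member.
    return ('ZINCRBY', 'audit-counts', 1, record['value']['key'])


# GearsBuilder only exists inside the RedisGears interpreter.
try:
    builder = GearsBuilder('StreamReader')
except NameError:
    builder = None

if builder is not None:
    (builder
        .foreach(lambda record: execute(*count_update(record)))
        .register('audit-logs'))  # react to new entries in this stream
```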

If you register both queries, you will see that both the stream and counts update in real time. This is a very simple example to show what can be done (clearly not a great audit logging system). If you want a more concrete example, take a look at some recipes.

Gears is asynchronous

Don’t be afraid to launch demanding jobs. RedisGears scripts run in a separate thread, so they don’t block your Redis instance. This also means that Gears queries can’t be embedded inside a Redis transaction. If you have a cluster constantly under memory pressure or running transactional workloads, Lua scripts will be your best choice to add custom transactional logic to your operations. For everything else, there’s Gears.

Next steps

The quickest way to try out RedisGears is by launching a container. Keep in mind that our modules also work with open source Redis.

We have a Docker container on DockerHub that contains all the Redis Labs modules:

docker run -p 6379:6379 redislabs/redismod:latest

We also have a version that contains RedisGears only:

docker run -p 6379:6379 redislabs/redisgears:latest

Documentation can be found at redisgears.io and the code is available on GitHub.



Here are five things to keep in mind when writing a Redis module. While this list is non-exhaustive, my aim is to offer a good way to get started if you don’t yet have much experience with module building.

1. Find a compelling module use case

Redis already has plenty of tools that allow you to build the exact solution you need. One example could be locks. Using SET with the NX option, you can create a lock key, and by combining it with EXPIRE, you get a lock lease. This can be very useful when solving coordination problems. When built-in commands are not enough, you might also resort to Lua scripts, which add full programmability to composite operations that are then executed atomically by Redis.
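As a small illustration of the lock example, here is a sketch of acquiring a lock with a lease from Python. It assumes a client exposing execute_command(*args), as redis-py does, and it uses the EX option of SET rather than a separate EXPIRE call so that the key and its expiration are set in one step. The helper name acquire_lock is hypothetical.

```python
import uuid


def acquire_lock(client, name, lease_seconds):
    """Try to take the lock; return an ownership token, or None if it is held."""
    token = uuid.uuid4().hex
    # NX: only set if the key does not exist. EX: expire after the lease.
    reply = client.execute_command('SET', name, token, 'NX', 'EX', lease_seconds)
    return token if reply else None
```

Storing a random token as the value lets the owner later verify that the lock is still its own before releasing it.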

Modules go a step further, giving you even more flexibility and speed, thanks to their ability to access lower-level APIs compared to Lua, but they’re more challenging to maintain and distribute. Go for a module only when Lua can’t fully solve your use case.

Modules can add new commands

Modules can add new commands to Redis that execute arbitrary C functions (to be precise, you can also use Rust, Zig or any C-ABI compatible language). What you do in your function is up to you. A basic, but useful, starting point could be implementing a command that is similar to an existing one but does something more. An example of this could be SETNE (which was first mentioned by a user in this GitHub Pull Request). SETNE behaves exactly like SET, but when the new value is equal to the current one, it does not modify the key, thus avoiding producing a spurious keyspace notification. In general, to get some practice, think about small additions you could make to existing commands to help with specific use cases. 

Most of those small additions would be best implemented as Lua scripts, but it’s a good way to gain some experience in case you can’t come up with compelling module ideas right from the start. A couple of exercises left to the reader: SETEQ, HINCRDATEBY.

Modules can add new data types

The most effective way a module can add functionality to Redis is by adding a new data type. Redis has a strong focus on proper design of data structures and their related algorithms and properties. While you might not know what the exact implementation of the Set data type is, you know for sure that, for example, set membership (SISMEMBER) is always going to be fast regardless of Set size (i.e., it has sub-linear asymptotic complexity).

This is the basis behind our own modules:

  • RediSearch is a full-text search module based on inverted indices.
  • RedisGraph is a graph module based on sparse matrices.
  • RedisTimeSeries is similar to Redis Streams but optimized for numerical series.
  • RedisBloom offers a few different probabilistic data structures.
  • RedisAI runs TensorFlow deep-learning graphs (and a few other types).

These are serious modules, but not every module that introduces a new data type has to be this complex. There are plenty of simpler data types that could be useful as a module. A basic example could be a different implementation of a data type already present in Redis, like using an ArrayList to implement Lists.

2. Polish your API

Don’t forget that wrong usage of your module’s commands is going to be as important to prepare for as correct usage. Redis users like to try commands by hand to get a better understanding, and typing in wrong arguments is part of that process. Your API should be easy to use and hard to misuse, but when the inevitable happens, make sure to report meaningful error messages.

Take a look at how standard commands behave within Redis and see if you can come up with something that works on the same assumptions. This will lessen the mental overhead required to use your commands. One example is that, in Redis, most commands have sensible behavior when called on a non-existent key: INCR will assume a missing key has value 0 so it will set it to 1, SADD will assume a missing key is an empty set, and so forth.
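To illustrate both points (sensible defaults for missing keys and meaningful errors for misuse), here is a plain-Python model of how INCR behaves. It is only a model operating on a dict, not module code. The error strings mirror the ones Redis itself replies with, and CommandError is a hypothetical stand-in for an error reply.

```python
class CommandError(Exception):
    """Stands in for an error reply sent back to the client."""


def incr(store, key):
    current = store.get(key, '0')  # a missing key behaves like the value 0
    if not isinstance(current, str):
        raise CommandError(
            'WRONGTYPE Operation against a key holding the wrong kind of value')
    try:
        number = int(current)
    except ValueError:
        raise CommandError('ERR value is not an integer or out of range') from None
    store[key] = str(number + 1)
    return number + 1
```

A module command that follows the same pattern (default for missing keys, a specific message per failure mode) will feel familiar to anyone who has typed INCR into redis-cli.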

3. Be a good citizen

Modules can interact with the Redis ecosystem. Make sure to read the documentation to learn how to get the details right, especially if your module implements a new data type. Here are the two most important aspects to get right.

Command flags

When you’re declaring a new command, you must specify a few flags to tell Redis what your command is going to do when invoked. Is it going to just read data or also write it? Is it going to allocate memory or just modify existing data? Make sure to set those flags correctly. For example, deny-oom is an important flag that tells Redis to reject a command that may allocate memory when the server is already out of memory (OOM); otherwise the whole process could be killed by the OOM killer! Even the readonly flag is important: the new client-side caching functionality will use it to decide whether to enable tracking for a given key.

Command replication

When Redis is run in a master/replica setup, the master must know which commands it should and should not send to replicas. Not every command should be replicated, and some might need to be replicated only under specific conditions. For instance, I mentioned above the SETNE command that would set a key value only if the new value is different from the current one (otherwise it does nothing). In this case, the command should be replicated only when it is effectively applying a change to the key. There is no reason to make each replica execute it if it would not perform any write. Redis can’t know what to do from the outside, so you must make proper use of RedisModule_ReplicateVerbatim and related functions.
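The conditional-replication idea can be modeled in a few lines of plain Python. This is a simulation, not module code: the dicts stand in for the master and replica datasets, and appending to replication_log stands in for calling RedisModule_ReplicateVerbatim in a real module. The function name handle_setne is hypothetical.

```python
def handle_setne(store, replication_log, key, value):
    """Model of SETNE on the master: replicate only when a write happens."""
    if key in store and store[key] == value:
        return 0  # nothing changed, so nothing is propagated
    store[key] = value
    # Stand-in for RedisModule_ReplicateVerbatim in a real module.
    replication_log.append(('SETNE', key, value))
    return 1


master, replica, replication_log = {}, {}, []
for key, value in [('k', 'a'), ('k', 'a'), ('k', 'b')]:
    handle_setne(master, replication_log, key, value)

# The replica applies only what the master chose to propagate.
for _, key, value in replication_log:
    replica[key] = value
```

Three calls on the master produce only two replicated commands, and the replica still ends up with the same data.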

4. Write great documentation

It doesn’t matter how useful your module is if no one understands how to use it. Polishing your API can help immensely in that regard, but first you need to convince potential users that the module is at least worth trying out. A good module should have good documentation that explains the general goal of the module and lists detailed information for each command.

If you take a look at redis.io, you will see that each command lists its Big O complexity and has a few extra notes for when a command has particularly big or small constants, or when there are notable edge cases. Try to replicate that format, especially with regard to the syntax for command examples. Notice how each example uses lowercase names for placeholders, while uppercase ones denote keywords that must be used verbatim, with optional values between square brackets. Look at the documentation of SET to see an example of this.

Most importantly: Strive for simplicity

Always keep in mind that the first design principle behind Redis is simplicity. This doesn’t mean your module should never explore other options and occasionally sacrifice simplicity for other benefits (modules exist precisely to let Redis users experiment), but always be mindful of what you’re giving up.

Generally speaking, when you sacrifice simplicity for ease of use you’re also implicitly constraining the ways in which your users will be able to use your module. In Redis, most utility generally doesn’t come from a given command used in isolation, but rather in how users can combine different commands together. Smaller, clearer, simpler commands will always be easier to combine and thus yield greater results in the grand scheme of things. For this reason, I recommend increasing ease of use by properly applying the techniques described above before resorting to this kind of trade-off.

Another potential trade-off could be in favor of efficiency. This may be worth exploring and is one that Redis occasionally makes itself. A few built-in data types have two internal representations — one optimized for when the data type only has a few elements in it, while the second one is for when the key grows over a certain threshold. Two representations (plus the mechanism to switch between the two) are certainly more complex than just one, but the benefits might be worth it. This is especially true since the added complexity doesn’t show up in the user interface, as users will interact with the data type in the same way regardless of which internal representation is in use.
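Here is a toy illustration of the dual-representation idea in plain Python. It is not how Redis implements any of its types. CompactSet and its THRESHOLD value are hypothetical: the set starts out as a small list and switches to a hash set once it grows past the threshold, while the caller-facing interface stays the same.

```python
class CompactSet:
    """Set that starts as a small list and upgrades to a hash set when it grows."""

    THRESHOLD = 8  # hypothetical cut-over point

    def __init__(self):
        self._items = []  # compact representation for small sets

    def add(self, member):
        if member in self._items:
            return False
        if isinstance(self._items, list):
            self._items.append(member)
            if len(self._items) > self.THRESHOLD:
                # Switch representation; callers never notice.
                self._items = set(self._items)
        else:
            self._items.add(member)
        return True

    def __contains__(self, member):
        return member in self._items

    def __len__(self):
        return len(self._items)
```

The extra complexity lives entirely inside add, which is the point: users call add and in exactly the same way before and after the switch.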

In conclusion

Take a look at which modules already exist, and see if you can find inspiration. We published an SDK for writing modules in Rust and also wrote about doing it in Zig, so don’t worry if you don’t (want to) know C. We also have talks on YouTube (Rust, Zig), if you prefer listening over reading.

If you do end up writing a module, please make sure to send a pull request to antirez/redis-doc to have it added on redis.io and, if you feel like it, shoot me a tweet @croloris. I’ll be happy to try out your module.