When you make a new test file, make sure to set it as `async: true`. Starting every test file this way ensures your code stays performant and consistent in a parallel environment as your codebase grows. Enforce this as soon as possible so you start paying off this tech debt before it compounds.
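A minimal async test file looks like this (module and test names are illustrative):

```elixir
defmodule MyApp.ExampleTest do
  # async: true lets this file run in parallel with other async test files
  use ExUnit.Case, async: true

  test "the truth" do
    assert 1 + 1 == 2
  end
end
```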
If you say “we’ll make them concurrent later,” you will be in for a lot of compounding sadness in the form of tech debt. I’ve seen this firsthand in a very large Elixir repo. Testing was not prioritized, and two years later the test suite took 30 minutes. Very few tests were `async: true`, and many tests would contaminate the state of the testing application because side effects were not encapsulated. We had a litany of intermittent test failures that depended purely on the order the tests ran in.
Do not let this become your app.
Not only is it miserable for engineers, but to make it bearable you’ll have to spend thousands to tens of thousands of dollars a month mitigating it by splitting your test suite into chunks and running those in parallel on lots of cloud compute. When your test suite takes 11 seconds to run, engineers would rather run it locally than wait for CI to tell them the results. This speeds up delivery and efficiency: the more quickly you can get feedback to your engineers, the less time is wasted on context switching and waiting.
Start disciplined, stay disciplined.
Most applications will reach out to several different APIs or services. You may integrate with Firebase and Amazon SQS. It may be tempting to somehow integrate these services into your tests, but that is something to be avoided. If either service’s testing environment goes down, your test suite will behave unexpectedly or fail outright, and many of these API calls will significantly slow your tests down as well. Instead, you should use Mox to mock the calls to these services. This way you can ensure the service is being called properly without slowdowns or unexpected responses.
I’ve merged a bare-bones Mox example in the Mox library here, and here’s a good tutorial on how to set up and use Mox.
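The basic shape looks like this — a sketch assuming a hypothetical `MyApp.QueueClient` behaviour wrapping an external service:

```elixir
# Mox mocks are defined against a behaviour.
defmodule MyApp.QueueClient do
  @callback send_message(String.t()) :: :ok | {:error, term()}
end

# In test_helper.exs:
Mox.defmock(MyApp.QueueClientMock, for: MyApp.QueueClient)

# In an async test:
defmodule MyApp.QueueTest do
  use ExUnit.Case, async: true
  import Mox

  # Fail the test if an expectation set below is never called.
  setup :verify_on_exit!

  test "sends a message" do
    expect(MyApp.QueueClientMock, :send_message, fn "hello" -> :ok end)
    assert MyApp.QueueClientMock.send_message("hello") == :ok
  end
end
```

Production code would call the real client while tests are configured to call the mock, so no test ever touches the live service.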
It might be tempting to reach for Mock, as it’s a bit easier to set up and understand. However, the fine print in the Mock readme tells you explicitly:
> Also, note that Mock has a global effect so if you are using Mocks in multiple tests set `async: false` so that only one test runs at a time.
This is the largest issue with Mock: you cannot mock the same function across different parallel tests. This violates our first rule: “Every test file should be configured as `async: true`.” Having replaced Mock in several apps, I can confidently say Mock is considerably slower than Mox as well. I was able to speed up test runtime by more than 40 percent simply by replacing Mock with Mox while keeping the test files as `async: false`.
As an aside, you should try Hammox. It is a wrapper around Mox that ensures the Elixir typespecs you define are enforced in your mocks. For example, suppose you have a function with this spec:
```elixir
@spec get_user(String.t()) :: %User{}
```
But then you mock it with:
```elixir
UserMock
|> expect(:get_user, fn "12345" -> nil end)
```
Hammox will throw an error when you run your test because your spec says the function must return `%User{}`, but your mock returned `nil`. Hammox is super helpful for maintaining typespecs in Elixir and ensuring your mocks simulate what the function would return in a production context.
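Adopting Hammox is mostly a drop-in change from Mox, since it re-exports the Mox API. A sketch, assuming a hypothetical `MyApp.UserStore` behaviour that carries the `get_user/1` spec above:

```elixir
# Hammox checks mock return values against the behaviour's typespecs.
defmodule MyApp.UserStore do
  @callback get_user(String.t()) :: %User{}
end

# In test_helper.exs — same shape as Mox.defmock/2:
Hammox.defmock(UserMock, for: MyApp.UserStore)
```

With this in place, the `expect(:get_user, fn "12345" -> nil end)` mock above raises at test time instead of silently diverging from the spec.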
Use `start_supervised` to start unique GenServers or other required async processes in the `setup` block of each test.

When you start to have a larger Elixir application, your `application.ex` file will probably have several different processes running as children. Testing code that needs to talk to these other children poses a challenge for parallel testing: any two tests calling the same child process at the same time can return different results depending on test order. There are several ways to solve this issue. I have provided two different solutions.
The Elixir/default solution
This solution is provided in the branch async-true.
I’m led to believe this is the more Elixir-y solution. I’ve talked to a handful of members of the community about this issue. Combining those chats with the default tutorials (like the one used for this article), we can see the preferred method is passing the name or PID into the function, like the argument `server` in the aforementioned tutorial:
```elixir
@doc """
Looks up the bucket pid for `name` stored in `server`.

Returns `{:ok, pid}` if the bucket exists, `:error` otherwise.
"""
def lookup(server, name) do
  GenServer.call(server, {:lookup, name})
end
```
So all we need to do is start our GenServer in our tests with a unique name, set up Ecto allowances, and then change our tests to call our unique GenServer.
Our test setup becomes this:
```elixir
setup do
  registry = start_supervised!({KV.Registry, name: __MODULE__, test_pid: self()})
  %{registry: registry}
end
```
This starts a unique `KV.Registry` GenServer named `__MODULE__`, which expands to the module name of our test. This ensures the GenServer name does not collide with any other test running the same GenServer. If we named it “Foo” and another test also named it “Foo”, the tests could have race conditions because they’d be accessing the same GenServer. The last line passes `registry` in a map to each test.
In our tests, it’s now as simple as passing our custom test `KV.Registry` GenServer PID in as the first argument:
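For example, a test might look like this — a sketch assuming the `KV.Registry.create/2` and `KV.Registry.lookup/2` API from the Elixir getting-started tutorial:

```elixir
test "spawns buckets", %{registry: registry} do
  # `registry` is the unique PID returned by start_supervised! in setup
  assert KV.Registry.lookup(registry, "shopping") == :error

  KV.Registry.create(registry, "shopping")
  assert {:ok, _bucket} = KV.Registry.lookup(registry, "shopping")
end
```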
This solution is relatively straightforward to implement. The downside lies in complexity on the caller’s side. I’m not a massive fan of this solution because passing in the GenServer you want to call is more work than we should have to do and more information than I, as the function caller, should have to know. It also ends up being rather verbose any time you want to call into this module. This is compounded if you have a GenServer that calls another GenServer. Do you pass both modules in every time you make a call? What if you have a GenServer that calls two other GenServers? It becomes cumbersome quickly with many different GenServers. With this solution, our tests run in the expected ~10 seconds in parallel.
The Manager layer solution
This solution is provided in the branch async-true-with-manager.
My preferred solution is the introduction of a layer between caller and GenServer. The rather uninspired name I have for it is “Manager”.
The Manager is responsible for determining which GenServer you want to call in a given module. Putting this layer here means we can dynamically swap which GenServer is being called in our tests without having to tediously pass it into every call we make to our module. This method also works much better than the previous one when a module needs to use multiple mocks of other GenServers. Otherwise, you’d be forced to mock any calls to further GenServers, or you’d have to pass a map of all the GenServers you want to use, and that gets nasty very quickly.
Where you would typically have:
```elixir
def lookup(server, name) do
  GenServer.call(server, {:lookup, name})
end
```
You would instead have:
```elixir
@registry_manager Application.compile_env(:mox, :registry_manager, KV.Registry.Manager)

def lookup(name) do
  GenServer.call(@registry_manager.get_server(), {:lookup, name})
end
```
The module attribute `@registry_manager` will either be what we set it to in our configuration under the keys `:mox -> :registry_manager`, or it will default to `KV.Registry.Manager`, our preferred implementation for production and development environments.
This makes it so you can use Mox/Hammox to change which GenServer `lookup/1` calls during tests, like so:
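A sketch of that test setup, assuming a `RegistryManagerMock` defined with `Hammox.defmock/2` for a behaviour exposing `get_server/0`, and test config setting `config :mox, registry_manager: RegistryManagerMock`:

```elixir
setup do
  registry = start_supervised!({KV.Registry, name: __MODULE__, test_pid: self()})

  # Every call to get_server/0 now resolves to this test's unique registry.
  Hammox.stub(RegistryManagerMock, :get_server, fn -> registry end)

  %{registry: registry}
end
```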
`Hammox.stub` will return our test registry PID `registry` every time `get_server/0` is called.
This ensures that each test has a unique instance of `KV.Registry`, which prevents state contamination and thus enables parallel testing. `start_supervised` will also ensure our spawned `KV.Registry` gets shut down when our tests are finished.
You can see that this method is less verbose in our tests and still results in the ~10-second runtime of our test suite. We don’t have to know which GenServer to call because we’ve stubbed that function to always return the test GenServer.
We need to design our GenServers so they can take a configuration on startup. This will allow us to specifically choose the name for the GenServer. This is important because named GenServers have to be unique. If we don’t configure this name then we will not be able to start a unique instance of the GenServer in each test.
You can see how I accomplish this here:
```elixir
alias Ecto.Adapters.SQL.Sandbox

# ...

def start_link(opts) do
  GenServer.start_link(__MODULE__, {:ok, Keyword.get(opts, :test_pid, nil)}, opts)
end

@impl true
def init({:ok, parent_pid}) do
  if parent_pid != nil do
    :ok = Sandbox.allow(AsyncTesting.Repo, parent_pid, self())
  end

  names = %{}
  refs = %{}
  {:ok, {names, refs}}
end
```
This is also necessary to ensure we pass our Mox and Ecto allowances on to the new process. If you had other Ecto or Mox allowances to configure, you’d put them in the same place as `Sandbox.allow` in the example above. `Sandbox.allow` tells Ecto that any calls this process makes to the database belong to the `parent_pid`’s Ecto sandbox. This prevents that nasty ownership error mentioned earlier.
Without this configuration, the new process will not know which Ecto sandbox to use and it won’t know which Mox/Hammox mocking instance to use. This will cause the test to fail with ownership errors because Ecto and Mox have no idea where to direct queries and mocked calls.
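For reference, the sandbox itself is enabled with the standard Ecto setup (the repo name `AsyncTesting.Repo` is taken from the example above):

```elixir
# test_helper.exs — put the repo's sandbox in manual mode so each
# test explicitly checks out its own isolated connection.
Ecto.Adapters.SQL.Sandbox.mode(AsyncTesting.Repo, :manual)

# In each test module (or a shared DataCase), check out a connection:
setup do
  :ok = Ecto.Adapters.SQL.Sandbox.checkout(AsyncTesting.Repo)
end
```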
This is subtle but very important. Ecto Sandbox is not magic, even though it feels that way. You can run into deadlocks and other issues in your tests if you use duplicate primary keys. The easiest way to solve this is to ensure any unique columns on your model are randomly generated in your tests.
Instead of this:
```elixir
test "get user" do
  user = User.create("somebody@example.com")
  fetched_user = User.get(user.id)
  assert user.email == fetched_user.email
end
```
Prefer this:
```elixir
test "get user" do
  email = "#{Ecto.UUID.generate()}@example.com"
  user = User.create(email)
  fetched_user = User.get(user.id)
  assert user.email == fetched_user.email
end
```
The issues this can cause are rare, but they are annoying to find and fix. Ecto discusses this in more detail here. If you ensure your tests use unique values like the example above, you’ll never have to worry about the Ecto race conditions that can stem from non-unique data.
Elixir is a great language, but it is very easy to architect your tests poorly. This will cause loads of headaches, pain, and tech debt later in the lifecycle of your application. If you follow these testing rules and embrace thoughtful testing architecture in your Elixir application, your future self and engineering peers will thank you.
Frequent testing, excellent test coverage, and considerate test architecture have been, in my experience, keystones of engineering productivity, product stability, and engineer happiness.
In closing, if there’s anything you feel could be clearer or isn’t working how I described, please let me know in a comment and I’ll make an edit to address the issue.