There are a ton of resources out there that talk about setting up Rails, Sidekiq and Puma on Heroku/Linux, but somehow during my own personal endeavour into the process, I felt that there were always little bits of information left out. So I decided to write my own version and put together all the little things which I've found answers to.


Introduction

We will first briefly talk about the technologies used here. After that, we will talk about setting up Redis, configuring Sidekiq and linking it all with Puma. Finally, we will talk about getting everything to work on Heroku and Linux.

What?

Sidekiq
Sidekiq is a neat gem that allows us to easily implement efficient background processing for Ruby applications. It works by spawning multiple worker threads from a single master worker process.

In Rails, it's mostly used as a means to offload long-running tasks to the background. This can include heavy tasks such as uploading photos to S3, or doing additional post-processing on PDF files. It can also be scheduled to perform tasks at an arbitrary time in the future. Using Sidekiq in this manner allows your Rails application to appear more responsive, because the user's request completes faster (the heavy lifting is done in the background, and the user doesn't need to wait for it to finish).

On a high level, Sidekiq consists of 2 "parts" that interface with a Redis instance (Redis is a lightweight, persistent, key-value database). The 2 parts are the Sidekiq client and the Sidekiq server. A Sidekiq client is anything that "pushes" jobs to a queue stored in the Redis instance. A Sidekiq server, on the other hand, is the process that "pulls" jobs from the queue and executes them. Sometimes, a Sidekiq server can also behave as a Sidekiq client. This happens when the Sidekiq server itself pushes jobs into the queue (e.g. a worker that queues another worker).
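To make the client/server split concrete, here is a minimal sketch of a worker, assuming the sidekiq gem is in your Gemfile. The class name, the Photo model and its methods are hypothetical, made up for illustration:

```ruby
# Typically lives in app/workers/photo_upload_worker.rb
class PhotoUploadWorker
  include Sidekiq::Worker

  # This method runs on the Sidekiq server, i.e. the process
  # that pulls jobs from the Redis queue.
  def perform(photo_id)
    photo = Photo.find(photo_id)
    photo.upload_to_s3! # hypothetical long-running task
  end
end

# Anywhere in your Rails app (acting as the Sidekiq client),
# push a job onto the queue and return immediately:
PhotoUploadWorker.perform_async(42)

# Or schedule it for an arbitrary time in the future:
PhotoUploadWorker.perform_in(1.hour, 42)
```

Note that `perform_async` only serializes the arguments and pushes them into Redis; the actual work happens later, in the Sidekiq server process.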

Puma
Puma is a concurrent web server for Ruby.

Puma increases the responsiveness of our Rails application either by spawning multiple threads to handle requests, by forking multiple process instances of our app (cluster mode), or by using a combination of both.

Puma replaced Unicorn as Heroku's recommended web server for Rails because of its ability to handle slow requests more gracefully. You can read more about it here.


Let's Start!

Setting up Redis
We begin by first making sure that Redis is installed. This step is easily missed; when I was setting up Sidekiq for the first time, I was completely caught off guard by needing to install Redis.

Localhost - Mac
Install Redis by typing brew install redis into your console.
Tada! Voila! Redis is now installed! That was easy, wasn't it?

Localhost/Remote Server - Linux
Installing Redis on Linux can be a little lengthy (if you choose to do additional configuration and customisation).

There are two ways to install Redis on Linux:
Method 1: The official Redis documentation discourages the use of native Linux package managers because their packages may be outdated. Instead, you can use the following commands, taken from the official Redis site.

wget http://download.redis.io/redis-stable.tar.gz
tar xvzf redis-stable.tar.gz
cd redis-stable
make
sudo make install

Method 2 (my recommendation): Alternatively, to circumvent the problem of outdated Redis packages in the native Linux package managers, we can just point the package manager at an updated PPA. After that, installation is a breeze.

$ sudo add-apt-repository ppa:chris-lea/redis-server
$ sudo apt-get update
$ sudo apt-get install redis-server

Once you've installed Redis, if you are using a Linux server for production, you will want to make sure that the Redis server starts automatically when the system boots up. To do this, you will need to make sure a proper init script exists in the /etc/init.d path (if I am not mistaken, Method 2 does this automatically for you; I have not tried Method 1, so I am not too sure).
For an awesome, comprehensive explanation of how to do that, you can visit here. I encourage you to test starting, accessing and saving Redis once the setup process is complete. I personally ran into folder permission issues the first time I tried setting up Redis on EC2.

Localhost - Windows
I'm sorry my Windows friends, but apparently Redis doesn't play nice with Windows right out of the box. See here for potential solutions.

Heroku
Setting up Redis on Heroku is extremely easy. After you have created your app, you just need to provision a Redis database add-on. There are several providers to choose from; RedisToGo and RedisCloud are popular ones with free tiers.

After provisioning a Redis add-on, the Redis instance will be exposed to your Heroku application via an environment variable. Usually it will default to ENV['REDIS_URL'], but sometimes it can be some other variable like ENV['REDISCLOUD_URL'].

Setting up Sidekiq
Now that Redis is installed on our system, let's get Sidekiq into our Rails application. Add the following line to your Gemfile and then run bundle install.

 gem 'sidekiq'

Once that's done, Sidekiq is ready to go. However, most of the time it is advisable to include a config file for Sidekiq. The config file allows us to configure things such as the number of worker threads to spawn, the queues to run and the location of the pid and log files. For more information about config files and queues, you can check out the Sidekiq wiki.

In our case, we will be using the configuration below, defined in the file /config/sidekiq.yml (create one if it doesn't exist):

development:
  :concurrency: 5
production:
  :concurrency: 20
:queues:
  - default

A concurrency of 20 means that Sidekiq will spawn up to 20 worker threads to process jobs in the queue. These threads are only spawned when there are jobs to run.
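If you later add more queues to the :queues: list, individual workers can be pointed at a specific one. A small sketch, assuming the sidekiq gem is installed; the worker name and queue routing are hypothetical:

```ruby
class ReportWorker
  include Sidekiq::Worker
  # Route this worker's jobs to the "default" queue from sidekiq.yml.
  # Add other queue names under :queues: and reference them here
  # to separate or prioritise different kinds of work.
  sidekiq_options queue: 'default'

  def perform(report_id)
    # long-running report generation would go here
  end
end
```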

Setting up Puma
Ok, time to install the big cat! First off, add Puma to your Gemfile and run bundle install.

gem 'puma'

Puma integrates well with Rails. Just by running the rails s command now, a Puma server will start instead of WEBrick. For development, this behaviour is fine, but in production every Rails app and server's resources are different, so it is advisable to create a config file for Puma as well.

Create a file /config/puma.rb and fill in the following:

workers Integer(ENV['WEB_CONCURRENCY'] || 2)
threads_count = Integer(ENV['MAX_THREADS'] || 1)
threads threads_count, threads_count

preload_app!

rackup      DefaultRackup
port        ENV['PORT']     || 3000
environment ENV['RACK_ENV'] || 'development'

# Because we are using preload_app, an instance of our app is created by the
# master process (calling our initializers) and then the memory space is forked.
# So we should close the DB connection in the master process to avoid connection leaks.
# https://github.com/puma/puma/issues/303
# http://stackoverflow.com/questions/17903689/puma-cluster-configuration-on-heroku
# http://www.rubydoc.info/gems/puma/2.14.0/Puma%2FDSL%3Abefore_fork
# We don't have to worry about Sidekiq's connection to Redis because connections
# are only created when needed. As long as we are not queuing workers while Rails
# is booting, there will be no Redis connections to disconnect, so it should be fine.
before_fork do
  puts "Puma master process about to fork. Closing existing Active Record connections."
  ActiveRecord::Base.connection_pool.disconnect!
end

on_worker_boot do
  # Worker-specific setup for Rails 4.1+
  # See: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#on-worker-boot
  ActiveRecord::Base.establish_connection
end

Let's take a closer look at what we've filled in:

  • workers basically specifies the number of worker processes Puma will fork from the master process. Each worker process runs a copy of your Rails app. Setting this number higher than 1 enables cluster mode.
  • threads basically specifies the number of threads each worker process can have. A threads 0, 16 just means that there can be a minimum of 0 and a maximum of 16 threads running. Threads are only spawned when your app needs them (so you save server resources when there are no requests). In the code above, I set the minimum and maximum number of threads to be the same, as defined by the variable threads_count (this config is optimised for Heroku, since charges are based on the number and size of dynos instead of how many resources are used).
  • Each Puma thread can handle 1 request at a time. Therefore, 16 threads can handle 16 requests at a time. If you have 1 worker with up to 16 threads, that means you can only handle 16 requests at a time. If you have 2 workers, you have a total of 32 threads, so your Rails app can handle 32 requests at a time.
  • So why does Puma have workers and threads? Why not just have 1 worker and 32 threads? To answer this, there are several points to understand.
    • workers, in general, are more memory dependent. Each Puma worker is basically an entirely new OS-level process with its own memory address space containing a copy of your Rails app. 2 workers means 2 copies of your Rails application code in memory.
    • threads, in general, are more CPU reliant. Multiple threads run in a single worker process. The processing of the threads is interleaved at the OS and CPU level. Additionally, all threads share the same memory address space.
    • Using threads requires your application to be thread-safe. This is because threads run in the same process and share its memory. Simple things like the manipulation of class variables by one thread can easily mess up the results in another thread.
    • Using workers, on the other hand, does not require your application to be thread-safe (if you only have 1 thread per worker). Memory is isolated between processes, so variable changes in one worker will not affect another. However, using workers can quickly bloat your memory footprint. If your Rails app usually takes 100MB of memory, then running 2 workers will require 200MB of memory. That said, most Rails applications are small enough not to have to worry about this.
    • As a result, one must find the right balance between the number of workers and threads. If your code is not thread-safe, then using multiple workers with 1 thread each is the way to go. However, if you can ensure the thread-safety of your application, then using threads alongside workers will allow you to maximise both the CPU and RAM available to you.
  • preload_app! basically specifies that your Rails app should be loaded by the Puma master process first; when workers are spawned, they fork from the master process and inherit its copy of your Rails app. The purpose of this is to speed up worker creation. Without preload_app!, each worker would have to load your app again.
  • The before_fork and on_worker_boot blocks are only required if you are using preload_app!. When the master process loads your Rails application, all the initialization is done. This includes connecting to the DB and also to Redis. The code in before_fork closes any active connections to the DB to make sure no connections are leaked. After that, the workers fork and boot up. That is when on_worker_boot is called and we re-establish the ActiveRecord connection to the DB for each worker.
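To make the thread-safety point concrete, here is a minimal pure-Ruby sketch (independent of Puma; the counter and the thread counts are made up for illustration). Several threads mutate one shared variable, so the mutation is guarded with a Mutex:

```ruby
counter = 0
lock = Mutex.new

# 8 threads each increment the shared counter 10,000 times.
threads = 8.times.map do
  Thread.new do
    10_000.times do
      # Without the lock, the read-increment-write sequence from different
      # threads could interleave and lose updates.
      lock.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)

puts counter # => 80000
```

Class variables, memoized globals and anything else shared across requests need the same kind of care once you run more than 1 Puma thread.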

Configuring Sidekiq and Connecting to Redis
Under default circumstances, for localhost development and on Heroku, Sidekiq will be able to connect to Redis without any configuration.

However, in the event that this is not the case for you, for example because you changed the Redis port, or ENV['REDIS_URL'] was not populated correctly, you will have to configure Sidekiq to connect to Redis in an initializer file.

First, create a file /config/initializers/sidekiq.rb (actually, you can name it anything you want, or even put it inside another file in the initializers folder), then fill in the following:

if Rails.env.production?
  Sidekiq.configure_client do |config|
    config.redis = { url: ENV['REDIS_URL'], size: 2 }
  end

  Sidekiq.configure_server do |config|
    config.redis = { url: ENV['REDIS_URL'], size: 20 }

    Rails.application.config.after_initialize do
      Rails.logger.info("DB Connection Pool size for Sidekiq Server before disconnect is: #{ActiveRecord::Base.connection.pool.instance_variable_get('@size')}")
      ActiveRecord::Base.connection_pool.disconnect!

      ActiveSupport.on_load(:active_record) do
        config = Rails.application.config.database_configuration[Rails.env]
        config['reaping_frequency'] = ENV['DATABASE_REAP_FREQ'] || 10 # seconds
        # config['pool'] = ENV['WORKER_DB_POOL_SIZE'] || Sidekiq.options[:concurrency]
        config['pool'] = 16
        ActiveRecord::Base.establish_connection(config)

        Rails.logger.info("DB Connection Pool size for Sidekiq Server is now: #{ActiveRecord::Base.connection.pool.instance_variable_get('@size')}")
      end
    end
  end
end

That looks confusing, but it's not. Let me explain.

The Sidekiq.configure_client block basically tells all Sidekiq clients which Redis instance to push their jobs to.

The Sidekiq.configure_server block, on the other hand, yes, you guessed it, tells all Sidekiq servers which Redis instance to pull their jobs from.

The redis instance location is specified using the url key.

Note: A Redis instance url typically has the following format (< and > not included):
redis://<user>:<secret>@<url path>:<port>/<db number>.
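As a quick sanity check, you can pick this format apart with Ruby's standard URI library (the credentials and host below are made up):

```ruby
require 'uri'

uri = URI.parse('redis://user:secret@redis.example.com:6379/0')

puts uri.scheme   # => "redis"
puts uri.user     # => "user"
puts uri.password # => "secret"
puts uri.host     # => "redis.example.com"
puts uri.port     # => 6379
puts uri.path     # => "/0"  (the db number)
```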

In your Linux production server, you can set ENV['REDIS_URL'] to contain the url of your Redis provider. If you are running Redis on the same server with default configurations (locally, with respect to the Rails app), then you can use the default url redis://localhost:6379. On Heroku, if ENV['REDIS_URL'] is already set, then everything will work fine. If it is not set, then you will have to set it manually after getting the url of your Redis instance from your Redis provider. For example, if you are using RedisCloud, you can copy its url into ENV['REDIS_URL'] by typing in the console:

heroku config:set REDIS_URL=$(heroku config:get REDISCLOUD_URL)

Note: If you instead store the variable name itself (heroku config:set REDIS_URL=REDISCLOUD_URL), REDIS_URL will contain the literal string "REDISCLOUD_URL", and you would have to revise url: ENV['REDIS_URL'] to url: ENV[ENV['REDIS_URL']] in the initializer so the variable is dereferenced twice.

The size key, on the other hand, merely specifies the number of connections each Sidekiq client/server has to Redis. In this case, for the Sidekiq client, every web dyno on Heroku (or every Puma worker running your app) will have up to 2 connections to Redis. For the Sidekiq server, each worker dyno can have up to 20 connections to Redis. If you are deploying to Heroku, especially if you're using the free tiers, you will want to take note of the values you set here.

How do I know what size value to use? You count. The first thing to understand is that it is a connection pool, and each process has its own pool. A single WEBrick Rails app is 1 process. A Rails app launched with Puma will have a connection pool for each Puma worker. As a rule of thumb, a Sidekiq client, which is usually a Rails app, only needs 1 connection to Redis. I used 2 in the config above to handle rare cases where one of the connections hangs, so that the client at least has a backup to use.

For the Sidekiq server, setting a size value equal to, or slightly less than, your Sidekiq concurrency will suffice.
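To make the counting concrete, here is a back-of-the-envelope sketch in plain Ruby. The dyno and worker counts are made up for illustration; plug in your own, and check the total against your Redis provider's connection limit:

```ruby
# Hypothetical deployment: 2 web dynos, each running 2 Puma workers,
# plus 1 Sidekiq worker dyno. Pool sizes match the initializer above.
web_dynos        = 2
puma_workers     = 2   # the pool is per process, i.e. per Puma worker
client_pool_size = 2   # `size` in Sidekiq.configure_client
worker_dynos     = 1
server_pool_size = 20  # `size` in Sidekiq.configure_server

client_connections = web_dynos * puma_workers * client_pool_size
server_connections = worker_dynos * server_pool_size
total              = client_connections + server_connections

puts total # => 28
```

If that total exceeds your plan's Redis connection limit, shrink the pool sizes or the number of processes until it fits.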