Site Reliability Engineer - Ably realtime

What makes Ably special?

Ably is a serverless realtime infrastructure provider.

Our high-performance global data stream network enables our customers to stream data anywhere, between any device, with mission-critical reliability so they can build next generation services and deploy, manage, and distribute streaming APIs.

Developers use our platform to build next-generation digital services and share their realtime data at scale. We've gained the trust of some of the largest businesses in the world, such as HubSpot, OfferUp, Tennis Australia, Toyota and CA Technologies, who have integrated us into their stacks. We also work with a diverse range of tech startups globally, powering features such as an air traffic control system for drones. Working at Ably means working on a cutting-edge product that is helping global brands shape the future.

What we can offer you – in brief

You will learn with the best. You will have the autonomy and freedom to experiment and improve. You will be part of a dynamic team and a business that is taking off. We have the best technology and the best people in the industry. Join us now and you'll get in early at a business that is going places; you'll learn a lot, work with the founding team, and have fun.

What we want in return – in brief 

We want someone smart, ambitious, curious and motivated: someone who is prepared to do their best and work their arse off to do great work and become outstanding at what they do.

Job description

If you don't know what a Site Reliability Engineer is, we recommend you first read Google's definition of the role, which we agree with.

As an engineer in our Site Reliability Engineering team, you'll build solutions to enhance the availability, performance and stability of the Ably platform, develop new network services, and automate away repetitive work. You'll also respond to pings, pages and alerts, investigating issues in our products that you can really sink your teeth into. You'll work across non-production and production environments on monitoring, data collection and configuration management, as well as disaster recovery planning, capacity engineering, reliability improvement initiatives and platform automation. The team needs someone who can ask questions, learn from others and turn chaos into order.

This role would be a great fit for someone with creative, innovative problem-solving skills who is willing to take responsibility for their code all the way to production. You will develop and implement solutions that operate at scale and see your own work directly improve the reliability of our products. Our teams are empowered, and expected, to improve our products so that they deliver a truly reliable experience to customers.

If you're excited by working on truly complex problems at internet-scale with other smart engineers, you'll enjoy working at Ably.

Our infrastructure stack currently consists mostly of:

  • Infrastructure languages: Ruby, Go, Bash.
  • Service languages: Go, Elixir, Node.js and some C.
  • Mostly AWS based, but we are experimenting with supporting other clouds.
  • Architecture: Exclusively Docker containers for all services; servers are immutable, ephemeral and disposed of frequently; datacenters (circa 20 and growing) are isolated and autonomous; critical shared services always have redundancy baked in; and manual configuration of any infrastructure is a smell.
  • Data services: Cassandra (our realtime datastore, 3 regions, 6 data centers), Influx, Elastic, Kibana, Grafana, etc.
  • Website: We use Rails & Heroku for simplicity. The website is not part of our "core product" and so has lower uptime requirements.

We go to great lengths at each layer in the stack to ensure 100% service uptime.

Day to day you can expect to be working on:

  • Writing Ruby code for the automation, orchestration and configuration of our infrastructure, and for its continuous integration testing.
  • Making extensive use of a wide range of AWS services. Whilst we primarily use AWS for our infrastructure, we expect that to change in time as we expand to other cloud providers.
  • Managing and developing our continuous integration services, which test every aspect of the service, from infrastructure tools to our health servers, routers, realtime services, protocol adaptors and client libraries. Our CI environment is mature, but we want to keep evolving it to improve the robustness of the platform and reduce the risk of regressions.
  • If you're familiar with Go, or want to learn, writing Go code for our core routing and infrastructure services.
  • Being exposed to our other development environments such as Node.js and Elixir, both used extensively in our realtime services.
  • Working with the realtime engineering team to ensure our infrastructure supports the ever-changing networking, security and processing requirements.
  • Collaborating with the team to design, discuss and implement new features and services.
  • Diagnosing and fixing bugs in all areas of our platform. You will often be working at very low levels of the network stack to help diagnose hard-to-identify distributed problems.
  • Working with the engineering team to enable them to take responsibility for the complete lifecycle of the features and code they deliver, i.e. pull requests, reviews, testing, deployment to staging and sandbox environments, and then to production. We are strong believers in all developers being responsible for deploying their own code.
  • Contributing to open source projects that we support or use in our products. All of our client libraries are also open source and may require your support at times.
  • Helping customers solve the problems they are experiencing, which may in turn help us find bugs in the platform.
  • Supporting the wider team with documentation and customer support.
  • Suggesting new features or improvements to our protocol and API specifications.


  • Salary range: £50k to £85k.
  • Employee options: Yes, negotiable.
  • Holidays: 25+ days excluding national holidays.
  • This role can be remote or on-site in our London office. However, if you work remotely, you will need to be based in the UK and within commuting distance of London so you can visit our office as necessary. You will benefit from a flexible working environment in which working from home and sensibly managing your own working hours is the norm.
  • Work in an environment where code quality, technical challenges and delivery are what we all care about. 
  • Skills development is intrinsic to the job. We spend most days working on unsolved problems, so there is plenty of scope to widen your knowledge and skillset.
  • Work with genuinely nice and smart people who care about code quality and enjoying their jobs.


  • Experience: A minimum of three years of professional experience with Ruby, which is used in all our orchestration and server management layers. Experience with, or a deep interest in, Go would be advantageous, as our routing and network services are built with Go. You should have experience using both statically and dynamically typed languages. Experience with Node.js and Elixir/Erlang would be nice, but is not necessary. You must have solid experience managing infrastructure and CI environments; experience managing distributed or large-scale infrastructure is preferred. An understanding of distributed systems is beneficial.
  • Pragmatic: A problem solver excited by the prospect of automating your job away and working autonomously to solve problems and bring solutions to the team.
  • Fast Learner: We’re looking for software engineers who thrive on applying their knowledge and learning new technologies. Our stack is diverse, and we expect it to continue to grow.
  • Testing: Experience using testing frameworks and applying test-driven development where appropriate.
  • Communication: We use tools such as Slack throughout the day to communicate; however, we believe in voice conversations to discuss and solve problems. You must be proficient in spoken and written English, be eager to collaborate with the engineering team, and welcome code reviews constructively.
  • Customers: Comfortable talking to customers and assisting them with their technical issues and integration.
  • Open source: We prefer developers who have contributed back to the open source community, even if those contributions are small.