Intercom - Buildkite Case Study


The Intercom team deploys about 150 times per day across multiple applications in its codebase. Shipping code continuously means running a large number of tests, and the team needed a CI solution that could keep up with that demand.

Previously, Intercom ran two other CI tools, almost redundantly, to get the build stability it needed and to let developers do performance work in either environment. Even with both tools in place, the team could not reach the speed and level of control it needed. “We’d still have outages and all sorts of problems which was frustrating,” says Scanlan. In the end, neither platform let the team optimize for what mattered to it, and maintaining both was time-consuming and expensive.

The team knew what they needed: reliability, control, and speed. “Getting the reliability of tests to run correctly was the first thing we needed to focus on. Just having full insight into what was going on would give us more visibility, and we’d be able to fix things quicker rather than working on a more closed platform,” Scanlan reflects. “Then, optimizing for speed because the faster we can get feedback to the developers, the better.”

Solution

The final straw came when the Intercom team grew tired of the persistently high failure rate in builds of their JavaScript application: nearly 50% of the tests were failing.

“There was nothing wrong with the tests; it was the environment that was always failing,” Scanlan reports. This seemed like a good opportunity to try a new approach, so the Intercom team focused solely on reliability and used Buildkite to orchestrate the build within their own EC2 infrastructure. “We didn’t even have an efficiency goal at that point. We just wanted a reliability improvement, so that was the small thing we were going to try out first,” says Scanlan. Once they saw that reliability was no longer an issue, the team began optimizing for speed and moving all of their builds over to Buildkite.

Benefits

For a company whose “heartbeat is shipping”, time is priceless. A test run that used to take 20 to 25 minutes now finishes in just three minutes, even though it covers tens of thousands of tests. Developers get feedback within minutes, which makes for a better developer experience and lets the team respond more quickly when rolling back problematic changes, getting changes out, or dealing with security problems.

“The mix of being able to own the infrastructure, performance-tune everything ourselves, having that full control, and then taking advantage of some Buildkite features (things like retries for failed build jobs and other kinds of automation) allows us to go really, really fast,” says Scanlan.