Shifting Gears

The way we set up our community meant that collectively we were able to work toward solving certain kinds of problems locally and organically, such as Android application development, new UX, a more expressive pipeline description language, …

But at the same time, the incremental, autonomous nature of our community made us demonstrably unable to solve certain kinds of problems. And after 10+ years, these unsolved problems are getting more pronounced, and they are taking a toll — segments of users correctly feel that the community doesn’t get them, because we have shown an inability to address some of their greatest difficulties in using Jenkins. And I know some of those problems, such as service instability, matter to all of us.

In a way, we are stuck in a local optimum, and that is a dangerous place to be when there is growing competition from all sides. So we must solve these problems to ensure our continued relevance and popularity in the space.

Solving those problems starts with correctly understanding them, so let’s look at those.

A CI/CD service was once a novelty and a nice-to-have. Today, it is very much a mission-critical service, in no small part because of us! Increasingly, people are running bigger and bigger workloads, loading up more and more plugins, and expecting higher and higher availability.

Admins today cannot easily meet that heightened expectation with Jenkins. A Jenkins instance, especially a large one, requires too much overhead just to keep it running. It’s not unheard of for somebody to restart Jenkins every day.

Admins expect errors to be contained and not impact the entire service. They expect Jenkins to defend itself better from issues such as pipeline execution problems, runaway processes, and excessive resource consumption, so that they don’t have to constantly babysit the service.

Every restart means degraded service for the software delivery teams, who have to wait longer for their builds to start or complete.

Every Jenkins admin must have been burnt at least once in the past by making changes that have caused unintended side effects. By “changes,” I’m talking about installing/upgrading plugins, tweaking job settings, etc.

As a result, too many admins today aren’t confident that they can make changes safely. They fear that their changes might cause issues for their software delivery teams, that those teams will notice regressions before they do, and that they may not be able to back out some changes easily. It feels like touching a Jenga tower for them, even when a change is small.

Upgrading Jenkins and plugins is an important special case of this, where admins often do not understand the impact of a change. This decreases their willingness to upgrade, which in turn makes it difficult for the project to move forward more rapidly, and instead we get trapped with the long tail of compatibility burden.

I’ve often described Jenkins as a bucket full of LEGO blocks — you can build any car you want, but everyone first has to assemble their own car in order to drive one.

As CI/CD has gone mainstream, this is no longer OK. People want something that works out of the box, something that gets people to productivity within 5 clicks in 5 minutes. Too many choices are confusing users, and we are not helping them toward “the lit path.” Everyone feels uncertain if they are doing the right thing, contributors are spread thin, and the whole thing feels a bit like a Frankenstein.

This is yet another problem we can’t solve by “writing more plugins.”

This one is a little different from others that our users face, but nonetheless a very important one, because it impacts our ability to expand and sustain the developer community, and influences how fast we can solve challenges that our users face.

Some of these problems are not structural but rather just a matter of doing the work (for example, the Java 11 upgrade), but there are some problems here that are structural.

I think the following ones are the key ones:

  • As a contributor, making a change that spans multiple plugins is difficult. Tooling gets in the way, users might not always upgrade a group of changes together, and reviewing changes is hard.

  • As a contributor, the tests that we have do not give me enough confidence to ship code. Not enough of them run automatically, coverage is shallow, and nothing in them resembles the production workloads of real users/customers.

These core problems create other downstream problems, for example:

  • As a non-regular contributor, what I think of as a small and reasonable change takes forever and 100 comments going back and forth to get in. I get discouraged from ever doing it again.

  • As a regular contributor, I feel people are throwing crap over the wall, and if they cause problems after a release, I’m on the hook to clean up that mess.

  • As a user, I get a half-baked change that wreaks havoc, which results in a loss of my confidence in Jenkins, an even slower pace of change, etc. This is a vicious cycle, as it makes us even more conservative and slows down development further.