A Pipeline Friendly Layered Testing Strategy & Recipe for DEV and QA

By Roy Osherove

The key point of running a pipeline is to get feedback which in turn is supposed to provide us with Confidence. That’s because a pipeline is really just a big test - it’s either green or red; and it’s composed of multiple small tests that are either green or red.

Types of feedback

We can divide the types of tests in a pipeline into two main groups:

Break/fail feedback

  • Provides a go-no-go for an increment of the code to be releasable and deployed

  • Great for unit tests, e2e, system tests, security tests and other binary forms of feedback

Continuous Monitoring Feedback

  • An ongoing KPIs pipeline for non binary feedback

  • Great for code analysis and complexity scanning, high load performance testing, long running non functional tests that provide non binary feedback (“good to know”)

To get faster feedback I usually have two pipelines, one for each type of feedback I care about:

  • Delivery pipeline: Pass/fail pipeline

  • Discovery Pipeline: Continuous KPI pipeline in parallel.

The point of the Delivery feedback is to be used for continuous delivery – a go/no go that also deploys our code if all seems green. It ends with deployed code, hopefully in production.

The point of the Discovery feedback is to provide learning missions for the team (“Our code complexity is too high... let’s learn how to deal with it”), as well as to show whether those learnings are effective over time. It does not deploy anything except for the purpose of running specialized tests or analyzing code and its various KPIs. It ends with numbers on a dashboard.
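To make the split concrete, here is a minimal Python sketch of the two kinds of feedback. It is not tied to any CI tool; the names `Check`, `delivery_pipeline` and `discovery_pipeline`, and the hard-coded checks and metric values, are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    name: str
    run: Callable[[], bool]  # True means the check passed


def delivery_pipeline(checks: List[Check]) -> bool:
    """Binary go/no-go: stop at the first red check, deploy only if all are green."""
    for check in checks:
        if not check.run():
            print(f"RED: {check.name} failed, not deploying")
            return False
    print("GREEN: all checks passed, deploying")
    return True


def discovery_pipeline(metrics: Dict[str, Callable[[], float]]) -> Dict[str, float]:
    """Non-binary feedback: collect numbers for a dashboard, never block a release."""
    return {name: measure() for name, measure in metrics.items()}


# Hypothetical checks and metrics, hard-coded for illustration only.
go = delivery_pipeline([
    Check("unit tests", lambda: True),
    Check("e2e tests", lambda: True),
])
dashboard = discovery_pipeline({
    "avg_cyclomatic_complexity": lambda: 7.4,
    "p95_latency_ms_under_load": lambda: 310.0,
})
```

The delivery function returns a single boolean, while the discovery function returns numbers that are only useful when tracked over time.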

In this article I will focus on the confidence we gain from the delivery pipeline, and how we can gain even more speed out of it while still maintaining a high level of confidence in the tests that it runs.

It’s important to note that speed is a big motivator for writing this article in the first place, and splitting into “discovery” and “delivery” pipelines is yet another technique to keep in your arsenal, regardless of what I write further down in this article.

But let’s get back to the idea of “confidence”.

What does confidence look like?

In the delivery realm, what type of confidence are we looking for?

  • Confidence that we didn’t break our code

  • Confidence that our tests are good

  • Confidence that the product does the right thing

Without Confidence…

If we don’t have confidence in the tests this can manifest in two main ways:

  • If the test is red and we don’t have confidence in the test, you might hear the words “Don’t worry about it”: we go on as if everything is OK and assume the test is wrong (potentially ignoring real issues in the code).

  • If the test is green and we don’t have confidence in the test, we still debug or do manual testing of the same scenario (wasting any time the test should have saved us). We might even be afraid to merge to the master branches, deploy or do other things that affect others.

With Confidence…

This confidence allows us to do wonderful things:

  • Deploy early and often

  • Add, change and fix code early and often.

So what’s the problem?

Let’s look at common types of tests that we can write and run in a delivery pipeline:

There are quite a few types to choose from. That means there are usually several questions a team needs to answer once it realizes that it wants lots of delivery confidence through automated tests:

  1. Which type of tests should we focus on?

  2. How do we make sure the tests don’t take hours and hours to run so we can get faster feedback?

  3. How can we avoid test duplication between various kinds of tests on the same functionality?

  4. We are still (in some cases) working in a QA+Dev fashion. How can we increase collaboration and communication between the groups and create more knowledge sharing?

  5. How can we inch towards DEV taking on automated test ownership as part of our transformation?

  6. How can we be confident we have the right tests for the feature/user story?

To start making an informed decision, let’s consider several key things for each test type:

  1. How much confidence does the test provide? (higher up usually means more confidence – in fact, nothing beats an app running in production for knowing everything is OK)

  2. How long is the test run time? (feedback loop time; higher up means slower)

  3. How much ROI do I get for each new test? (the first test of each type will usually provide the highest ROI. The second one usually has to repeat parts of the first, so the ROI is diminished)

  4. How easy is it to write a new test? (higher up is usually more difficult)

  5. How easy is it to maintain a test? (higher up is usually more difficult)

  6. How easy is it to pinpoint where the problem is when the test fails? (lower down is easier)

Where teams shoot themselves in the foot

Many teams will try to avoid (rightfully so) repeating the same test twice, and since End to End (aka “e2e”) tests usually provide lots of confidence, the teams will focus mostly on that layer of testing for confidence.

So the diagram might look like this for a specific feature/user story:

It might also be the case that a separate team is working on the end to end tests while developers are also writing unit tests (yes, let’s be realistic – the real world is always shades of grey – and enterprises are slow moving giants. We have to be able to deal with current working practices during transformation), but there might be plenty of duplication or missing pieces between the two types of tests:

In both of these cases (and other variations in between) we end up with a growing issue: These e2e tests will take a long time to run as soon as we have more than a dozen.

Each test can easily take anywhere from 30 seconds to a couple of minutes. It only takes 20-30 e2e tests to make the build run for an hour.
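To make that arithmetic concrete, here is a small sketch that assumes the tests run one after another on a single build agent. With the per-test times quoted above, the one-hour figure corresponds to the slow end of the range; the function name is made up for this example.

```python
def serial_runtime_minutes(test_count: int, seconds_per_test: float) -> float:
    """Total wall-clock time, in minutes, when tests run strictly one after another."""
    return test_count * seconds_per_test / 60


fast_end = serial_runtime_minutes(20, 30)    # 20 tests at 30 seconds each
slow_end = serial_runtime_minutes(30, 120)   # 30 tests at 2 minutes each

print(f"fast end: {fast_end:.0f} min, slow end: {slow_end:.0f} min")
# prints "fast end: 10 min, slow end: 60 min"
```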

That’s not a fast feedback loop. That’s a candidate for a “let’s just run that stuff at night and pray we see green in the morning” anti-pattern.

Indeed, many teams at some point opt for that option and run these long-running tests at night, possibly in a separate “nightly” pipeline which is different from the “continuous integration” pipeline.

So now we might have the worst of both worlds:

  • The pipeline that really determines the go-no-go delivery is the nightly pipeline. That means a 12-24 hour feedback loop on the “real truth” about the status of the code (“come back tomorrow morning to see if your code broke anything”).

  • Developers might get a false sense of confidence from just the CI (continuous integration) pipeline.

  • To know if you can deliver, you now have to look at two different locations (and it might also mean different people or departments are looking at each board instead of the whole team)

Can we get high confidence with fast feedback? Close.

What if we try to get the confidence of e2e tests but still get some of the nice fast feedback that unit tests give us? Let’s try this scheme on for size:

  • Given Feature 1

  • We can test the standard scenario 1.1 as an e2e test

  • For any variation on that scenario, we always write that variation in either a unit test or an integration test (faster feedback).

Now we get both the “top to bottom” confidence and the added confidence that more complicated variations in the logic are also tested in fast tests at a lower level. Here’s how that might end up looking:
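As a rough illustration of that scheme, the sketch below uses a login feature. The `authenticate` function and the `USERS` table are hypothetical stand-ins for real production code. The standard successful login (scenario 1.1) is assumed to be covered once by an e2e test against the running application, so only the variations appear here as fast unit tests.

```python
from typing import Dict


# Hypothetical logic under test: a password check against an in-memory user table.
def authenticate(users: Dict[str, str], username: str, password: str) -> str:
    if username not in users:
        return "unknown_user"
    if users[username] != password:
        return "wrong_password"
    return "ok"


USERS = {"alice": "s3cret"}

# Scenario 1.1 (successful login) is assumed to live in a single e2e test,
# so it is deliberately not repeated at this level.


# Variations on scenario 1.1, written as fast unit tests.
def test_wrong_password_is_rejected():
    assert authenticate(USERS, "alice", "nope") == "wrong_password"


def test_unknown_user_is_rejected():
    assert authenticate(USERS, "bob", "s3cret") == "unknown_user"


def test_empty_password_is_rejected():
    assert authenticate(USERS, "alice", "") == "wrong_password"
```

The functions follow the `test_` naming convention, so a runner such as pytest would pick them up without extra wiring.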

Based on this strategy, here’s a simple tactical maneuver that I’ve found helpful in various teams I’ve consulted with:

Before starting to code a feature or a user story, the developer sits with another person to create a “Test Recipe”. That other person could be another developer, a QA person assigned to the feature (if that’s your working process), an architect, or anyone else you’d feel comfortable discussing testing ideas with.

A Test Recipe, as I like to call it, is simply a list of the various tests that we think might make sense for this particular feature or user story, including the level at which each test scenario will be located.

A test recipe is NOT:

  • A list of manual tests

  • A binding commitment

  • A list of test cases in a test planning piece of software

  • A public report, user story or any other kind of promise to a stakeholder.

  • A complete and exhaustive list of all test permutations and possibilities

At its core it’s a simple list of 5-20 lines of text, detailing simple scenarios to be tested in an automated fashion and at what level. The list can be changed, added to or trimmed. Consider it a “comment”.

I usually like to just put it right there as a bottom comment in the user story or feature in JIRA or whatever program you’re using.

Here’s what it might look like:

Just before coding a feature or a user story, sit down with another person and try to come up with various scenarios to be tested, and discuss at which level that scenario would be better off tested.
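A recipe like the one described above could be jotted down as plain data. The sketch below uses a hypothetical login story. The scenarios and the level names (`e2e`, `api`, `unit`) are invented for illustration and are not taken from a real project.

```python
from collections import Counter

# Hypothetical recipe for a "user login" story: (level, scenario) pairs.
RECIPE = [
    ("e2e", "user logs in successfully with valid credentials"),
    ("api", "login with wrong password returns an error"),
    ("api", "login with an unknown user returns an error"),
    ("unit", "empty password is rejected"),
    ("unit", "password check is case sensitive"),
    ("unit", "account locks after repeated failures"),
]


def as_comment(recipe):
    """Render the recipe as the short text list you would paste into the story."""
    return "\n".join(f"[{level}] {scenario}" for level, scenario in recipe)


# How many scenarios ended up at each level.
per_level = Counter(level for level, _ in RECIPE)

print(as_comment(RECIPE))
print(dict(per_level))
```

Keeping the recipe this small is the point: it is a conversation aid, not a test plan.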

This meeting is usually no longer than 5-15 minutes, and after it, coding begins, including the writing of the tests (if you’re doing TDD, you’d start with a test first).

In orgs where there are automation or QA roles, the developer will take on writing the lower level tests and the automation expert (yes, QA without automation abilities will need to learn them slowly) will focus on writing the higher-level tests in parallel, while coding of the feature is taking place.

If you are working with feature toggles, then those feature toggles will also be checked as part of the tests, so that if a feature is off, its tests will not run.
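One way to wire toggles into tests, sketched here with Python’s standard `unittest` module. The `FEATURE_TOGGLES` dictionary and the `requires_feature` helper are hypothetical; a real project would read toggle state from its own toggle service or configuration.

```python
import unittest

# Hypothetical toggle store, hard-coded for illustration.
FEATURE_TOGGLES = {"new_checkout": True, "gift_cards": False}


def requires_feature(name: str):
    """Skip the decorated test unless the named toggle is on.

    The toggle is read when the test is defined, not when it runs.
    """
    enabled = FEATURE_TOGGLES.get(name, False)
    return unittest.skipUnless(enabled, f"feature '{name}' is off")


class NewCheckoutTests(unittest.TestCase):
    @requires_feature("new_checkout")
    def test_checkout_total(self):
        self.assertEqual(2 * 10, 20)


class GiftCardTests(unittest.TestCase):
    @requires_feature("gift_cards")
    def test_gift_card_balance(self):
        self.fail("never runs while the toggle is off")


def run_all() -> unittest.TestResult:
    """Run both test classes and return the collected result."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(NewCheckoutTests))
    suite.addTests(loader.loadTestsFromTestCase(GiftCardTests))
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run_all()` reports the gift card test as skipped rather than failed, so a feature that is switched off does not turn the pipeline red.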

Simple rules for a test recipe

  1. Faster. Prefer writing tests at lower levels

    • Unless a high level test is the only way for you to gain confidence that the feature works

    • Unless there is no way to write a low level test for this scenario that makes you feel comfortable

  2. Confidence. The recipe is done when you can tell yourself “If all these tests passed, I feel pretty good about this feature working.” If you can’t say that, write more scenarios that would make you say that.

  3. Revise: Feel free to add or remove tests from the list as you code, just make sure to notify the other person you sat with.

  4. Just in time: Write this recipe just before starting to code, when you know who is going to code it, and coding is about to start the same day.

  5. Pair. Don’t write it alone if you can help it. Two people think in different ways and it’s important to talk through the scenarios and learn from each other about testing ideas and mindset.

  6. Don’t repeat yourself. If this scenario is already covered by an existing test (perhaps from a previous feature), there is no need to repeat this scenario at that level (usually e2e).

  7. Don’t repeat yourself again. Try not to repeat the same scenario at multiple levels. If you’re checking a successful login at the e2e level, lower level tests will only check variations of that scenario (logging in with different providers, unsuccessful login results etc.).

  8. More, faster. A good rule of thumb is to end up with a ratio of at least 1 to 5 between each level (for one e2e test you might end up with 5 or more lower-level tests of other scenarios).

  9. Pragmatic. Don’t feel the need to write a test at all levels. For some features or user stories you might only have unit tests. For others you might only have API or e2e tests. As long as you don’t repeat scenarios, and as long as you
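Rules 7 and 8 are mechanical enough to check with a few lines of code. The sketch below is only an illustration: `check_recipe` is a made-up helper, and it simplifies rule 8 by comparing the number of top-level (e2e) tests against all lower-level tests combined rather than level by level.

```python
from collections import defaultdict
from typing import List, Tuple

Recipe = List[Tuple[str, str]]  # (level, scenario) pairs


def check_recipe(recipe: Recipe, top_level: str = "e2e", min_ratio: int = 5) -> List[str]:
    """Return warnings for repeated scenarios (rule 7) and a thin lower layer (rule 8)."""
    warnings = []

    # Rule 7: the same scenario should not appear at more than one level.
    levels_by_scenario = defaultdict(set)
    for level, scenario in recipe:
        levels_by_scenario[scenario].add(level)
    for scenario, levels in levels_by_scenario.items():
        if len(levels) > 1:
            warnings.append(f"'{scenario}' is repeated at levels: {', '.join(sorted(levels))}")

    # Rule 8 (simplified): at least min_ratio lower-level tests per top-level test.
    # Recipes with no top-level tests are fine (rule 9), so they are not flagged.
    top = sum(1 for level, _ in recipe if level == top_level)
    lower = len(recipe) - top
    if top and lower < top * min_ratio:
        warnings.append(f"only {lower} lower-level tests for {top} {top_level} tests")

    return warnings


draft = [
    ("e2e", "successful login"),
    ("unit", "successful login"),
    ("unit", "wrong password"),
    ("unit", "unknown user"),
]
for warning in check_recipe(draft):
    print(warning)
```

For the `draft` above, the helper flags the duplicated successful login and the 1 to 3 ratio. Treat such output as a conversation starter for the pair, not as a gate.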

Using this strategy, we can gain several things we might not have considered:

Faster feedback

OK, that one we did consider, but it’s the most notable.

Single delivery pipeline

If we can get the tests to run fast, we can stop running e2e tests at night and start putting them as part of the CI pipeline that runs on each commit. That’s real feedback for developers when they need it.

Less test duplication

We don’t waste time and effort writing and running the same test in multiple layers, or maintaining it multiple times if it breaks.

Knowledge sharing

Because test recipes are done in pairs, there is much better knowledge sharing and caring about the tests, especially if you have a separate QA department. Test recipes will “force” devs and QA to have real conversations about mindset and testing scenarios and come up with a better, smaller, faster plan to gain confidence, together. Real teamwork is closer.


Hope you find this useful. Feel free to drop me a line on twitter @royosherove or email me roy AT 5whys dot com (or comment on this post!)


Test strategies aren’t usually enough when you already have a large-ish list of long-running e2e tests. You’ll still have to optimize the pipelines. There are many ways, but a rather simple one (given enough build agents) is to parallelize the steps in the pipelines. Here’s a simple example of that:
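As one possible sketch in Python (not tied to any particular CI tool), independent steps can be submitted to a thread pool so that the total time approaches that of the slowest step instead of the sum of all steps. The step names and durations are hypothetical, and `time.sleep` stands in for real test runs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical independent pipeline steps with simulated durations in seconds.
STEPS = {"unit": 0.1, "api": 0.2, "e2e-group-1": 0.3, "e2e-group-2": 0.3}


def run_step(name: str, seconds: float) -> bool:
    """Stand-in for a real test step; returns True when the step is green."""
    time.sleep(seconds)
    return True


def run_parallel(steps):
    """Run every step at once, one worker per step (like one build agent each)."""
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = {name: pool.submit(run_step, name, secs) for name, secs in steps.items()}
        return {name: future.result() for name, future in futures.items()}


start = time.perf_counter()
results = run_parallel(STEPS)
elapsed = time.perf_counter() - start

green = all(results.values())
print(f"green={green}, elapsed={elapsed:.2f}s, serial would be {sum(STEPS.values()):.2f}s")
```

In a real pipeline the same idea is usually expressed in the CI tool’s own configuration, by splitting the e2e suite into groups that run on separate agents.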