Monte Carlo Forecasting in Software Delivery | Expedia Group Technology

By Bart Masters

Casino chips on a roulette betting table
Software forecasting shouldn’t be a gamble. Photo by Kay on Unsplash

“When is it going to be ready?” We’re often asked that at Expedia Group™, and one way we answer it is scientifically — with Monte Carlo forecasting.

What’s wrong with just using velocity?

Forecasting from velocity alone is crude: it assumes your velocity and backlog size will remain consistent, and it produces a single hard end date for when your work will be finished. In practice, velocity changes from sprint to sprint and the backlog changes in size, so the chance of hitting a specific date months in the future is negligible. We need an approach that reflects the unpredictability of the future.
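To make the limitation concrete, here is a minimal sketch of a velocity-only forecast. The function name `naive_velocity_forecast`, the 85-point backlog, the three velocity values, and the year are all assumptions for illustration. Whatever you feed it, it returns exactly one date and says nothing about how likely that date is.

```python
import math
from datetime import date, timedelta

def naive_velocity_forecast(backlog_points, velocities, start, sprint_days=14):
    """Single-date forecast: backlog divided by average velocity (hypothetical helper)."""
    average_velocity = sum(velocities) / len(velocities)
    # Round up, because a partly used sprint still has to be run.
    sprints_needed = math.ceil(backlog_points / average_velocity)
    return start + timedelta(days=sprints_needed * sprint_days)

# Illustrative inputs only: three sample velocities, an assumed backlog and year.
finish = naive_velocity_forecast(85, [11, 9, 19], date(2023, 3, 13))
print(finish)  # 2023-06-19
```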

Enter Monte Carlo forecasting

Monte Carlo forecasting becomes useful in software by taking a range of historical data (completed story points or user stories per sprint) and a range of outstanding work (a product backlog). It can then provide a probability-based forecast of the date range in which the work will be complete.

Monte Carlo run-through and attribution

Focused Objective has created an Excel spreadsheet that is a great tool for running Monte Carlo forecasts. I use it extensively here at Expedia Group, and it is the core of this demo.

Step 1: Collect historic data

I start by collecting completed story point stats for the past seven sprints and enter them into the spreadsheet on the Throughput Samples tab. That's the 11, 9, 19, and so on in the orange cells.

If you don’t use story points or sprints, you can still use Monte Carlo. You could use number of stories per month, number of JIRA tickets per week, or whatever you have as a regular cadence.

Inputting historic data into the spreadsheet

Step 2: Input your outstanding work

In this scenario, I’m doing some planning for the team for the next quarter. So I plug in the scenario data:

  • Start Date — March 13th.
  • Low and high range for the number of story points in your scenario. In this scenario, we are estimating 80–90 story points need to be worked on. This is an excellent spot for capturing any uncertainty you have with the amount of outstanding work.
  • Low and high range for how many stories are created or split during the work. In this example, I’m being pessimistic and estimating that when the team comes to plan their sprints, in some cases they will have to double the number of stories or story points. So we have an effective range of 80–180 story points for this scenario.
  • Length of delivery cadence, in this case, 2 weeks.
  • The last 2 orange squares are where you can put in guesses on your velocity, if you don’t have any historic data. I’ve found Monte Carlo not particularly useful without real data, so my recommendation is to hold off until you have at least 2–3 samples of useful data before trying any forecast.

Input the scenario

Step 3: Forecast!

Forecast dates for the scenario

The spreadsheet has run a Monte Carlo simulation 500 times (more details on what this looks like next) to forecast how long it will take to complete the work in the scenario. Out of the 500 simulations, the work was completed by April 29th 5 times. 5 out of 500, or 1%, is not great odds of the work being completed by that date.

The simulation had the work completing the week of May 13th 30 times. That means 30 + 5 = 35 out of 500 simulations, or a 7% chance of the work being complete by May 13th. Still not great odds.
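The arithmetic here is a running total: the chance of being done by a date is the count of simulations that finished on or before it, divided by the number of runs. A small sketch using only the two counts quoted above (the variable names are my own):

```python
from itertools import accumulate

total_runs = 500
# Simulations finishing in each period, as quoted in the text above.
finished_in_period = {"April 29th": 5, "May 13th": 30}

# Running total of finished simulations, in date order.
cumulative = dict(zip(finished_in_period, accumulate(finished_in_period.values())))
chance_by_date = {day: done / total_runs for day, done in cumulative.items()}

for day, chance in chance_by_date.items():
    print(f"{day}: {chance:.0%}")
```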

The simulation keeps going from there. You can see it gives you 50/50 odds of being complete on or before June 24th and an 85% chance of completing by August 5th. If things go badly wrong, the work won’t be ready until September 30th.

So now you have a range of probabilities for when the work will be complete. Do you want to take a gamble on the work being complete by June? Or take the safe route and plan for August?

Summarised forecast

Where did this forecast come from?

For each simulated run, the spreadsheet does the following:

  • Randomly selects how many story points you need to complete, based on the low and high estimate you gave it (in the example above, the range is 80–90).
  • Multiplies that by a random value within the range of story splitting you gave it (in the example, 1.0–2.0), which determines how many story points are needed to complete the scenario.
  • Randomly picks one of the historic throughput samples you entered and uses that to burn down the required work for a sprint.
  • Selects another random throughput sample to burn down the required work for another sprint.
  • Continues until no story points remain. That determines how many sprints were required in this randomly generated scenario, and thus a delivery date. One forecast is complete.
  • Repeats the above 499 more times.
  • Groups the results, and you have your Monte Carlo forecast: 500 different randomly generated simulations of your forecast scenario, grouped together to show how likely each completion date is.
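
The steps above can be sketched in a few lines of Python. This is my own approximation, not the spreadsheet's actual implementation: the function names `simulate_once` and `forecast` are hypothetical, I assume the random values are drawn uniformly, and I only use the first three throughput samples mentioned earlier (11, 9, 19) with an assumed year.

```python
import random
from collections import Counter
from datetime import date, timedelta

def simulate_once(rng, samples, low_points, high_points, low_split, high_split):
    """Return how many sprints one randomly generated scenario needs."""
    # Random backlog size, scaled by a random story-splitting factor.
    remaining = rng.randint(low_points, high_points) * rng.uniform(low_split, high_split)
    sprints = 0
    while remaining > 0:
        remaining -= rng.choice(samples)  # burn down one sprint of work
        sprints += 1
    return sprints

def forecast(samples, start, runs=500, sprint_days=14, seed=None,
             low_points=80, high_points=90, low_split=1.0, high_split=2.0):
    """Return (finish date, chance of being done by then) pairs in date order."""
    if not any(s > 0 for s in samples):
        raise ValueError("need at least one positive throughput sample")
    rng = random.Random(seed)
    counts = Counter(
        simulate_once(rng, samples, low_points, high_points, low_split, high_split)
        for _ in range(runs)
    )
    results, done = [], 0
    for sprints in sorted(counts):
        done += counts[sprints]  # running total of finished simulations
        finish = start + timedelta(days=sprints * sprint_days)
        results.append((finish, done / runs))
    return results

# Illustrative inputs: only three samples and an assumed year.
results = forecast([11, 9, 19], date(2023, 3, 13), seed=1)
for finish, chance in results:
    print(f"{finish:%B %d}: {chance:.0%} chance of being complete")
```

Because the samples and random draws differ from the spreadsheet's, the dates this prints will not match the forecast shown earlier. The shape of the output is the same, though: a list of dates with a rising chance of completion.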

What's next?

I hope this has been a useful intro to Monte Carlo forecasting. I owe Troy Magennis a great debt for introducing me to this technique, and Focused Objective has a lot more interesting tools and techniques for data-driven forecasting.

And if you have any other tips or tricks, please let me know below. Happy forecasting!