Can You Make Swiss Trains Even More Punctual?

By Florian Laurent

crowdAI is running a data science challenge in collaboration with SBB, the Swiss national railway company, to optimize the scheduling of their trains.

Ever complained about late or too infrequent trains? Now is your chance to improve things!

Over 1.2 million people are transported every day on the Swiss railway network in over 10'000 trains with world-leading punctuality.

But traffic is expected to increase even further. As more trains are added, the scheduling complexity quickly explodes.

Your goal is to come up with ingenious ways to tackle this problem. Can you find a suitable algorithm? A powerful heuristic? A promising AI-approach?

You are given nine problems of increasing complexity describing a railway system, a list of trains and functional requirements. You need to come up with a viable schedule for each of them.

You can find the full documentation in the Starter Kit here. There’s a lot of information in there, but don’t get intimidated! We will start with an intuitive explanation of the problem, expose the requirements, and show how to submit a first simple submission.

By the end of this article you’ll be getting your first graded results

Each problem comes with 4 inputs:

  • a railway system,
  • a list of trains that run on it,
  • timing constraints, eg “this train needs to reach this place by this time”,
  • resource constraints, eg “only one of these sections can be used at the same time”.

Your goal is to find a schedule for all of the trains that will comply with all of the constraints. In some case it will be easy, in others it may be impossible.

Isn’t this backward?
It may be surprising to try to schedule trains while already having timing constraints. This is because we are considering cases where we want to make small modifications to a pre-existing schedule. 
For example if a major football match starts at 20:00, it would make sense to modify the usual schedule to have a train arrive close to the stadium no later than 19:00. Similarly, the departure of the train that brings the spectators back should have an earliest departure of maybe 22:30.

The network is modelled as a graph, in the graph theory sense of the word.
It is made of routes, themselves consisting of route sections. Each section is represented as an edge of the graph.

Our example railway network

We assume that each section can only be crossed in a single direction. We also assume that there are no loops in the network. As such, the resulting graph is a directed acyclic graph (DAG).

Trains are subject to different kind of timing requirements:

  • Earliest/latest arrival time, earliest/latest departure time and minimum stopping time, which allow passengers to get in and out of the train.
  • Connections to other trains, meaning that two trains must coordinate so passengers can pass from one to the other.

Let’s take a look at a sample problem: sample_scenario.json.

It contains two service intentions, which is a fancy name for “train including all its requirements”:

The JSON files describing the problems get very large. An extension such as JSONView can help you understand the simpler files. Beyond that a good way to visualise them is as tables: the one above was rendered using

Each train runs on a given route and must go through a number of route sections, referenced by their section markers. In this specific case there are no connections to deal with, but you can see examples of 3 other types of requirements: latest arrival time, earliest departure time and minimum stopping time.

Durations are expressed per ISO_8601 standard. The “P” in "PT3M" is the duration designator, which stands for period. The "T" stands for time, and "M" for minutes.

Easy so far! Let’s look at the first route.

This is actually a simple route, but representing a graph as a JSON structure makes it look cumbersome.

So this is route 111, on which our first train runs. Each of the five route paths represents a sequence of sections that follow each other. You can for example follow the sections of the first and longest route path, labelled 1, 4, 5, 6, 10, 13 and 14.

The other route paths are then “glued” on top of this first one using route_alternative_marker_at_entry and route_alternative_marker_at_exit as points of reference. For example the second path adds a single section that ends at marker M1, while the fourth path adds 3 sections that fork from M2 and reach another end.

Route 111 from sample_scenario.json

The red labels show the section markers that some sections bear. These are the markers referenced in the section requirements of each train. (The marker is shown attached to the end of the section it applies to.)

The second route in this problem looks exactly the same. It uses the same sections markers, as you can see below:

Section markers shown in red

Some markers are shared between the two routes. We would care about them if we had train connections to handle, as they would be of the form “this train should meet that train at this marker”. But in the current case we have no connections.

What we do care about are the resources shared between these routes, shown below in red circles (again the resources are shown attached to the end of the sections they apply to):

Resources shown in red

It is now obvious that these routes are tightly coupled together. In fact in the “real world” these tracks are running next to each other.

Resources are “locks” that prevent groups of sections from being used at the same time. Each resource also needs some release time: an amount of time that must pass between their release by one train and the following occupation by the next train.

Release time for each resource
01_dummy.json routes and resources

Now that we have an intuition of how basic routes look like, let’s check out a less trivial problem: 01_dummy.json.

We see that in a lot of cases a resource is used by consecutive sections and shared by adjacent routes. This corresponds to the fact that trains typically need a wave of several “green signals” ahead of them.

01_dummy.json close-up

These graphs are real samples from the Swiss network!

You can look up the names of the markers to figure out where the tracks are running in the real world.

In the case above, we can recognize the tracks between the cities of Zürich, Zug and Pfäffikon.

To implement a concrete solution follow this document that will guide you through solving the sample_scenario.json problem by hand.

The worked out solution