Modelling the Capacity of London Waterloo



Designing the Logic for Waterloo

The first step when constructing any simulation is to decide where its boundary conditions lie. What do I mean by boundary conditions? They mark where we start and where we stop modelling. After all, Waterloo does not exist in isolation. We could continue to model the rest of the track towards Clapham Junction and beyond, or even try to model the entire railway system of the UK.

However, as models become more complex, their accuracy also tends to decrease. This is because models are always an abstraction and an approximation of reality, so the simpler the better. We can achieve simplicity by defining our boundary conditions as narrowly as possible.

A good rule of thumb is to think about the question being asked; this will reveal where the central focus of the simulation lies. Then expand the scope outwards from this central point, continually asking whether including something will help answer the original question.

In the case of London Waterloo station, the question we are trying to answer is "what is the capacity of this station in terms of trains per hour?" We therefore want to model trains running in and out of the platforms. For trains running out, we need to decide on a point at which to stop modelling. A good boundary condition for the run out is the point at which the train is clear of Waterloo, allowing the next train to run into the platform.

So our system logic with the boundary conditions looks like this:

[Figure: system logic for the Waterloo model, including the boundary conditions]

Gathering Data

Since there are 24 platforms at Waterloo, in this simple model we can create a single resource representing all the platforms. We can give this resource a capacity of 24.

waterloo_platforms = simpy.Resource(env, capacity=24)

Note: In a real-world commercial study of capacity at London Waterloo, the only difference would be that we would model each platform as its own individual resource. A commercial study requires a higher level of accuracy, so we would need to model the run-in and run-out times for all 24 platforms individually. For the purposes of this example, though, a single resource will do!

In the model, as in real life, there will only be a certain number of trains allowed to run in at any one time. This can be approximated by using two more resources, one for the run in authority and one for the run out authority. For Waterloo we can assume that up to three trains can be running in at the same time and up to three trains can be running out.

run_in_authority = simpy.Resource(env, capacity=3)
run_out_authority = simpy.Resource(env, capacity=3)

We have no data on how long a train takes to run into or out of the platforms at Waterloo, so we need to make some crude assumptions.

Here is a satellite image taken from Google Maps of London Waterloo:

[Figure: satellite image of London Waterloo, taken from Google Maps]

Google Maps has an absolutely brilliant feature for measuring distances. To use it, right-click on the map and select "Measure distance". With the measuring tool we can estimate how far we need to model trains running into and out of London Waterloo.

We can make some assumptions about where the run in starts and ends in the model. This is the boundary condition mentioned earlier.

[Figure: assumed start and end points of the run in, marked on the satellite image]

The maximum distance that a train needs to run in from this point is 801m and the minimum is 749m, as estimated with the Google Maps measurement tool. Since we know that trains need to be organised further down the line so that they are in the correct position, we can safely add 200m onto these measurements. So we have a maximum distance of 1001m and a minimum of 949m.

Suppose the average speed of a train running into Waterloo from the boundary, allowing for the need to come to a stop, is 30 km/h, or roughly 8 metres per second in engineering speak (30 km/h is 8.33 m/s, rounded down). With that assumption, we can calculate the minimum and maximum time it takes a train to run into and out of a platform.

Minimum time = 949 m / 8 m/s ≈ 119 s
Maximum time = 1001 m / 8 m/s ≈ 125 s
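The arithmetic above can be written as a short Python check. It uses only the figures given in this post: the measured 749 m and 801 m, the 200 m allowance and the rounded speed of 8 m/s. If you used the unrounded 8.33 m/s, the range would come out at roughly 114 s to 120 s.

```python
# measured run-in distances from the boundary point (Google Maps estimates), in metres
measured_min, measured_max = 749, 801
# extra distance allowed for trains being organised further down the line
padding = 200
# assumed average speed: 30 km/h, rounded down to 8 m/s (30 / 3.6 is about 8.33)
speed = 8

min_distance = measured_min + padding   # 949 m
max_distance = measured_max + padding   # 1001 m

min_time = round(min_distance / speed)  # 949 / 8 = 118.6 -> 119 s
max_time = round(max_distance / speed)  # 1001 / 8 = 125.1 -> 125 s
print(min_time, max_time)  # 119 125
```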

We can therefore use a uniform distribution to sample randomly between these two times.
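Here is a minimal sketch of that sampling. It uses Python's standard `random` module only so that it runs without dependencies. The main code later in this post uses NumPy's `random.uniform(minimum, maximum)`, which draws from the same distribution. The seed and the sample size are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # seeded generator so the sketch is reproducible

# sample 10,000 run times uniformly between 119 s and 125 s
samples = [rng.uniform(119, 125) for _ in range(10_000)]

print(min(samples) >= 119, max(samples) <= 125)  # True True
# the sample mean should sit very close to the theoretical mean of (119 + 125) / 2 = 122 s
print(statistics.mean(samples))
```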

As for the passengers boarding and alighting, as well as the driver needing to change ends of the train, this is more of a stab in the dark. We shall assume a triangular distribution for this process, with a minimum time of 180s, a maximum of 600s and a most likely value (the mode) of 300s.
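The sketch below samples this dwell time using Python's standard library, assuming the 180/300/600 s figures above. Watch the argument order. The standard library's `random.triangular` takes `(low, high, mode)`, whereas NumPy's `numpy.random.triangular` takes `(left, mode, right)`. The mean works out at 360 s, which is higher than the most likely value because the distribution is skewed towards the 600 s maximum.

```python
import random
import statistics

rng = random.Random(7)  # seeded so the sketch is reproducible

# dwell time: minimum 180 s, maximum 600 s, most likely (mode) 300 s
# stdlib argument order is random.triangular(low, high, mode)
dwell_times = [rng.triangular(180, 600, 300) for _ in range(10_000)]

# the mean of a triangular distribution is (min + mode + max) / 3
theoretical_mean = (180 + 300 + 600) / 3
print(theoretical_mean)  # 360.0
print(statistics.mean(dwell_times))  # should be close to 360
```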

Building the Code

Putting this all together, the complete code looks as follows:

# import simpy for discrete-event simulation
import simpy
# import NumPy's random module for sampling random numbers
from numpy import random
# import numpy for maths and other useful things
import numpy as np
# import pandas for data analysis
import pandas as pd

# create an empty dictionary to populate with data
output = {'Train':[], 'Departure Time':[]}

# create a list of the free platforms (numbered 1 to 24)
free_platforms = list(range(1,25))

# define the triangular distribution
# note: NumPy's triangular() takes its arguments in the order (left, mode, right)
def triangular_distribution(minimum, maximum, mode):
    x = random.triangular(minimum, mode, maximum)
    return x

# define the uniform distribution
def uniform_distribution(minimum, maximum):
    x = random.uniform(minimum, maximum)
    return x

# define the source: a new train arrives every 10 s
def source(env):
    i = 1
    while True:
        # wait 10 s, then start the process for the next train
        yield env.timeout(10)
        env.process(run_in_run_out_waterloo(env, waterloo_platforms, i))
        i += 1

# describe the process
def run_in_run_out_waterloo(env, waterloo_platforms, name):
    # global makes it explicit that this function updates the shared platform list
    global free_platforms

    # request resource
    platform_request = waterloo_platforms.request()
    yield platform_request

    # pick one of the free platforms and mark it as occupied
    platform_choice = int(random.choice(free_platforms))
    free_platforms.remove(platform_choice)

    # request run in authority
    print("%ds: Train %d requesting run into platform %d" % (env.now, name, platform_choice))
    run_in_request = run_in_authority.request()
    yield run_in_request

    # run in: 94-100 s corresponds to the measured 749-801 m at 8 m/s, without the extra 200 m
    print("%ds: Train %d running into platform %d" % (env.now, name, platform_choice))
    yield env.timeout(uniform_distribution(94,100))
    print("%ds: Train %d arrived at platform %d and boarding" % (env.now, name, platform_choice))

    # release run in authority
    run_in_authority.release(run_in_request)

    # board passengers
    yield env.timeout(triangular_distribution(180, 600, 300))
    print("%ds: Train %d fully-boarded and requesting to depart platform %d" % (env.now, name, platform_choice))

    # request run out authority
    run_out_request = run_out_authority.request()
    yield run_out_request

    # run out (119-125 s), recording which train departed and when
    output['Train'].append(name)
    output['Departure Time'].append(env.now)
    print("%ds: Train %d running out of platform %d" % (env.now, name, platform_choice))
    yield env.timeout(uniform_distribution(119,125))
    print("%ds: Train %d has left and platform %d is now free" % (env.now, name, platform_choice))

    # add platform back into list of free platforms
    free_platforms.append(platform_choice)

    # release resource
    waterloo_platforms.release(platform_request)

    # release run out authority
    run_out_authority.release(run_out_request)

# create the simpy environment
env = simpy.Environment()

# define the resources
waterloo_platforms = simpy.Resource(env, capacity=24)
run_in_authority = simpy.Resource(env, capacity=3)
run_out_authority = simpy.Resource(env, capacity=3)

# start the source process
env.process(source(env))

# run the simulation for 300 s (use until=3 * 60 * 60 for a 3-hour run)
env.run(until=300)

# convert the dictionary to a pandas DataFrame
df = pd.DataFrame.from_dict(output, orient = 'columns')

# write the data to a csv file
df.to_csv('output.csv', index = False)

Analysing the Data

Running this code for a short sample of 300 s gives the following output:

10s: Train 1 requesting run into platform 9
10s: Train 1 running into platform 9
20s: Train 2 requesting run into platform 24
20s: Train 2 running into platform 24
30s: Train 3 requesting run into platform 13
30s: Train 3 running into platform 13
40s: Train 4 requesting run into platform 14
50s: Train 5 requesting run into platform 23
60s: Train 6 requesting run into platform 16
70s: Train 7 requesting run into platform 11
80s: Train 8 requesting run into platform 7
90s: Train 9 requesting run into platform 19
100s: Train 10 requesting run into platform 12
107s: Train 1 arrived at platform 9 and boarding
107s: Train 4 running into platform 14
110s: Train 11 requesting run into platform 15
118s: Train 2 arrived at platform 24 and boarding
118s: Train 5 running into platform 23
120s: Train 12 requesting run into platform 4
129s: Train 3 arrived at platform 13 and boarding
129s: Train 6 running into platform 16
130s: Train 13 requesting run into platform 18
140s: Train 14 requesting run into platform 21
150s: Train 15 requesting run into platform 5
160s: Train 16 requesting run into platform 22
170s: Train 17 requesting run into platform 1
180s: Train 18 requesting run into platform 20
190s: Train 19 requesting run into platform 17
200s: Train 20 requesting run into platform 6
207s: Train 4 arrived at platform 14 and boarding
207s: Train 7 running into platform 11
210s: Train 21 requesting run into platform 8
214s: Train 5 arrived at platform 23 and boarding
214s: Train 8 running into platform 7
220s: Train 22 requesting run into platform 10
227s: Train 6 arrived at platform 16 and boarding
227s: Train 9 running into platform 19
230s: Train 23 requesting run into platform 2
240s: Train 24 requesting run into platform 3

However, what we are really interested in is the total capacity of Waterloo. To calculate this, we need to run the simulation for a longer period of time, so let's run it for 3 hours (10,800 s).

The time at which each train was dispatched from Waterloo was recorded in the simulation and saved to a CSV file. We can plot the cumulative number of trains dispatched over time to visualise the performance of Waterloo:
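Here is a minimal sketch of turning the CSV into a cumulative dispatch curve. It assumes pandas is available and uses the column names written by the code above. The three sample rows are made up so that the snippet runs on its own, and `cumulative_dispatches` is a hypothetical helper. The plotting call is left as a comment because it also needs matplotlib.

```python
import pandas as pd


def cumulative_dispatches(df):
    # hypothetical helper: sort departures chronologically and number them 1, 2, 3, ...
    df = df.sort_values('Departure Time').reset_index(drop=True)
    df['Trains Dispatched'] = range(1, len(df) + 1)
    return df


# in the model this would be: df = pd.read_csv('output.csv')
# a tiny made-up sample is used here so the sketch runs standalone
sample = pd.DataFrame({'Train': [3, 1, 2], 'Departure Time': [650.0, 420.5, 512.3]})
curve = cumulative_dispatches(sample)
print(curve)

# with matplotlib installed, a step plot of the curve:
# curve.plot(x='Departure Time', y='Trains Dispatched', drawstyle='steps-post')
```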

[Figure: cumulative number of trains dispatched from Waterloo over the 3-hour run]

While this looks fairly steady, if we zoom in to the first 20 minutes we can see a degree of variability in the dispatch times.
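One way to put a number on that variability is to look at the headways, meaning the gaps between consecutive departures. The sketch below assumes you already have the departure times from the CSV. `headways_in_window` is a hypothetical helper, and the example times are made up purely to show the calculation.

```python
def headways_in_window(departure_times, window_end=20 * 60):
    # keep departures within the first `window_end` seconds, in time order
    times = sorted(t for t in departure_times if t <= window_end)
    # headway = gap between consecutive departures
    return [later - earlier for earlier, later in zip(times, times[1:])]


# made-up departure times in seconds, purely to illustrate the calculation
example = [400, 430, 470, 475, 540, 610, 1250]
gaps = headways_in_window(example)
print(gaps)  # [30, 40, 5, 65, 70]
print(min(gaps), max(gaps), sum(gaps) / len(gaps))  # 5 70 42.0
```

A wide spread between the smallest and largest headway is what shows up as the uneven steps in the zoomed-in plot.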

[Figure: cumulative number of trains dispatched over the first 20 minutes]

So what is the conclusion here? Our original question asked what the technical capacity of London Waterloo is. If we are successfully dispatching 255 trains over a three-hour period, then this puts the technical capacity of Waterloo at a peak of 85 trains per hour (255 / 3). Of course, this rests on the crude assumptions that we have made. However, this example demonstrates the potential power of a relatively simple model, and how it can be used to model large systems and answer big-picture questions.
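The headline figure is simple arithmetic. The sketch below computes it from a list of departure times. It also splits departures by simulated hour, which is worth checking because the model starts with empty platforms, so the first hour may include a warm-up period with fewer departures. Both function names are hypothetical, and the example inputs are placeholders rather than model output.

```python
from collections import Counter


def trains_per_hour(departure_times, run_length_s):
    # average throughput = number of departures / simulated hours
    return len(departure_times) / (run_length_s / 3600)


def departures_per_hour(departure_times):
    # bucket each departure into the simulated hour it fell in (0, 1, 2, ...)
    return Counter(int(t // 3600) for t in departure_times)


# 255 departures in a 3-hour (10,800 s) run, as reported above;
# the individual times are placeholders since only the count matters here
print(trains_per_hour(list(range(255)), 3 * 60 * 60))  # 85.0

# made-up times to show the hourly breakdown
print(dict(sorted(departures_per_hour([100, 3700, 3800, 7300]).items())))  # {0: 1, 1: 2, 2: 1}
```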


This blog post uses data that is entirely made up by the author and in no way represents real-world information about London Waterloo train station. This blog post is in no way affiliated with Transport for London or Network Rail. It has been created in the author's own time for educational purposes only, and the results do not have any real-world relevance.
