Using Google Maps Distance Matrix API to Create a Distance Table

By Juan Acosta

Juan Acosta

In the summer of 2021 I worked on a logistics project for a civil engineering company in which I was tasked with finding distances between warehouses and delivery points. Considering there were hundreds of delivery points, using Google Maps manually to find the distance from each origin to multiple destinations was going to be extremely time consuming. Fortunately, I came across the Google Maps APIs and with some Python and SQL knowledge, I was able to build a program to more efficiently generate the needed distance data.

You can find the GitHub repository for this project and follow along by clicking here.

Introduction to Google Maps APIs:

In order to use Google APIs you need to enter billing information. However, Google offers plenty of free usage before charging a fee. If you are using Google APIs as learning tools or for personal projects, you will likely not need to pay. Google has set up a pay-as-you-go system and there is a $200/month credit available. Here is a chart showing their API pricing system:

Taken from Distance Matrix API Usage and Billing (Oct, 2021)

You can find more detailed, current billing information here.

In order to use the API, you are going to need to obtain an API key. Your API keys can be found at https://console.cloud.google.com/apis/credentials. Without your API keys you will not be able to access API services; it is important to remember how to find them. For this project we are going to add an extra layer of security and restrict our API key by IP address.

To add a restriction by IP address, click on your API key’s edit button via the above link. Once clicked, go to “Application restrictions” and select “IP addresses”. In the “Accept requests from these server IP addresses” section you will click “ADD AN ITEM”, enter your IP address, and click the “DONE” button. Finally, you will navigate to the bottom of the screen and click the “SAVE” button. Now your API key can only be accessed by devices that share your IP address. Keep in mind that you can change these settings at anytime.

Once you have created a Google Cloud Platform account, entered your billing information and gained access to your API key, you are ready to use the Distance Matrix API. Now let’s move on to the coding section!

Requesting and Parsing Data:

In the first cell of our notebook, we will load our imports:

“requests” is a library that allows us to send HTTP requests. The “pprint” module is quite similar to the regular “print” function in Python, but will make our data structures look tidier and more readable. We will use Pandas to visualize, manipulate and analyze our data. The “os” module provides functions to interact with the operating system using Python. SQLite3 is the relational database management system (RDBMS) that will help us store our data. You can certainly use a different RDBMS if you wish, but for a small project, SQLite provides enough tools. More on SQLite later.

Setting Up the Endpoint:

Notice the text <outputFormat> in our url variable. This text will be replaced with either json or xml to indicate the output format of our data. We want our response to be in JSON format, so we will replace <outputFormat> with json.

There are two required query parameters: origins and destinations. You can add additional parameters such as arrival_time, departure_time, and mode(driving, biking, walking). For our purposes, we will be using origins, destinations, and mode. Each query parameter will be separated by an ampersand (&).

The destinations and origins parameters are formatted the same way. Each location is going to be separated by the pipe character (|) and they can be in the form of place IDs, addresses, or coordinates. Google recommends using place IDs, so we will stick to that. Here is their explanation in case you are wondering why:

Taken from Distance Matrix API overview

A place ID is a way to identify a location from the Google Places database. A place ID finder is provided by Google; you can use it by following this link.

An API request is going to have a limit of 25 origins and destinations, but for the sake of simplicity, we are going to limit ourselves to two origins and two destinations. New York City and Philadelphia will be the origins; St. Louis and Memphis will be the destinations.

Finding a place in the Place ID finder looks like this:

Taken from Place ID finder

Since the url can get quite long when adding several origins and destinations, we will be storing our origins and destinations in origin and destination variables. These variables will store, as strings, the data from text files “origin_ids.txt” and “destination_ids.txt”. Make sure these text files are stored in the same folder as your project. At this point, the second cell of our Jupyter Notebook should look like this:

The text files that are being read will look like this:

Notice the pipe character (|) separating the two values in each file. Now that the endpoint has two origins and two destinations, you can make an API call that will return the requested data.

After running the code from the cell above, you should get a reply in JSON format:

Let’s discuss this response’s structure. The rows key contains a list of dictionaries. Each dictionary contains an elements key that contains a list representing the matrix data for origin/destination pairs. These elements keys are sorted in relation to the origin_addresses list. For example, the origin of the first elements key is New York and the second is Philadelphia. Each elements key contains a list with two items, sorted in relation to the 'destination_addresses' list. For example, St. Louis is the first item of each elements list and Memphis is the second.

It is also important that the status keys have a value of “OK”. In the instance of an error, a given status key may have a value of “INVALID_REQUEST”. Common reasons for errors are an obsolete place ID or a requested location with no road access.

Now that we are able to make requests to the Distance Matrix API, it is time to parse our data. For our example, with only two origins and two destinations, it is easy to identify the distances we need by just looking at our API response. However, when dealing with more origins and destinations, identifying meaningful data within the response can be quite difficult. To better work with larger amounts of data, we are going to extract just what we need from the response and put it in a readable distance table.

Parsing Data:

When the above dictionary is turned into a Pandas DataFrame and printed to the console like so:

It will look like:

If you compare this printed DataFrame to the response data in the prior section, you will notice that the first row of the DataFrame (index 0) displays the distances to St. Louis and the second row of the DataFrame (index 1) displays the distances to Memphis. Eventually those indices will be replaced by the destinations names, but that will be addressed later on.

Now that we know how a dictionary needs to be structured, we are going to need a sorting algorithm that will take the data in the API response and turn it into the dictionary we need. To do this, our parser will be constructed like so:

Let’s break it down:

If you reference the API function in the above section “Setting up the endpoint,” we have stored our API response in a variable output. We will store our output['origin_addresses'] and output['destination_addresses'] data in origins and destinations variables, respectively.

Next, we are going to create an empty dictionary called distance_matrix where we will store the parsed data that we will use to create our DataFrame.

Now comes the for-loop which is where the actual parsing is going to take place. We are going to start by iterating through the origins list. Within each iteration of this for-loop we will create an empty list called values that will store the distances between the current origin and its destinations.

The next line - for element in output[rows][origins.index(origin)]['elements']: - establishes a nested for-loop that will allow us to iterate through each 'elements' key within our response data. With each iteration of this nested for-loop, we will be able to access the distance (as a string) between our current origin and a destination. The four lines of code within this for-loop access the relevant distance value, check for and strip out commas (for distances ≥ 1,000), convert the string to a float, and push the value to our values list.

This nested for-loop will keep iterating through each element of the first 'elements' key and adding all its distances to the values list.

Once our nested for-loop has finished, the last line of code within the outer for-loop will update the distance_matrix dictionary by adding a key-value pair with the origin value as the key and the values list as the value: distance_matrix.update({origin: values}). The outer for-loop will then move to the next origin and the process will repeat itself.

A quick note regarding our nested for-loop: from a Big-O perspective, this results in a potentially problematic runtime of n². As referenced in the above section, “Setting Up the Endpoint,” our API call is limited to 25 origins and destinations which limits the runtime for a given batch of data. This is just something to keep in mind; in the context of this project, you will not experience performance issues. For more information on Big-O notation, here is a useful article.

Once our parser finishes running, we will have a distance_matrix dictionary with the desired structure. Pandas’ pd.DataFrame() method takes an “index” argument that allows us to replace a numeric index with an array of other values. We will pass our destinations list to this “index” argument, like so:

Remember, our destinations list that we are using as a custom index was derived from our API response’s destination_addresses key, which is properly ordered in relation to our parsed data.

To label the index column “destinations”, we can run the following: df.index.name = 'destinations'.

With that, we have extracted the data and can now visualize it as a DataFrame. Printing our df variable will now result in:

Storing Data in SQL Database:

SQLite is serverless, which makes it easy to set up; it is self-contained and handles data in a relatively simple way. It is a good option for low to medium traffic websites or for internal development purposes. However, it lacks multi-user capabilities, advanced security features and can have performance issues with large datasets. For the purpose of storing personal projects, it should be more than enough, but consider another RDBMS if you will be working with enterprise data or very large projects.

First, we are going to name our database and save it in a variable DB:

This DB variable will serve as the name of our .db file and so our string value must end with .db.

It is a good practice to wrap SQL commands in Try and Except blocks, so you can catch errors and clearly determine what they are. We are going to divide errors between database errors and other errors. Our Try-Except blocks will be structured like this:

Inside the try block, we will connect to and create our database using our DB variable and store our Pandas DataFrame — df — using the Pandas method to_sql():

While it is not necessary to use a with statement when connecting to the database, it is preferable since you do not have to worry about closing the database.

Panda’s to_sql method can be called on a DataFrame and requires certain parameters. In the above code, our first argument "distance_table" is the name of the table, the second argument, db, provides the database engine or connection and if_exists='append' will append the data if the database already exists. This last parameter is useful in our case if you have more than 25 origins or destinations to add. The Distance Matrix API will not allow more than that, so you might need to create multiple API calls, parse their data, and append them to the database one by one (see the “Bonus Info” section at the end of this article for more info on this).

After executing the code in this cell, you should see a file called “distance_table.db” in the same folder as your Jupyter Notebook. This file contains the distance table we created. To open it and see its content, I recommend downloading and using the program “DB Browser for SQLite”. It is a very simple, open-source program to create, edit, and visualize database files.

In “DB Browser for SQLite”, after opening the file, click on “Browse Data” and you should be able to see the table there:

Notice that the destinations are not treated as indices here. Instead, they have been assigned their own column.

Getting Data Back from Database to DataFrame:

Here we are connecting to the database the same way we connected when storing the data. We will need to create a cursor so we can use the execute command and within it, write the SQL statement that will select all the content of ‘distance_table’: "SELECT * FROM distance_table". The asterisk is used to indicate that it needs to select all the contents of the table.

In the next line, we are going to create a DataFrame named df2. In the first argument that we pass to the DataFrame method, we indicate that all the selected data from the database is going to be turned into a DataFrame by using cursor.fetchall(). In the second argument we will extract the column names from the database and assign them to our DataFrame. Without this parameter, we would get a DataFrame with no column names.

Once you run this code, the df2 DataFrame will look like this:

With this information, you can now create your own distance table that will help you in whatever logistics projects you might have.

I hope you have found this article both interesting and useful! Please leave any comments or questions you might have in the comments section below.

Bonus Info:

  • In the Jupyter Notebook, you will find a cell with a function called add_location. This function will save time at the moment of adding origins and destinations to the API call. You will just need two arguments: the first one is the name of the text file where you will be adding the location and the second one is the location (which can be in the format of an address, coordinates or place ID). Both arguments must be strings. If the file does not exist, the function will create it and add the location. If the file does exist, it will add the location and make sure it is separated by a pipe (|). Additionally, if you reach the limit of 25 locations, the function will notify you and will not allow you to keep adding locations to that file since the API call would not accept more than 25 origins or destinations.