Journalists at The Economist have worked with data since our very first issue in 1843. As ever, reporters back then were especially concerned with coffee.
While we take care to identify our sources, we have not often published the data behind them. Sometimes, this is for good reason: some data are proprietary or otherwise not ours to publish. Often, we have simply not made the time to do it. This is a shame: releasing data can give our readers extra confidence in our work, and allow researchers and other journalists to check it and build upon it. So we’re looking to change this, and publish more of our data on GitHub.
Years ago, “data” generally meant a table in Excel, or possibly even a line or bar chart to trace in a graphics program. Today, data often take the form of large CSV files, and we frequently do analysis, transformation, and plotting in R or Python to produce our stories. We assemble more data ourselves, by compiling publicly available datasets or scraping data from websites, than we used to. We are also making more use of statistical modelling. All this means we have a lot more data that we can share — and a lot more data worth sharing.
We decided to make the Big Mac index our first open data project. It is an ideal fit: the index is based on publicly available data, and it involves some work on our part to turn those data into a final product. Also, the index is intended as an easily digestible introduction to how relative currency valuation works, so exposing its inner workings is a natural step. Although we have done that in words before, publishing code gives readers a more concrete way of seeing how the calculations work. The Big Mac index has often been imitated: there are, for example, a Billy Bookcase index and a Spotify subscription index. Releasing our code will make it easier for people to remix the index.
We started calculating the Big Mac index in 1986, and until this year it has been compiled and calculated manually. There are still a lot of hands-on parts to the process — in particular, compiling the list of prices — but we have now converted much of the calculation into code. We published these calculations in a Jupyter notebook, a tidy format for breaking scripts into small blocks and annotating them.
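For readers who want a feel for the core arithmetic before opening the notebook, here is a minimal sketch of the raw index in Python. It is not the code from our notebook: the function name `raw_index` and the numbers are invented for illustration, and it covers only the basic comparison of a Big Mac's local price with its American price at the market exchange rate.

```python
def raw_index(local_price, us_price, exchange_rate):
    """Return a currency's over- or under-valuation against the dollar.

    local_price:   price of a Big Mac in the local currency
    us_price:      price of a Big Mac in US dollars
    exchange_rate: market rate, in local currency units per US dollar
    """
    # The exchange rate at which the burger would cost the same in both places
    implied_ppp = local_price / us_price
    # Positive means overvalued against the dollar, negative means undervalued
    return implied_ppp / exchange_rate - 1


# Made-up numbers, not real prices
valuation = raw_index(local_price=20.0, us_price=5.0, exchange_rate=8.0)
print(f"{valuation:+.1%}")  # -50.0%
```

In this invented case the burger implies a rate of 4 units to the dollar, while the market rate is 8, so the currency looks 50% undervalued.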
In addition, while we’ve always released the index values in an Excel spreadsheet, we now also provide the base numbers behind the index (prices, GDP, and exchange rates), along with the index values themselves, as CSV files. This format is more useful for researchers using statistical programming languages. All of the data and scripts are now available on GitHub. Go check them out and remix the Big Mac index to your heart’s content!
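To show why a CSV is convenient, here is a small sketch that reads a few rows with Python's standard `csv` module and converts each local price into dollars. The sample data, the column names (`name`, `local_price`, `dollar_ex`) and the function `dollar_prices` are assumptions made for this example; check the files on GitHub for the real layout.

```python
import csv
import io

# Invented sample rows; the column names and all numbers are hypothetical.
SAMPLE_CSV = """name,local_price,dollar_ex
Country A,20.0,8.0
Country B,3.0,0.75
United States,5.0,1.0
"""


def dollar_prices(csv_text):
    """Map each country name to its Big Mac price converted to US dollars."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["name"]: float(row["local_price"]) / float(row["dollar_ex"])
        for row in reader
    }


prices = dollar_prices(SAMPLE_CSV)
# List countries from the cheapest burger to the dearest
for name, price in sorted(prices.items(), key=lambda item: item[1]):
    print(f"{name}: ${price:.2f}")
```

To work from a downloaded file instead, pass an opened file straight to `csv.DictReader` in place of the `io.StringIO` wrapper.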
We plan to publish more of our data on GitHub in the coming months and, where it’s appropriate, the analysis and code behind them as well. We look forward to seeing how our readers use and build upon the data reporting we do. If you have thoughts on the way we’ve released these data, or how we should go about releasing more in the future, please let us know.
Evan Hensleigh is a visual journalist at The Economist.