I grew up in Los Angeles, and my parents would make a big deal out of Thanksgiving each year, even when it was just me and my sister. I didn't fully appreciate what a nice tradition they had created until I left home. There's something about being able to rely on one thing that stays the same each year and grounds you, no matter what else is changing in your life. So I would try to fly home for Thanksgiving every year, and now that I have a kid, I can convince my parents to fly to me in San Francisco.
But the biggest constant, the thing I can always rely on, is that we will argue about the best time to fly for Thanksgiving. My parents like to fly on Thanksgiving Day because they think it's less crowded, but I've also tried going to IKEA on Super Bowl Sunday, and I've concluded that whenever you try to outsmart other people who also think they're smarter than everyone else, you all end up in the same place.
This year my parents informed me that they're flying in on Thanksgiving Day and leaving on Saturday. This seems absurd to me, but it raises the question:
When is the best day to fly for Thanksgiving? And when is the worst day?
Let's start with a hypothesis. My hypothesis is that when people are stuck at SFO because of weather or flight delays, they get on the airport WiFi and start downloading podcasts, watching Netflix, or reading every page of every social networking site, until they finally get bored enough to check into SFO on Foursquare, because it's really important that your Foursquare friends know you're stuck at SFO. In other words, the more crowded and delayed the airport is, the more data its free WiFi should carry. Hopefully you're not in that terminal with the bad food.
The SSID for the WiFi at SFO is "#SFO FREE WIFI". It's named as if someone were carrying a battery-powered wireless hotspot in their backpack, trying to steal your bank login.
What is that thing journalists do when they want to find out more about something that involves the government, even tangentially? Oh right, they file a Freedom of Information Act request. (Strictly speaking, FOIA covers federal agencies; for a city agency the equivalent is a request under the California Public Records Act.) So let's do that.
I went to San Francisco's Public Records Request website and started a request. The most plausible department seemed to be the Department of Technology (who knew San Francisco had a Department of Technology?), since it runs the free WiFi at various locations around the city.
On October 23rd at 12:36 AM, I wrote:
I’d like the total number of unique “#SFO FREE WIFI” users and sessions per year as well as with daily usage counts from January 2016 to October 2018.
I went to bed and woke up with my phone ringing at 7:38 AM. It was the Department of Technology. They wanted clarification on which WiFi network I was trying to gather information about and I explained that I wanted the airport WiFi statistics. They promised to put me in touch with the correct person at the Airport Commission.
The same day, at 12:41 PM, I got an e-mail from a Public Information Officer at SFO informing me that due to California Government Code Section 6253(c), I would be receiving the information I requested by November 2nd.
I looked up the code. It says:
Each agency, upon a request for a copy of records, shall, within 10 days from receipt of the request, determine whether the request, in whole or in part, seeks copies of disclosable public records in the possession of the agency and shall promptly notify the person making the request of the determination and the reasons therefor.
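As a sanity check on SFO's timeline, here is a small Python sketch of the 10-day arithmetic. It assumes the request was received in 2018 (the requested data runs through October 2018) and counts plain calendar days; how the statute actually treats weekends, holidays, or the day of receipt is not something this sketch tries to model.

```python
from datetime import date, timedelta

# Assumption: the deadline is simply N calendar days after receipt.
# Weekend/holiday rules and extensions are deliberately not modeled.
RESPONSE_WINDOW_DAYS = 10


def determination_deadline(received: date, window_days: int = RESPONSE_WINDOW_DAYS) -> date:
    """Return the date `window_days` calendar days after `received`."""
    return received + timedelta(days=window_days)


# Assumed year: 2018, since the request covers data through October 2018.
print(determination_deadline(date(2018, 10, 23)))  # 2018-11-02
```

Ten calendar days from October 23rd lands exactly on November 2nd, the date SFO gave.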
On November 2nd at 2:47 PM, the same official wrote back and attached documents related to my request, with no redactions. I received two PDFs, one of them named Daily Usage PWiFi_Stats_Jan2016-Oct2018.pdf, which looked like this:
This goes on for 39 pages.
I don't know why so many public records requests return data in the worst possible format, but we can still work with this.
We can use a tool called Ghostscript, with its txtwrite output device, to extract the text from the PDF:
gs -sDEVICE=txtwrite -o output.txt Daily\ Usage\ PWiFi_Stats_Jan2016-Oct2018.pdf
This creates a text file that looks like this:
Date SSID Byte Used (GB)
1/1/2016 #SFO FREE WIFI 2,183.808344
1/1/2016 #SFO FREE 5GHZ WIFI 252.940409
1/2/2016 #SFO FREE 5GHZ WIFI 236.419117
1/2/2016 #SFO FREE WIFI 2,489.504165
1/3/2016 #SFO FREE 5GHZ WIFI 254.143388
1/3/2016 #SFO FREE WIFI 2,570.939450
1/4/2016 #SFO FREE 5GHZ WIFI 273.915061
1/4/2016 #SFO FREE WIFI 2,681.677290
1/5/2016 #SFO FREE 5GHZ WIFI 284.193112
1/5/2016 #SFO FREE WIFI 2,632.704824
1/6/2016 #SFO FREE 5GHZ WIFI 241.426526
1/6/2016 #SFO FREE WIFI 2,522.141663
1/7/2016 #SFO FREE WIFI 2,149.880636
1/7/2016 #SFO FREE 5GHZ WIFI 172.447330
...
Next, we can write a regex that will extract the data we care about:
/([0-9]+\/[0-9]+\/[0-9]+)\s+#SFO FREE (5GHZ )?WIFI\s+([0-9.,]+)/
and normalize the data (for example, stripping the thousands separators from the GB figures), finally writing these fields to a CSV. Then we can make two separate CSVs by grepping for "FREE WIFI" and "FREE 5GHZ WIFI" and dropping the network column, which gives us this:
1/1/2016,2183.808344
1/2/2016,2489.504165
1/3/2016,2570.939450
1/4/2016,2681.677290
1/5/2016,2632.704824
1/6/2016,2522.141663
1/7/2016,2149.880636
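Here is a minimal Python sketch of the parse-and-split step, working directly on the ghostscript text dump. The regex is essentially the one above, with `\s+` between every column so that runs of spaces in the extracted text still match. The network keys (`free_wifi`, `free_5ghz_wifi`) and the output file names are made up for illustration.

```python
import csv
import re
from collections import defaultdict

# Groups: 1 = date, 2 = "5GHZ " when present, 3 = GB used (may contain commas).
ROW_PATTERN = re.compile(r"([0-9]+/[0-9]+/[0-9]+)\s+#SFO FREE (5GHZ )?WIFI\s+([0-9.,]+)")


def split_by_network(text):
    """Return {"free_wifi": [(date, gb), ...], "free_5ghz_wifi": [...]}."""
    rows = defaultdict(list)
    for day, five_ghz, gb in ROW_PATTERN.findall(text):
        # findall returns "" (not None) for an optional group that didn't match.
        network = "free_5ghz_wifi" if five_ghz else "free_wifi"
        # Normalize "2,183.808344" -> "2183.808344"; kept as a string so the
        # CSV preserves the original precision, trailing zeros included.
        rows[network].append((day, gb.replace(",", "")))
    return dict(rows)


def write_csvs(rows, prefix="sfo"):
    """Write one date,gb CSV per network, e.g. sfo_free_wifi.csv (hypothetical names)."""
    for network, pairs in rows.items():
        with open(f"{prefix}_{network}.csv", "w", newline="") as f:
            csv.writer(f).writerows(pairs)
```

For example, `write_csvs(split_by_network(pathlib.Path("output.txt").read_text()))` would produce one `date,gb` file per network. Splitting in the same pass that parses avoids the separate grep step, but the result should match.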
Now that the data is in a portable format, we can open it in Excel and chart it to sanity-check the results. Lastly, we'll convert it to JSON and graph the data with a visualization library.
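The JSON step can be as small as this Python sketch. The output shape, a list of `{"date": ..., "gb": ...}` objects with ISO 8601 dates, is an assumption about what a charting library would want, not a format the post specifies.

```python
import csv
import json
from datetime import datetime


def csv_to_json(csv_text):
    """Convert "M/D/YYYY,gb" rows into a JSON array of {"date", "gb"} objects."""
    points = []
    for row in csv.reader(csv_text.splitlines()):
        if not row:  # skip blank lines
            continue
        day, gb = row
        # Emit ISO 8601 dates (YYYY-MM-DD), which most charting libraries parse.
        iso_day = datetime.strptime(day, "%m/%d/%Y").date().isoformat()
        points.append({"date": iso_day, "gb": float(gb)})
    return json.dumps(points, indent=2)
```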
There are a few things that stand out in the chart. First, there's clearly some bad or missing data from February to April 2017. And if we look closely, the heaviest WiFi usage seems to be around Thanksgiving 2017.
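Missing days like the ones in early 2017 are easy to confirm programmatically. This hedged Python sketch only finds calendar days with no row at all; values that are present but implausible still need eyeballing.

```python
from datetime import date, timedelta


def missing_days(days):
    """Return every calendar day between the first and last observed day that has no row."""
    observed = set(days)
    current, end = min(observed), max(observed)
    gaps = []
    while current <= end:
        if current not in observed:
            gaps.append(current)
        current += timedelta(days=1)
    return gaps
```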
Let’s take a look at a calendar from last year.
Maybe the best thing to do is to look at a two-week slice, starting on Sunday, Nov. 19 and ending on Saturday, Dec. 2, 2017.
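Pulling the quietest and busiest days out of a window like this takes one `min` and one `max`. The sketch below assumes the per-day GB totals are already in a `dict` keyed by `datetime.date` (how the two networks get combined is left open), and treats WiFi traffic as a stand-in for crowding, per the hypothesis above. The numbers in the example are invented.

```python
from datetime import date


def window_extremes(usage, start, end):
    """Return (quietest_day, busiest_day) by GB used, for start <= day <= end."""
    window = {day: gb for day, gb in usage.items() if start <= day <= end}
    if not window:
        raise ValueError("no usage data in the requested window")
    quietest = min(window, key=window.get)
    busiest = max(window, key=window.get)
    return quietest, busiest


# Made-up numbers, purely to show the call; not SFO's data.
example = {
    date(2017, 11, 18): 100.0,  # outside the window, so ignored
    date(2017, 11, 22): 50.0,
    date(2017, 11, 23): 30.0,
    date(2017, 11, 26): 80.0,
}
print(window_extremes(example, date(2017, 11, 19), date(2017, 12, 2)))
```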
So, what's the lowest dip before or on Thanksgiving? It's… Thanksgiving Day. My parents are right, because they're always right. And they're leaving right before what we've identified as the absolute worst day to fly, not just for Thanksgiving but for the entire year: the Sunday after Thanksgiving.