October 19, 2018 — Matt Chapman
In my last post, I wrote about my adventure of requesting metadata for both phone calls and emails from the City of Chicago Office of the Mayor. The work there - and its associated frustration - sent me down a path of sending requests throughout the US to both learn whether these sorts of problems are systemic (megaspoiler: they are) and to also start mapping communication across the United States. Since then, I’ve submitted over a hundred requests for email metadata across the United States – at least two per state.
The first large batch of requests for email metadata were sent to the largest cities of fourteen arbitrary states in a trial run of sorts. In the end of that batch, only two cities were willing to continue with the request - Houston and Seattle. Houston complied surprisingly quickly and snail mailed the metadata for 6m emails.
Seattle on the other hand...
On April 2, 2017 I sent this fairly boilerplate request to Seattle's IT department:
For all emails sent to/from any Seattle owned email address in 2017, please provide the following information: 1. From address 2. To address 3. bcc addresses 4. cc addresses 5. Time 6. Date
Technically this request can done with a single line powershell command. At a policy level, though, it usually gets a lot of pushback. Seattle's first response included a bit of gobsmackery that I’ve almost become used to:
Based on my preliminary research, there have been 5.5 million emails sent and 26.8 million emails received by seattle.gov email addresses in the past 90 days. This is a significant amount of records that will need to be reviewed prior to sharing them with you. Do you have a more targeted list of email addresses you might be interested in? If not, I will work to find out how long review will take and will be in touch.
Of course, I still want the records, so -
I would like to stick with this request as-is for all ~32m emails. Since this request is for metadata only, the amount of review needed should be relatively small.
Fee Estimate of 33 Million Dollars
A week later, I received this glorious response. Each paragraph is interesting in itself, so let’s break most of it down piece by piece.
Rewording of request
This acknowledges receipt of your public disclosure request C012032-041017 received on April 03, 2017 regarding: All emails sent to/from any Seattle owned email address in 2017 including metadata:1. From address2. To address3. bcc addresses4. cc addresses5. Time6. Date
Notice the change of language from the original wording. Their rewording completely changes the scope of the request so that it's not just for metadata, but also the emails contents. No idea why they did that.
With that being said, Seattle IT estimates spending 30 seconds to two minutes to review each email. It has been estimated that this work will take approximately 320 years of staff time at an expense of $33 million in salary.
Normally, a flustered public records officer would just reject a giant request for being for “unduly burdensome”… but this sort of estimate is practically unheard of. So much so that other FOIA nerds have told me that this is the second biggest request they've ever seen. The passive aggression is thick. Needless to say, it's not something I'm willing to pay for!
The estimate of 30s per metadata entry is also a bit suspect. Especially with the use of Excel, which would be useful for removing duplicates, etc.
We estimate that this request contains 8-10 terabytes of information, for which we could need to stand up an FTP server through which the requester will be able to download cleared email meta data. As allowed under RCW 42.56.120, we would charge the requester for the actual copying costs of fulfilling this request. Based on the Seattle IT cost model used for internal City charge backs, the anticipated cost to the requester is $2,480 per year plus $2.11 per gigabyte of storage. We are still working on the storage requirements for this effort. If we assume 10 TB of storage, this would require $21,606.40/year in requester fees.
Heh. Any sysadmin can tell you that The costs of storage doesn’t exactly come from the storage medium itself; administration costs, supporting hardware, etc, are the bulk of the costs. But come on, let’s be realistic here. There’s very little room for good faith in their cost estimates – especially since the last time a single gigabyte cost that much was between 2002 and 2004.
That said – some other interesting things going on here:
- Their file size estimation is huge. For comparison, that Houston’s email metadata dump was only 1.2GB.
- The fact that they mention “meta data” [sic] implies that they did acknowledge that the request was for metadata.
- Seattle already uses Amazon S3 to store public records requests’ data. At the time, S3 was charging $.023/GB
At this time, the City anticipates that it will be able to provide a first installment of records on or about May 29, 2017. However, please note that this time estimate may change depending on the clarification you provide and as we continue to process your request. If the City does not hear from you within 30 days, the City will consider your request closed.
Oddly, they don't actually close out the request and instead ask whether I wanted to continue or not. I responded to their amazing email by asking how many records I'd receive on May 29th, but never received an answer back.
Reversal of the Original Cost Estimate
On June 5, they sent a new response admitting that their initial fee estimation was wrong, and asked for $1.25 for two days’ (out of three months) of records:
At this point I can send you an exel spreadsheet with the data points you are requesting. The cost for the first installment is $1.25 for emails sent or received on January 1 and 2, 2017. Because the spreadsheet does not contain the body of the email and just the metadata that you requested, no review will be necessary, and we'll be able to get this information to you at a faster pace than the 320 years quoted you earlier.
Because they're asking for a single check for $1.25 for just two days’ worth of metadata – and wouldn’t send anything until that first check came in, my interpretation is that they’re taking a page from /r/maliciouscompliance and just making this request as painful as possible just for the simple sake of making it difficult. So in response, I preemptively sent them fourteen separate checks. The first thirteen checks were all around ~$1.25. That seemed to work, since they never asked for a single payment afterwards.
From there, my inbox went mostly silent for two months, and I mostly forgot about the request, though they eventually cashed all of the checks and made me an account for their public records portal.
Fast forward to August 22, when I randomly added that email account back to my phone. Unexpectedly, it turned out they actually finished the request! And without a bill for millions of dollars! Sure enough, their public records request portal had about 400 files available to download, which all in all contained metadata for about 32 million emails. Neat!
Problem though... they accidentally included the first 256 characters of all 32 million emails.
Here are some things I found in the emails:
- Usernames and passwords.
- Credit card numbers.
- Social security numbers and drivers licenses.
- Ongoing police investigations and arrest reports.
- Texts of cheating husbands to their lovers.
- FBI Investigations.
- Zabbix alerts.
In other words... they just leaked to me a massive dataset filled with intimately private information. In the process, they very likely broke many laws, including the Privacy Act of 1974 and many of WA's own public records laws. Frankly, I'm still at a loss of words.
It’s hard to say how any of this happened exactly, but odds are that a combination of request’s rewording and the original public records officer going on vacation led to a communication breakdown. I don’t want to dwell on the mistake itself, so I’ll stop it at that.
Side note to Seattle's IT department - clean up your disks. You shouldn't have that many disks at 100%!
Raising the Issue
I responded as passively as possible in the hopes that they’d catch their mistake on their own:
The responsive records are not consistent with my request and includes much more info than I initially requested. Could you please revisit this request and provide the records responsive to my initial request?
The information that you requestedis located in columns: From address = column J To address = column K bcc address = column M cc address = column L Time and date = column R of the reports. The records were generatedfrom a system report and I am unable to limit the report to generate only thefields you requested. The City has no duty to create a record that doesnot exist. As such, we have provided all records responsive to yourrequest and consider your request closed.
Disregarding the fact that they used a very common tactic of denying information on the basis that its disclosure would require the creation of new records… they didn’t get the point. I explained what information they leaked, and made it very clear how I was going to escalate this:
Please address this matter as if it was a large data breach. For now, I will be raising this matter to the WA Office of Privacy and Data Protection. None of the files provided to me have been shared with anyone else, nor do I have any future intention of sharing.
Thank you for your email and bringing this inadvertent error to our attention so quickly. We have temporarily suspended access to GovQA while we look into the cause of this issue. We are also working on reprocessing your request and anticipate providing you with corrected copies of the records you requested through GovQA next week. In the meantime, please do not review, share, copy or otherwise use these records for any purpose. We are sorry for any inconvenience.
Not too long after that, after contacting some folks on Seattle’s Open Data , I found my way onto a conference phone call with both Seattle’s Chief Technology officer and their Chief Privacy Officer and we discussed what happened, and what should happen with the records. They thanked me for bringing the situation to their attention and all that, but the mood of the call was as if both parties had a knife behind their back. Somewhere towards the end of the call, I asked them if it was okay to keep the emails. Why not at least ask, right?
Funny enough, in the middle of that question, my internet died and interrupted the call for the first time in the six months I lived in that house. Odd. It came back ten minutes later, and I dialed back into the conference line, but the mood of the call pretty much 180’d. They told me:
- All files were to be deleted.
- Seattle would hire Kroll to scan my hard drives to prove deletion.
- Agreeing to #1 and #2 would give me full legal indemnification.
This isn't something I'm even remotely cool with, so we ended the call a couple minutes later, and agreed to have our lawyers speak going forward.
Deleting the Files
After that call, I asked my lawyer to reach out to their lawyer and was pretty much told that Seattle was approaching the problem as if they were pursuing Computer Fraud And Abuse (CFAA) charges. For information that they sent. Jiminey Cricket..
So, I deleted the files.
Most of what happened next over a month or so was mostly between their lawyer and mine, so there’s not really that much for me to say. Early on I suggested that I write an affidavit that explains what happened, how I deleted the files, and I validated that the files were deleted. They mostly agreed, but still wanted to throw some silly assurance things my way – including asking me to run a bash script to overwrite any unused disk space with random bits. I eventually ran
fstrim instead, and they accepted the affidavit. No more legal threats from there.
About a week after the phone call, a Seattle city employee contacted Seattle’s KIRO7 about the incident. In KIRO7's investigation, they learned that, Seattle hadn’t sent any disclosure of the leak - something required by WA’s public records request law. Only after their investigation did Seattle actually notify its employees about the emails leak. Link to their story (video inside).
A week later, another article was published by Seattle’s Crosscut which goes into a lot of detail, including some history of Seattle's IT department. This line towards the bottom still makes me laugh a little:
The buffer against potential legal and administrative chaos in this scenario is only that Chapman has turned out to be, as Armbruster described him, a "good Samaritan." Efforts to track down Chapman were not successful; Crosscut contacted several Matthew Chapmans who denied being the requester.
On January 19th, Seattle's CTO, Michael Mattmiller gave his resignation. Whether his resignation is related to the email leak is hard to say, but I just think the timing makes it worth mentioning.
Finally – The Metadata
Starting January 26th, Seattle started sending installments of the email metadata I requested. So far they've sent 27 million emails. As of the writing of this post, there are only two departments who haven’t provided their email metadata: the Police Department and Human Services.
You can download the raw data here.
Some things about the dataset:
- It’s very messy – triple quotes, semicolons, commas, oh my.
- There are a millions of systems alerts.
- For seattle.gov → seattle.gov communication, there are two distinct metadata records.
In any case, it's still somewhat workable, so I've been working on a proof of concept for its use in the greater context of public records laws. Not ready to talk much about it yet, so here's here's is a gephi graph of one day's worth of metadata. Its layout is Yifan Hu and filtered with a k-core minimum of 5 and a minimum degree of 5:
Please reach out to me if you'd like to help model these networks.
One Last Thing: Legislative Immunity Kerfuffle
This last section might not be related, but the timing is interesting, so I feel it’s worth mentioning.
On February 23 - between the first installment of email metadata and the second - WA’s legislature attempted to pass SB6617, a bill which removes requirements for disclosure of many of their records – including email exchanges - from WA’s public records laws. What’s particularly interesting about this events of this bill is that it took less than 24 hours from the time it was read for the first time to the time that it passed at both the House and Senate and sent to the Governor’s office.
Seattle Times wrote a good article about it.
Thankfully, after the WA governor’s office received over 6,300 phone calls, 100 letters, and over 12,500 emails, the governor ended up vetoing the bill. Neat.
It's hard to say if that caused any sort of delay, but after a month and a half of waiting:
How are the installments looking? I saw that there was some recent legislative immunity kerfuffle around emails. Is that related to any delays?
And got this response:
Good news. The recent Washington state legislative immunity kerfuffle will not impact your installments. We have fixed the bug that was impacting our progress and are now on our way. In fact, I'll have more records for you this week.
A month later, they started sending the rest.
The work done throughout this post has led to a massive trove of information that ought to be enormously useful in understanding the dynamics of one the US's biggest cities. A big hope in making this sort of information available to the public is that it will help in changing the dynamic of understanding what sorts of information is accessible.
That said, this is just one city of many which have given me email metadata. As more of it comes through, I’ll be able to map out more and more, but the difficulty in requesting those records continues to get in the way.
Once I get some of these bigger stories out of the way, I’ll start writing fewer stories and write more about public records requesting fundamentals – particularly for digital records.
Next post will be about my ongoing suit against the White House OMB for email metadata from January 2017. This past Wednesday was the first court date - where the defendent's counsel never showed up.
Hope you enjoyed!