Last year, we launched an investigation into how Facebook’s People You May Know tool makes its creepily accurate recommendations. By November, we had it mostly figured out: Facebook has nearly limitless access to all the phone numbers, email addresses, home addresses, and social media handles most people on Earth have ever used. That, plus its deep mining of people’s messaging behavior on Android, means it can make surprisingly insightful observations about who you know in real life—even if it’s wrong about your desire to be “friends” with them on Facebook.
To help conduct this investigation, we built a tool to keep track of the people Facebook thinks you know. Called the PYMK Inspector, it captures every recommendation made to a user for however long they want to run the tool. It’s how one of us discovered Facebook had linked us with an unknown relative. In January, after hiring a third party to do a security review of the tool, we released it publicly on GitHub for users who wanted to study their own People You May Know recommendations. Volunteers who downloaded the tool helped us explore whether you’ll show up in someone’s People You May Know after you look at their profile. (Good news for Facebook stalkers: Our experiment found you won’t be recommended as a friend just based on looking at someone’s profile.)
Facebook wasn’t happy about the tool.
The day after we released it, a Facebook spokesperson reached out asking to chat about it, and then told us that the tool violated Facebook’s terms of service, because it asked users to give it their username and password so that it could sign in on their behalf. Facebook’s TOS states that, “You will not solicit login information or access an account belonging to someone else.” They said we would need to shut down the tool (which was impossible because it’s an open source tool) and delete any data we collected (which was also impossible because the information was stored on individual users’ computers; we weren’t collecting it centrally).
We argued that we weren’t seeking access to users’ accounts or collecting any information from them; we had just given users a tool to log into their own accounts on their own behalf, to collect information they wanted collected, which was then stored on their own computers. Facebook disagreed and escalated the conversation to their head of policy for Facebook’s Platform, who said they didn’t want users entering their Facebook credentials anywhere that wasn’t an official Facebook site, because anything else is bad security hygiene and could open users up to phishing attacks. She said we needed to take our tool off GitHub within a week.
We started to worry at this point about what the ramifications might be for keeping the tool available. Would they kick us off Facebook? Would they kick Gizmodo off Facebook? Would they sue us?
We decided to change the tool slightly so that it directed users to a Facebook sign-in page to log in, and then used session cookies to keep logging in each day and checking the PYMK recommendations. Facebook, though, still disapproved, and said they had another problem with the tool.
“I discussed the general concept of the PYMK inspector with the team with respect to whether it is possible to build the inspector in a policy compliant manner and our engineers confirmed that our Platform does not support this,” wrote Allison Hendrix, the head of policy for Facebook’s Platform, by email in February. “We don’t expose this information via our API and we don’t allow accessing or collecting data from Facebook using automated means.”
In other words, Facebook doesn’t have an official way for people to keep track of their PYMK recommendations and that means users aren’t allowed to do it. Facebook is happy to have users hand over lots of data about themselves, but doesn’t like it when the data flows in the other direction.
Shortly thereafter, in March, Facebook’s world exploded when it was revealed that Cambridge Analytica had gotten access to the profile information of millions of Facebook users by going through what was considered an “official route” in 2012. Facebook stopped bothering us about our PYMK Inspector, and the tool currently remains up.
“We often work with developers to address concerns about their apps and tools; especially if they are found to violate our terms. We contacted Gizmodo concerning their tool because it asked people to provide their Facebook login information and did so in a way that may have made them vulnerable to phishing attempts,” said Hendrix in an emailed statement when we asked for comment about the tool this week. “When people are encouraged to provide their Facebook information in a way that is different from what they’re used to, they might trust other, malicious forms and our terms attempt to prevent this. After extended conversations where we gave Gizmodo an opportunity to make updates; they decided to make the necessary changes. We understand that the tool is still live.”
The episode demonstrated a huge problem to us: Journalists need to probe technological platforms in order to understand how unseen and little-understood algorithms influence the experiences of hundreds of millions of people, whether it’s to better understand creepy friend recommendations, to uncover the potential for discrimination in housing ads, to understand how the fake follower economy operates, or to see how social networks respond to imposter accounts. Yet journalistic projects that require scraping information from tech platforms or creating fictitious accounts generally violate these sites’ terms of service.
That’s why a team of lawyers at the Knight First Amendment Institute at Columbia University has sent a letter to Facebook on behalf of Kashmir Hill of Gizmodo Media Group and other journalists and academic researchers, asking Facebook to amend its terms of service to create a safe harbor for journalistic and research projects. That would mean journalists and researchers who use automated means or fictitious accounts to gather data about Facebook and how it works, for stories that serve the public interest, won’t be threatened with claims of breach of contract or of violating the Computer Fraud and Abuse Act, which has been interpreted in the past as prohibiting violations of a site’s TOS.
“Facebook shapes public discourse in ways that are not fully understood by the public or even by Facebook itself. Journalists and researchers play a crucial role in illuminating Facebook’s influence on public discourse,” wrote the Knight First Amendment Institute’s Jameel Jaffer, Alex Abdo, Ramya Krishnan, and Carrie DeCell in a letter sent to Facebook CEO Mark Zuckerberg on Monday. “Facebook’s terms of service severely limit their ability to do that work, however, by prohibiting them from using basic tools of digital investigation on Facebook’s platform.”
The lawyers at the Knight First Amendment Institute, which recently successfully sued President Donald Trump for violating people’s freedom of speech by blocking them on Twitter, attached a proposed amendment for Facebook to add to its terms of service. They asked for a response by the beginning of September.
Facebook seemed reluctant to offer such a safe harbor when asked for comment.
“We appreciate the Knight Institute’s recommendations. Journalists and researchers play a critical role in helping people better understand companies and their products – as well as holding us accountable when we get things wrong,” said Campbell Brown, Facebook’s head of global news partnerships, in a statement sent by email. “We do have strict limits in place on how third parties can use people’s information, and we recognize that these sometimes get in the way of this work. We offer tools for journalists that protect people’s privacy, including CrowdTangle, which helps measure the performance of content on social media, and a new API we’re launching to specifically analyze political advertising on Facebook.”
The API refers to Facebook’s plans to give journalists and researchers automated access to an archive of the political ads run on Facebook. That, of course, is a small sliver of what disturbs people about the Facebook world, leaving a lot of other information officially out of journalists’ reach.