Certain studies require sensitive data sets: the relationship between nutritious school lunch and student health, the effectiveness of salary equity initiatives, and so on. Valuable insights require navigating a minefield of private, personal information. Now, after years of work, cryptographers and data scientists at Google have come up with a technique to enable this kind of "multiparty computation" without exposing information to anyone who didn't already have it.
Today Google will release an open source cryptographic tool known as Private Join and Compute. It facilitates the process of joining numeric columns from different data sets to calculate a sum, count, or average on data that is encrypted and unreadable during its entire mathematical journey. Only the results of the computation can be decrypted and viewed by all parties, meaning that each party learns the aggregate result but none of the data it didn't already own. The cryptographic principles underlying the tool date back to the 1970s and '90s, but Google has repurposed and updated them to work with today's more powerful and flexible processors.
"The net result is that we can perform this computation without exposing any individual data and only getting the aggregate result," says Amanda Walker, director of privacy tools and infrastructure engineering at Google. "The naïve way to do this would be to take two sensitive data sets, dump them into a single database, and do the join and the sum, but then you’ve got everything together and at risk of a data breach."
Lily Hay Newman covers information security, digital privacy, and hacking for WIRED.
Take the school lunch example. The school has information on all of its students and what food it has served when. But it would need data from health care providers over time to track whether menu changes are potentially having a positive impact on students' health. Private Join and Compute would allow these parties, which all hold very sensitive data, to essentially compare notes without divulging sensitive information to each other.
Private Join and Compute uses a 1970s methodology known as "commutative encryption" to allow data in the data sets to be encrypted with multiple keys, regardless of the order in which the keys are applied. This is helpful for multiparty computation, where you need to apply and later peel away multiple layers of encryption without affecting the computations performed on the encrypted data. Crucially, Private Join and Compute also uses methods first developed in the ’90s that enable a system to combine two encrypted data sets, determine what they have in common, and then perform mathematical computations directly on this encrypted, unreadable data through a technique called homomorphic encryption.
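To make those two ideas concrete, here is a minimal Python sketch. It is not Google's implementation: it assumes modular exponentiation as the commutative cipher and a textbook Paillier scheme for the homomorphic sum, it runs both parties in one process, and its parameters are far too small to be secure. Every function and variable name is hypothetical and chosen only for illustration.

```python
import hashlib
import secrets
from math import gcd

# Toy parameters (assumptions for illustration only; NOT secure sizes).
P = 2**127 - 1                      # prime modulus for the commutative cipher
PA_P, PA_Q = 2**61 - 1, 2**89 - 1   # two known primes for toy Paillier
N = PA_P * PA_Q
N_SQ = N * N
LAMBDA = (PA_P - 1) * (PA_Q - 1) // gcd(PA_P - 1, PA_Q - 1)
MU = pow(LAMBDA, -1, N)             # modular inverse (Python 3.8+)

def hash_to_group(identifier: str) -> int:
    """Map an identifier to a nonzero integer mod P."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def new_commutative_key() -> int:
    """Pick an exponent coprime to P - 1 so encryption is a permutation."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if gcd(k, P - 1) == 1:
            return k

def commutative_encrypt(value: int, key: int) -> int:
    # (x^a)^b == (x^b)^a mod P, so the order of the keys does not matter.
    return pow(value, key, P)

def paillier_encrypt(m: int) -> int:
    while True:
        r = secrets.randbelow(N - 1) + 1
        if gcd(r, N) == 1:
            break
    return (1 + m * N) % N_SQ * pow(r, N, N_SQ) % N_SQ

def paillier_add(c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the hidden plaintexts.
    return c1 * c2 % N_SQ

def paillier_decrypt(c: int) -> int:
    x = pow(c, LAMBDA, N_SQ)
    return (x - 1) // N * MU % N

def private_join_and_sum(ids_a, records_b):
    """Return (count, sum) over identifiers that both parties hold."""
    key_a, key_b = new_commutative_key(), new_commutative_key()
    # A encrypts its identifiers, then B adds a second layer.
    a_double = {
        commutative_encrypt(commutative_encrypt(hash_to_group(i), key_a), key_b)
        for i in ids_a
    }
    # B encrypts its identifiers and, separately, the values it wants summed.
    b_single = [
        (commutative_encrypt(hash_to_group(i), key_b), paillier_encrypt(v))
        for i, v in records_b.items()
    ]
    # A adds its layer to B's identifiers and keeps the matching ciphertexts.
    matched = [c for eid, c in b_single
               if commutative_encrypt(eid, key_a) in a_double]
    encrypted_sum = paillier_encrypt(0)
    for c in matched:
        encrypted_sum = paillier_add(encrypted_sum, c)
    # Only B, who holds the Paillier secret, decrypts the aggregate.
    return len(matched), paillier_decrypt(encrypted_sum)

print(private_join_and_sum(["alice", "bob", "carol"],
                           {"bob": 3, "carol": 5, "dave": 7}))  # (2, 8)
```

In this sketch the doubly encrypted identifiers match only when the underlying identifiers are equal, whichever key was applied first, and the sum is computed on ciphertexts that the party doing the adding cannot read. A real deployment would also need much larger parameters, shuffling of the exchanged lists, and an actual network exchange between the parties.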
"We said 'OK, the early systems were very limited and only did a few operations," Walker says. "Are those operations we can use? And the answer turned out to be yes."
Google has already been distributing a technical paper that describes Private Join and Compute to academic and industry cryptographers. And since the company is open sourcing the tool, additional opportunities will come to vet the method's security and privacy. Tal Malkin, a cryptographer at Columbia University who had an early look at the paper, says that the new tool represents an important step—particularly because it comes from Google and will be open source. This may spur its adoption among businesses that are flush with user data and looking to manage it more privately.
"Secure computation has been a thriving area of research in cryptography since the 1980s, but until recently was considered to be too theoretical for practice," Malkin says. "I think this project is an exciting step toward opening this important privacy technology for general use."
Google emphasizes the technique's potential public policy and social advocacy uses, and Joseph Lorenzo Hall, chief technologist for the Center for Democracy and Technology, says these potential applications are very exciting. But like Malkin, Hall also notes that businesses—including Google itself—will likely lean on Private Join and Compute in an attempt to study user data without overstepping privacy bounds. For example, Google's Walker says that the company has already launched a beta test in the US that uses the tool for advertising measurements.
"This is sort of the holy grail of a lot of things," CDT's Hall says. "Google is using math to allow two parties who don’t trust each other, but who want some kind of aggregate statistic that's only available by combining their data, to do that without anyone having any information about the underlying individuals involved."
Though Private Join and Compute makes possible private calculations that were never practical before, it's still computationally intensive, and might not be feasible for use in all situations. And CDT's Hall also points out that it's always possible for the tool to be used to answer questions that society shouldn't be asking, or that are invasive in some way. "As the cryptographer Phil Rogaway puts it, privacy-preserving surveillance is still surveillance," he says.
But the cryptographic advances will also potentially enable a lot of public good. "There was literally nothing you could do to privately answer these questions before," Hall adds. "It's amazing, there are so many ways we could use this."