Privacy-preserving machine learning offers many benefits and interesting applications: being able to train and predict on data while it remains in encrypted form unlocks the utility of data that were previously inaccessible due to privacy concerns. But to make this happen several technical fields must come together, including cryptography, machine learning, distributed systems, and high-performance computing.
The tf-encrypted open source project aims to bring researchers and practitioners together in a familiar framework in order to accelerate exploration and adoption. By building directly on TensorFlow it provides a performant framework with a high-level interface that abstracts away most of the underlying complexity, in turn allowing users to get started easily without first becoming cross-disciplinary experts.
In this blog post, we apply the library to a traditional machine learning example, providing a good starting point for anyone wishing to get into this rapidly growing field. We will see that, despite using state-of-the-art cryptography, we need only a very basic familiarity with machine learning and TensorFlow.
Concretely, we consider the classic MNIST digit classification task. To keep things simple we use a small neural network and train it in the traditional way in TensorFlow using an unencrypted training set. However, for making predictions with the trained model we turn to tf-encrypted, and show how two servers can perform predictions for a client without learning anything about its input. While MNIST is a somewhat basic yet standard benchmark in the literature, it’s also interesting in that it has extensions to many different use cases in private machine learning, including medical image analysis.
We start by looking at how our task can be solved in standard TensorFlow and then go through the changes needed to make the predictions private via tf-encrypted. Since the interface of the latter is meant to mirror the simple and concise expression of common machine learning operations that TensorFlow is well known for, this requires only a small change, one that highlights what one must inherently think about when moving to the private setting.
In line with standard practice, the script below shows our two-layer feedforward network with ReLU activations (more details in our preprint).
Note that the concrete implementations of provide_weights and provide_input (lines 4–5) have been left out for the sake of readability. These two methods simply load their respective values from NumPy arrays stored on disk, and return them as tensor objects.
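For readers who want to see the shape of such a network without TensorFlow, here is a minimal sketch of the forward pass in dependency-free Python. It is not the script referred to above: the helper names (matmul, add_bias, relu, init_layer, predict), the hidden width of 16, the random weights, and the reading of "two-layer" as one hidden layer plus an output layer are all assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + b for v, b in zip(row, bias)] for row in m]

def relu(m):
    """Element-wise ReLU: negative entries become zero."""
    return [[max(0.0, v) for v in row] for row in m]

def init_layer(rng, n_in, n_out):
    """Random weights and zero biases (stand-ins for trained values)."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    b = [0.0] * n_out
    return w, b

def predict(x, params):
    """Forward pass: dense + ReLU for each hidden layer, then dense + argmax."""
    *hidden, (w_out, b_out) = params
    h = x
    for w, b in hidden:
        h = relu(add_bias(matmul(h, w), b))
    logits = add_bias(matmul(h, w_out), b_out)
    return [max(range(len(row)), key=row.__getitem__) for row in logits]

rng = random.Random(0)
# 784 inputs = 28x28 MNIST pixels, 10 outputs = digit classes.
# The hidden width of 16 is an arbitrary choice for this sketch.
params = [init_layer(rng, 784, 16), init_layer(rng, 16, 10)]
batch = [[rng.random() for _ in range(784)] for _ in range(2)]
predictions = predict(batch, params)
```

In a real script the weights would come from training and the input from the MNIST test set; only matrix multiplication, addition, and ReLU are needed for prediction, which is what makes this model a convenient target for secure computation.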
We next turn to making the predictions private, where for the notion of privacy and encryption to even make sense we first need to recast our setting to consider more than the single party implicit in the script above. As seen below, expressing our intentions about who should get to see which values is the biggest difference between the two scripts.
We can naturally identify two of the parties: the prediction client who knows its own input and a model owner who knows the weights. Moreover, for the secure computation protocol chosen here we also need two servers that will be doing the actual computation on encrypted values; this is often desirable in applications where the clients may be mobile devices that have significant constraints on computational power and network bandwidth.
In summary, our data flow and privacy assumptions are as illustrated in the diagram above. The model owner first gives encryptions of the model weights to the two servers (known as a private input). The prediction client then gives encryptions of its input to the two servers (another private input). The servers can then execute the model and send encryptions of the prediction result back to the client, who can finally decrypt it. At no point can the two servers decrypt any values. Below we see our script expressing these privacy assumptions.
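To build some intuition for how two servers can compute on values that neither of them can read, here is a toy simulation of the data flow in plain Python. The post does not spell out the protocol, so everything here is an assumption: "encryption" is modelled as additive secret sharing modulo a prime, multiplication uses Beaver triples handed out by a trusted dealer, and values are small non-negative integers rather than encoded real numbers. All parties are simulated in a single process, and every function name is hypothetical.

```python
import secrets

Q = 2**61 - 1  # modulus for the shares; an arbitrary prime chosen for this sketch

def share(value):
    """Split an integer into two additive shares modulo Q, one per server."""
    s0 = secrets.randbelow(Q)
    s1 = (value - s0) % Q
    return s0, s1

def reconstruct(s0, s1):
    """Recombine two shares; only a party holding both can do this."""
    return (s0 + s1) % Q

def beaver_triple():
    """Trusted dealer (an assumption): shares of random a, b and c = a * b."""
    a, b = secrets.randbelow(Q), secrets.randbelow(Q)
    return share(a), share(b), share(a * b % Q)

def secure_mul(x, y):
    """Multiply two shared values; x and y are (share0, share1) pairs."""
    a, b, c = beaver_triple()
    # Each server masks its shares locally, then the masked values are opened.
    # Because a and b are uniformly random, d and e reveal nothing about x and y.
    d = reconstruct((x[0] - a[0]) % Q, (x[1] - a[1]) % Q)  # d = x - a
    e = reconstruct((y[0] - b[0]) % Q, (y[1] - b[1]) % Q)  # e = y - b
    # x * y = c + d * b + e * a + d * e; server 0 adds the public d * e term.
    z0 = (c[0] + d * b[0] + e * a[0] + d * e) % Q
    z1 = (c[1] + d * b[1] + e * a[1]) % Q
    return z0, z1

def secure_dot(xs, ws):
    """Dot product of two shared vectors; additions are local to each server."""
    z0 = z1 = 0
    for x, w in zip(xs, ws):
        p0, p1 = secure_mul(x, w)
        z0, z1 = (z0 + p0) % Q, (z1 + p1) % Q
    return z0, z1

weights = [3, 1, 4]  # model owner's private input
pixels = [2, 7, 1]   # prediction client's private input
shared_w = [share(w) for w in weights]
shared_x = [share(x) for x in pixels]
result_shares = secure_dot(shared_x, shared_w)
result = reconstruct(*result_shares)  # only the client receives both result shares
```

Each server sees only uniformly random shares and masked values, while the client recovers the dot product 3*2 + 1*7 + 4*1 = 17 by adding the two result shares. A full neural network would also need an encoding for real numbers and a secure ReLU, both of which are left out of this sketch.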
Note that most of the code remains essentially identical to the traditional TensorFlow code, using tfe instead of tf:
- The provide_weights method for loading model weights (line 16) is now wrapped in a call to tfe.define_private_input in order to specify that they should be owned by, and restricted to, the model owner; by wrapping the method call, tf-encrypted will encrypt them before sharing with other parties in the computation.
- As with the weights, the prediction input is now also only accessible to the prediction client (line 17), who is also the only receiver of the output (line 26). Here the tf.Print statement has been moved into receive_output as this is now the only point where the result is known unencrypted.
- We also tie the names of the parties to their network hosts (lines 5–8). Although omitted here, this information also needs to be available on these hosts, and is typically shared via a simple configuration file.
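As a rough illustration of the last point, a configuration file of this kind could be as simple as a JSON object mapping party names to addresses. The file format, the party names, and the addresses below are hypothetical, since the post does not show the file; the parse_hosts helper is likewise only a sketch and not part of tf-encrypted.

```python
import json

# Hypothetical party names and addresses; the source does not show the file.
config_text = """
{
  "server0": "10.0.0.1:4440",
  "server1": "10.0.0.2:4440",
  "model-owner": "10.0.0.3:4440",
  "prediction-client": "10.0.0.4:4440"
}
"""

def parse_hosts(text):
    """Return {party name: (host, port)} from a JSON object of "host:port" strings."""
    hosts = {}
    for name, address in json.loads(text).items():
        host, _, port = address.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"bad address for {name!r}: {address!r}")
        hosts[name] = (host, int(port))
    return hosts

hosts = parse_hosts(config_text)
```

Every host would load the same file, so that all four parties agree on who is reachable where.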
It’s user-friendly! Very little boilerplate, very similar to traditional TensorFlow.
It’s abstract and modular! It integrates secure computation tightly with machine learning code, hiding advanced cryptographic operations underneath normal tensor operations.
It’s extensible! New protocols and techniques can be added under the hood, and the high-level API won’t change. Similarly, new machine learning layers can be added and defined on top of each underlying protocol as needed, just like in normal TensorFlow.
It’s performant! All of this is computed efficiently since it gets compiled down to ordinary TensorFlow graphs, and can hence take advantage of the optimized primitives for distributed computation that the TensorFlow backend provides.
These properties also make it easy to benchmark a diverse set of combinations of machine learning models and secure computation protocols. This allows for fairer comparisons, more confident experimental results, and a more rigorous empirical science, all while lowering the barrier to entry to private machine learning.
Finally, by operating directly in TensorFlow we also benefit from its ecosystem and can take advantage of existing tools such as TensorBoard. For instance, one can profile which operations are most expensive and where additional optimizations should be applied, and one can inspect where values reside and ensure correctness and security during implementation of the cryptographic protocols as shown below.
Here, we visualize the various operations that make up a secure operation on two private values. Each of the nodes in the underlying computation graph is shaded according to which machine aided that node’s execution, and comes with handy information about data flow and execution time. This gives the user a completely transparent yet effective way of auditing secure computations, while simultaneously allowing for program debugging.
Our vision at Dropout Labs is to allow artificial intelligence and data privacy to work together. We’re getting behind the development of tf-encrypted to provide researchers and practitioners with the open-source tools they need to quickly experiment with secure protocols and primitives for private machine learning.
We hope that this aids and inspires the next generation of researchers to implement their own novel protocols and techniques for secure computation in a fraction of the time, so that machine learning engineers can start to apply these techniques for their own use cases in a framework they’re already intimately familiar with.