U2F/FIDO tokens (a.k.a. “Security Keys”) are a solid contender for doing something about the effectiveness of phishing and so I believe they're pretty important. I've written a fairly lengthy introduction to them previously and, as mentioned there, one concerning aspect of their design is that they permit attestation: when registering a key it's possible for a site to learn a cryptographically authenticated make, model, and batch. As a browser vendor who has dealt with User-Agent sniffing, and as a large-site operator who has dealt with certificate pervasiveness issues, that's quite concerning for public sites.
It's already the case that one significant financial site has enforced a single-vendor policy using attestation (i.e. you can only register a token made by that vendor). That does not feel very congruent with the web, where any implementation that follows the standards is supposed to be a first-class citizen. (Sure, we may have undermined that with staggering levels of complexity, but that doesn't discredit the worth of the goal itself.)
Even in cases where a site's intended policy is more reasonable (say, they want to permit all tokens with some baseline competence), there are strong grounds for suspecting that things won't turn out well. Firstly, the policies of any two sites may not completely align, leading to a crappy user experience where a user needs multiple tokens to cover all the sites that they use, and also has to remember which works where. Secondly, sites have historically not been so hot about staying up-to-date. New token vendors may find themselves excluded from the market because it's not feasible to get every site to update their attestation whitelists. That feels similar to past issues with User-Agent headers but the solution there was to spoof other browsers. Since attestation involves a cryptographic signature, that answer doesn't work here.
So the strong recommendation for public sites is not to request attestation and not to worry about it. The user, after all, has control of the browser once logged in, so it's not terribly clear what threats it would address.
However, if we assume that certain classes of sites probably are going to use attestation, then users have a collective interest in those sites enforcing the same, transparent standard, and in them keeping their attestation metadata current. But without any impetus towards those ends, that's not going to happen. Which raises the question: can browsers do something about that?
Ultimately, in such a world, sites only operate on a single bit of information about any registration: was this public-key generated in a certified device or not? The FIDO Alliance wants to run the certification process, so then the problem reduces down to providing that bit to the site. Maybe they would simply trust the browser to send it: the browser could keep a current copy of the attestation metadata and tell the site whether the device is certified or not. I don't present that as a straw-man: if the site's aim is just to ensure that the vast majority of users aren't using some backdoored token that came out of a box of breakfast cereal then it might work, and it's certainly simple for the site.
But that would be a short blog post, and I suspect that trusting the browser probably wouldn't fly in some cases.
So what we're looking for is something like a group signature scheme, but we can't change existing tokens. So we need to retrospectively impose a group signature on top of signers that are using vanilla P-256 ECDSA.
It is a surprising but true result in cryptography that it's possible to create a convincing proof of any statement in NP that discloses nothing except the truth of the statement. As an example of such a statement, we might consider “I know a valid signature of message x from one of the public keys in this set”. That's a pretty dense couple of sentences but rather than write an introduction to zero-knowledge proofs here, I'm going to refer you to Matthew Green's posts. He does a better job than I would.
I obviously didn't pick that example at random. If there was a well-known set of acceptable public keys (say, as approved by the FIDO Alliance) then a browser could produce a zero-knowledge proof that it knew a valid attestation signature from one of those keys, without disclosing anything else, notably without disclosing which public key was used. That could serve as an “attestation valid” bit, as hypothesised above, that doesn't require trusting the browser.
As a concrete instantiation of zero-knowledge proofs for this task, I'll be using Bulletproofs [BBBPWM17]. (See zkp.science for a good collection of many different ZK systems. Also, dalek-cryptography have excellent notes on Bulletproofs; Cathie Yun and Henry de Valence from that group were kind enough to help me with a question about Bulletproofs too.)
The computational model for Bulletproofs is an arithmetic circuit: an acyclic graph where public and secret inputs enter and each node either adds or multiplies all its inputs. Augmenting that are linear constraints on the nodes of the circuit. In the tool that I wrote for generating these circuits, this is represented as a series of equations where the only operations are multiplication, addition, and subtraction. Here are some primitives that hopefully convince you that non-trivial functions can be built from this:
- IsBit(x): x² - x = 0
- NOT(x): 1 - x
- AND(x, y): x × y
- OR(x, y): x + y - (x × y)
- XOR(x, y): x + y - 2(x × y)
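The gate formulas above can be checked by evaluating them over a small prime field. This is only a minimal sketch: the modulus `Q` and the function names are illustrative assumptions and are not taken from the circuit-generation tool.

```python
# Toy prime modulus standing in for the circuit's field (an assumption;
# the real circuit works in the scalar field of the Bulletproofs curve).
Q = 2**61 - 1

def is_bit(x):
    """IsBit constraint: x² - x = 0 holds only for x = 0 or x = 1."""
    return (x * x - x) % Q == 0

def NOT(x):
    return (1 - x) % Q

def AND(x, y):
    return (x * y) % Q

def OR(x, y):
    return (x + y - x * y) % Q

def XOR(x, y):
    return (x + y - 2 * x * y) % Q

def truth_table(gate):
    """Evaluate a two-input gate on (0,0), (0,1), (1,0), (1,1)."""
    return [gate(x, y) for x in (0, 1) for y in (0, 1)]
```

Each two-input gate costs a single multiplication (x × y); the rest is addition and subtraction, which the constraints handle for free.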
Using single bit values in an arithmetic circuit certainly works, but it's inefficient. Getting past single-bit values, the arithmetic circuits in Bulletproofs don't work in ℤ (i.e. arbitrary-length integers), rather they work over a finite field. Bulletproofs are built on top of an elliptic curve and the finite field of the arithmetic circuit is the scalar field of that curve.
When dealing with elliptic curves (as used in cryptography) there are two finite fields in play: the x and y coordinates of the points on the curve are in the coordinate field of the curve. Multiples of the base point (B) then generate a prime number (n) of points in the group before cycling back to the base point. So xB + yB = (x + y mod n)B — i.e. you can reduce the multiple mod n before multiplying because it'll give the same result. Since n is prime, reduction mod n gives a field, the scalar field.
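To make the two fields concrete, here is a sketch on a textbook-sized curve, y² = x³ + 2x + 2 over GF(17), whose group has 19 points. The curve and all names are illustrative assumptions chosen for being small; nothing here is P-256.

```python
P_FIELD = 17      # coordinate field: x and y live mod 17
A = 2             # curve: y² = x³ + 2x + 2
BASE = (5, 1)     # base point B
N = 19            # number of points generated by B (prime): the scalar field

def add(p1, p2):
    """Add two affine points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None  # p2 is the negation of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_FIELD, -1, P_FIELD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD)
    lam %= P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    y3 = (lam * (x1 - x3) - y1) % P_FIELD
    return (x3, y3)

def mul(k, pt):
    """Double-and-add scalar multiplication of pt by k >= 0."""
    result = None
    while k > 0:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result
```

Coordinates are reduced mod 17 while scalars can be reduced mod 19: `add(mul(x, BASE), mul(y, BASE))` equals `mul((x + y) % N, BASE)` for any x and y.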
(I'm omitting powers of primes, cofactors, and some other complications in the above, but it'll serve.)
So Bulletproofs work in the scalar field of whatever elliptic curve they're implemented with, but we want to build P-256 ECDSA verification inside of a Bulletproof, and that involves lots of operations in P-256's coordinate field. So, ideally, the Bulletproofs need to work on a curve whose scalar field is equal to P-256's coordinate field. Usually when generating a curve, one picks the coordinate field to be computationally convenient, iterates other parameters until the curve meets standard security properties, and the scalar field is whatever it ends up as. However, after some quality time with “Constructing elliptic curves of prime order” (Broker & Stevenhagen) and Sage, we find that y² = x³ - 3x + B over GF(PP) where:
- B= 0x671f37e49d38ff3b66fac0bdbcc1c1d8b9f884cf77f0d0e90271026e6ef4b9a1
- PP= 0xffffffff000000010000000000000000aaa0c132719468089442c088a05f455d
… gives a curve with the correct number of points, and which seems plausibly secure based on the SafeCurves criteria. (A more exhaustive check would be needed before using it for real, but it'll do for a holiday exploration.) Given its relationship to P-256, I called it “PP-256” in the code.
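One cheap sanity check on these parameters is Hasse's theorem: a curve over GF(PP) with exactly p points (p being P-256's coordinate prime) can only exist if |PP + 1 - p| ≤ 2√PP. The sketch below checks just that necessary condition. It does not count points or test the SafeCurves criteria, and the function name is an illustrative assumption.

```python
# P-256's coordinate-field prime, which must be the number of points on PP-256.
P256_P = 2**256 - 2**224 + 2**192 + 2**96 - 1

# Coordinate-field prime of the new curve, copied from the text above.
PP = 0xffffffff000000010000000000000000aaa0c132719468089442c088a05f455d

def hasse_allows(field_size, num_points):
    """True if num_points lies within Hasse's bound for a curve over GF(field_size)."""
    trace = field_size + 1 - num_points
    return trace * trace <= 4 * field_size

# Necessary (not sufficient) condition for the claimed point count.
plausible = hasse_allows(PP, P256_P)
```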
Reviewing the ECDSA verification algorithm, the public keys and message hash are obviously public inputs. The r and s values that make up the signature cannot both be public because then the verifier could just try each public key and find which one generated the signature. However, one of r and s can be public. From the generation algorithm, r is the x-coordinate of a random point and s is blinded by the inverse of the nonce. So on their own, neither r nor s disclose any information and so can just be given to the verifier, moving work outside of the expensive zero-knowledge proof. (I'm not worrying about tokens trying to use a covert channel here but, if you do worry about that, see True2F.)
If we disclose s to the verifier directly then what's left inside the zero-knowledge proof is 1) selecting the public key; 2) checking that the secret r is in range; 3) u₂ = r/s mod n; 4) scalar-multiplication of the public key by u₂; 5) adding in the (now) public multiple of the base point; and 6) showing that the x-coordinate of the resulting point equals the original r, mod n.
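The verification steps listed above, and the reason r and s can't both be public, can be sketched with ECDSA on a toy curve (y² = x³ + 2x + 2 over GF(17), group order 19). Everything here is an illustrative assumption: the curve is far too small to be secure, the nonce is fixed, and `link` is a hypothetical helper showing what a verifier could do with a fully public signature.

```python
P, A, N = 17, 2, 19   # toy coordinate prime, curve coefficient, group order
G = (5, 1)            # base point

def add(p1, p2):
    """Affine point addition; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, pt):
    """Double-and-add scalar multiplication."""
    result = None
    while k > 0:
        if k & 1:
            result = add(result, pt)
        pt, k = add(pt, pt), k >> 1
    return result

def sign(d, z, k):
    """Sign hash z with private key d and nonce k (fixed here for illustration)."""
    r = mul(k, G)[0] % N
    s = pow(k, -1, N) * (z + r * d) % N
    return r, s

def verify(pub, z, r, s):
    if not (0 < r < N and 0 < s < N):
        return False
    w = pow(s, -1, N)
    u1, u2 = z * w % N, r * w % N           # u2 = r/s mod n (step 3)
    point = add(mul(u1, G), mul(u2, pub))   # steps 4 and 5
    return point is not None and point[0] % N == r   # step 6

def link(pubs, z, r, s):
    """With r and s both public, anyone can find which keys verify."""
    return [i for i, pub in enumerate(pubs) if verify(pub, z, r, s)]

keys = [mul(d, G) for d in (3, 7, 12)]
r, s = sign(7, 5, 10)
```

Here `link(keys, 5, r, s)` returns `[1]`, identifying the signer. Keeping r secret and doing steps 3 to 6 inside the proof is what prevents that.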
The public-key is a 4-tooth comb, which is a precomputed form that speeds up scalar multiplications. It consists of 30 values. The main measure that we want to minimise in the arithmetic circuit is the number of multiplications where both inputs are secret. When selecting from t possible public keys the prover supplies a secret t-bit vector where only one of the bits is set. The proof shows that each value is, indeed, either zero or one using IsBit (from above, at a cost of one multiply per bit), and that exactly one bit is set by requiring that the sum of the values equals one. Each of the 30t public-key values is multiplied by one of the bits and summed to select exactly one key.
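The selection step can be sketched as follows. This is a minimal illustration over a toy modulus: `Q`, `select_key`, and the random stand-in key values are assumptions, and a real circuit would express the checks as constraints rather than raising exceptions.

```python
import random

Q = 2**61 - 1          # toy field modulus (assumption)
VALUES_PER_KEY = 30    # each precomputed comb consists of 30 field values

def select_key(keys, bits):
    """Pick one key from `keys` using a secret one-hot vector `bits`."""
    if len(keys) != len(bits):
        raise ValueError("need one selector bit per key")
    # IsBit on every selector: one secret×secret multiply per bit.
    if any((b * b - b) % Q != 0 for b in bits):
        raise ValueError("selector is not a bit")
    # Exactly one bit set: the selectors must sum to one.
    if sum(bits) % Q != 1:
        raise ValueError("selector vector is not one-hot")
    # Multiply each key value by its key's bit and sum column-wise.
    return [
        sum(b * key[i] for b, key in zip(bits, keys)) % Q
        for i in range(VALUES_PER_KEY)
    ]

# Stand-in "public keys": three lists of 30 random field elements.
rng = random.Random(1)
keys = [[rng.randrange(Q) for _ in range(VALUES_PER_KEY)] for _ in range(3)]
chosen = select_key(keys, [0, 1, 0])
```

Because the key values are public inputs, each bit × value product has only one secret operand, so the multiplies that count against the budget here are just the t IsBit checks.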
Rather than checking that the secret r is within [0, n-1], which would cost 512 multiplies, we just check that it's not equal to zero mod n. That's the important condition here since an out of range r is otherwise just an encoding error. Showing that a number is not zero mod n just involves showing that it's not equal to zero or n, as 2n is outside of the arithmetic circuit field. Proving a ≠ b is easy: the prover just provides an inverse for a - b (since zero doesn't have an inverse) and the proof shows that (a - b) × (a - b)⁻¹ = 1.
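A minimal sketch of that inequality check, working mod P-256's coordinate prime (the circuit's field). The function names are illustrative assumptions.

```python
# P-256 coordinate prime (the arithmetic circuit's field) and group order.
P = 2**256 - 2**224 + 2**192 + 2**96 - 1
N = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551

def inverse_witness(a, b):
    """Prover side: supply (a - b)⁻¹ mod P. Raises ValueError when a == b."""
    return pow((a - b) % P, -1, P)

def check_not_equal(a, b, inv):
    """Circuit side: a single multiplication showing (a - b) × inv = 1."""
    return (a - b) * inv % P == 1

def check_r_nonzero_mod_n(r, inv_zero, inv_n):
    """r mod n is non-zero iff r is neither 0 nor n, since 2n exceeds P."""
    return check_not_equal(r, 0, inv_zero) and check_not_equal(r, N, inv_n)

r = 0x1234567890abcdef
ok = check_r_nonzero_mod_n(r, inverse_witness(r, 0), inverse_witness(r, N))
```

If r were 0 or n, no witness would exist: zero times anything is zero, so the constraint can't be satisfied.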
Calculating r/s mod n is the most complex part of the whole proof! Since the arithmetic circuit is working mod P-256's p, working mod n (which is the order of P-256, slightly less than p) is awkward. The prover gives a bit-wise breakdown of r; the proof does the multiplication as three words of 86, 86, and 84 bits; the prover supplies the values for the carry-chain (since bit-shifts aren't a native operation in the arithmetic circuit); the prover then gives the result in the form a×n + b, where b is a 256-bit number; and the proof does another multiplication and carry-chain to check that the results are equal. All for a total cost of 2152 multiplication nodes!
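The shape of that check can be sketched with ordinary integers. This is one plausible reading of the description, not the actual circuit: it assumes the product is r × s⁻¹ with s⁻¹ public, uses signed carries, leaves out the bit-decomposition range checks, and all function names are hypothetical.

```python
# Order of the P-256 group (n in the text).
N = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551
LIMB_BITS = 86   # limbs of 86, 86 and 84 bits cover a 256-bit value
MASK = (1 << LIMB_BITS) - 1

def limbs(x):
    """Split x into three little-endian limbs."""
    return [x & MASK, (x >> LIMB_BITS) & MASK, x >> (2 * LIMB_BITS)]

def columns(x, y):
    """Schoolbook product of two 3-limb numbers as 5 un-carried columns.
    Each column is below 2^174, so it cannot wrap around mod p."""
    cols = [0] * 5
    for i, xi in enumerate(limbs(x)):
        for j, yj in enumerate(limbs(y)):
            cols[i + j] += xi * yj
    return cols

def column_differences(r, s_inv, a, b):
    """Columns of r × s_inv minus columns of a × n + b."""
    lhs = columns(r, s_inv)
    rhs = columns(a, N)
    for k, limb in enumerate(limbs(b)):
        rhs[k] += limb
    return [l - h for l, h in zip(lhs, rhs)]

def prover_witness(r, s):
    """Prover side: quotient a, remainder b, and the carry chain."""
    s_inv = pow(s, -1, N)
    a, b = divmod(r * s_inv, N)
    carries, carry = [], 0
    for d in column_differences(r, s_inv, a, b):
        carry = (d + carry) >> LIMB_BITS   # exact division for honest inputs
        carries.append(carry)
    return s_inv, a, b, carries

def check(r, s_inv, a, b, carries):
    """Circuit side: every column plus its carry-in must equal carry-out × 2^86."""
    if len(carries) != 5:
        return False
    carry_in = 0
    for d, carry_out in zip(column_differences(r, s_inv, a, b), carries):
        if d + carry_in != carry_out << LIMB_BITS:
            return False
        carry_in = carry_out
    return carry_in == 0 and 0 <= b < 2**256

r, s = 2**255 + 12345, 2**200 + 987654321
s_inv, a, b, carries = prover_witness(r, s)
```

When `check` passes, r × s⁻¹ = a×n + b holds over the integers, so b is congruent to r/s mod n even though every individual operation stayed small enough to be done mod p.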
After that, the elliptic curve operation itself is pretty easy. Using the formulae from “Complete addition formulas for prime order elliptic curves” (Renes, Costello, and Batina) it takes 5365 multiplication nodes to do a 4-tooth comb scalar-mult with a secret scalar and a secret point. Then a final 17 multiplication nodes add in the public base-point multiple, supply the inverse to convert to affine form, and check that the resulting x-coordinate matches the original r value. To save work, the circuit does not reduce the x-coordinate mod n: for P-256, that means that around one in 2¹²⁸ signatures may be incorrectly rejected, but that's below the noise floor of arithmetic errors in CPUs. Perhaps if this were to be used in the real world, that would be worth doing correctly, but I go back to work tomorrow so I'm out of time.
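The size of that rejection probability follows from how close n is to p: skipping the reduction only matters when the x-coordinate lands in [n, p). A rough estimate, assuming x-coordinates are spread evenly over the field (the variable names are illustrative):

```python
from math import log2

# P-256 coordinate prime p and group order n.
P = 2**256 - 2**224 + 2**192 + 2**96 - 1
N = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551

# An x-coordinate only needs reducing mod n if it falls in [n, p).
gap = P - N

# log2 of the fraction of field elements in that range.
log2_fraction = log2(gap) - log2(P)
```

The gap is a 127-bit number, so the fraction is no more than around one in 2¹²⁸.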
In total, the full circuit contains 7534 multiplication nodes, 2154 secret inputs, and 17 236 constraints.
My tool for generating the matrices that Bulletproofs operate on outputs 136KB of LZMA-compressed data for the circuit described above. In some contexts, that amount of binary size would be problematic, but it's not infeasible. There is also quite a lot of redundancy: the data includes instructions for propagating the secret inputs through the arithmetic circuit, but it also includes matrices from which that information could be derived.
The implementation is based on BoringSSL's generic-curve code. It doesn't even use Shamir's trick for multi-scalar multiplication of curve points, it doesn't use Montgomery form in a lot of places, and it doesn't use any of the optimisations described in the Bulletproofs paper. In short, the following timings are extremely pessimistic and should not be taken as any evidence about the efficiency of Bulletproofs. But, on a 4GHz Skylake, proving takes 18 seconds and verification takes 13 seconds. That's not really practical, but there is a lot of room for optimisation and for multiple cores to be used concurrently.
The proof is 70 450 bytes, dominated by the 2154 secret-input commitments. That's not very large by the standards of today's web pages.
Intermediates and FIDO2
One important limitation of the above is that it only handles one level of signatures. U2F allows an intermediate certificate to be provided so that only less-frequently-updated roots need to be known a priori. With support for only a single level of signatures, manufacturers would have to publish their intermediates too. (But we already require that for the WebPKI.)
Another issue is that it doesn't work with the updated FIDO2 standard. While only a tiny fraction of Security Keys are FIDO2-based so far, that's likely to increase. With FIDO2, the model of the device is also included in the signed message, so the zero-knowledge proof would also have to show that a SHA-256 preimage has a certain structure. While Bulletproofs are quite efficient for implementing elliptic curves, a binary-based algorithm like SHA-256 is quite expensive: the Bulletproofs paper notes a SHA-256 circuit using 25 400 multiplications. There may be a good solution in combining different zero-knowledge systems based on “Efficient Zero-Knowledge Proof of Algebraic and Non-Algebraic Statements with Applications to Privacy Preserving Credentials” (Chase, Ganesh, Mohassel), but that'll have to be future work.
Happy new year.