Tuesday, 27 November 2018
Designing Headers for HTTP Compression
One of the concerns that often comes up when someone creates a new HTTP header is how much “bloat” it will add on the network. This is especially relevant in requests, when a little bit of extra data can introduce a lot of latency when repeated on every request.
For example, if requests have an average of 1K of headers, and loading your page requires 30 requests, that means you need to upload 30K of data to get all of the responses started. TCP slow start means that on a new connection, it’s going to take many round trips to get those requests out – a significant amount of time, especially when the server is far away.
HTTP/2 added a form of header compression (called HPACK) to help address this concern. Unfortunately, there’s not a lot of information about it out there, so it’s easy to fall into the trap of thinking it works like other compression schemes we’re familiar with – for example, GZIP – and assuming that it’ll just work.
If you remember nothing else from this blog post, please take away that HPACK is not GZIP. It works in a very different way, and if you don’t understand how it works when you create a new header, you can make HTTP/2 much (much!) less efficient.
How HPACK Works
It’s reasonable to ask why GZIP wasn’t used, since it offers better compression; in the tests that helped guide HPACK’s design, we compared the size of all 140 request header blocks on a single connection in “raw” HTTP/1 form, GZIP-compressed, and in an early form of HTTP/2 compression, and GZIP came out ahead.
Unfortunately, using fine-grained compression techniques like GZIP makes the protocol vulnerable to attacks like CRIME; an attacker who can observe the connection and also control parts of requests or responses can recover secrets (e.g., authentication tokens in Cookies) even when TLS is used.
To address this, HPACK uses two techniques that were judged reasonably CRIME-resistant by security experts:
- A static Huffman coding, and
- A lookup table that works at the granularity of an entire header field.
The Huffman coding is static so that an attacker can’t influence compression efficiency. It uses as few as five bits (but often more) to encode common characters, so you can count on something like 20% gains in efficiency from it (depending on the exact text you’re encoding); the full code table is in the HPACK specification.
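As a rough sketch of how a static Huffman code saves space, consider the estimate below. The bit lengths here are assumptions for illustration only, not the real RFC 7541 table (which assigns between 5 and 30 bits per octet):

```python
import math

# Illustrative, assumed code lengths in bits; NOT the real HPACK static
# Huffman table (see RFC 7541, Appendix B for the actual assignments).
ASSUMED_BITS = {c: 5 for c in "aceiost012"}  # pretend these common chars get 5 bits

def huffman_size(s, default_bits=7):
    """Estimate the Huffman-coded size of s in bytes, padded to a whole octet."""
    bits = sum(ASSUMED_BITS.get(c, default_bits) for c in s)
    return math.ceil(bits / 8)

print(len("cache-control"), huffman_size("cache-control"))  # 13 raw vs 10 coded
```

Under these assumed lengths, a 13-byte field name shrinks to about 10 bytes, which is in the ballpark of the 20% figure above.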
There are actually two lookup tables; a static table and a dynamic table.
The static table is defined in the specification; it doesn’t change. It’s there to “seed” a connection with some (hopefully) common values, so that they don’t have to be sent in full when the underlying TCP connection is still going through slow start.
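The first few entries of the static table, as defined in RFC 7541 (Appendix A), give a feel for what it “seeds” the connection with:

```python
# The first few entries of HPACK's static table (RFC 7541, Appendix A).
STATIC_TABLE = {
    1: (":authority", ""),
    2: (":method", "GET"),
    3: (":method", "POST"),
    4: (":path", "/"),
    5: (":path", "/index.html"),
    6: (":scheme", "http"),
    7: (":scheme", "https"),
    8: (":status", "200"),
}

# A request for "GET https://example.com/" can reference indexes 2, 7 and 4
# instead of sending those header fields in full on a brand-new connection.
print(STATIC_TABLE[2])  # (':method', 'GET')
```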
The dynamic table, on the other hand, is actively managed by the sender; the server manages responses, and the client manages requests. When headers are sent, the sender tells the recipient whether to index that header name/value pair; if it does, future messages can refer back to that value by reference, instead of sending it again.
But, the header has to exactly match. For example, if you send this response header and tell the client to index it:
Cache-Control: max-age=60, stale-while-revalidate=60
Future responses on that connection can refer to it only if they’re exactly the same.
max-age is 61? Different header. max-age comes after stale-while-revalidate? Different header. No stale-while-revalidate? Different header. Use uppercase? Different header. No space after the comma? Different header.
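Because HPACK matches whole header fields byte-for-byte, each of those variations would occupy its own dynamic-table entry, as this small illustration shows:

```python
# Variations on the Cache-Control example above; HPACK compares header
# fields byte-for-byte, so each distinct string is a separate table entry.
variants = [
    "max-age=60, stale-while-revalidate=60",
    "max-age=61, stale-while-revalidate=60",   # value differs by one digit
    "stale-while-revalidate=60, max-age=60",   # same directives, different order
    "max-age=60,stale-while-revalidate=60",    # no space after the comma
    "Max-Age=60, stale-while-revalidate=60",   # different case
]

print(len(set(variants)))  # 5 distinct entries, not 1
```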
Each end of the connection can tell the other side the maximum lookup table size that it wants to keep for the headers it receives; the default is 4K. So the client can tell the server not to use more than a certain amount of space for responses, and likewise the server can tell the client how much space it wants to use for requests.
Managing the size of the dynamic table is important; if it’s big, your compression efficiency will grow (consider how big headers can get!), but each end of the connection has to keep enough memory around to store the dynamic tables – both request and response – for each connection it has open.
Browsers really like good compression efficiency, so they generally negotiate (with a SETTING) to use a bigger dynamic table for responses. Servers, on the other hand, usually need to be very careful with memory, since they want to serve as many clients as possible. As a result, they generally don’t negotiate a bigger dynamic table for requests, and often don’t use a bigger response table even when the client lets them.
There’s lots more to HPACK, but those are the high points. There’s one more important detail I’ll mention: when you’re calculating the size of a header for purposes of enforcing the limits described above, you add the size of the header field name (without Huffman coding), the size of the value (also without coding), and 32 bytes of overhead, as specified in RFC 7541. We’ll come back to why that’s important in a minute.
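That accounting rule is simple enough to write down directly (using the Cache-Control example from earlier):

```python
def hpack_entry_size(name: str, value: str) -> int:
    # RFC 7541 section 4.1: a dynamic table entry's size is the name length
    # plus the value length plus 32 bytes of overhead, all counted on the
    # uncompressed (non-Huffman) octets.
    return len(name) + len(value) + 32

size = hpack_entry_size("cache-control", "max-age=60, stale-while-revalidate=60")
print(size)  # 13 + 37 + 32 = 82 bytes of dynamic table space
```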
The Ideal HPACK Header
Knowing how HPACK works, it’s easy to understand what kind of header compresses best with it: one that, for the duration of a connection, never changes in any way. It can occur on some messages in one direction, on all messages in that direction, or just a few, but when it occurs, it doesn’t change.
That’s because it will create just one entry in the dynamic table and continuously reference that entry; it only goes out “on the wire” once, when it’s sent the first time.
It’d be even better if it was small; that means that it won’t take up a lot of space in the dynamic table, so that other headers have a chance of being compressed too. This isn’t hyper-important; if it’s 20 or even 100 bytes, that’s fine, since it’s going to save a lot more on the wire.
If you think about it, this isn’t surprising; some of the most voluminous headers in requests are those that get repeated a number of times over the course of a connection, like Referer. It makes sense that HPACK was optimised for them.
On the other end of the scale, headers that have lots of different versions on the same connection mean lots of different entries in the dynamic table. How much space they’ll take up depends on a lot of different factors, but the absolute worst would be a header that has many different values that rotate through constantly; that would take up a lot of room in the dynamic table and “kick” a lot of other headers out.
The Date header is a good example here; it changes once a second, so putting it into the dynamic table often doesn’t make sense (and some implementations don’t).
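A toy model makes the churn visible. `DynamicTable` below is a simplified sketch of HPACK’s FIFO eviction (RFC 7541, section 4.4), not a real implementation:

```python
from collections import deque

class DynamicTable:
    """Toy model of HPACK's dynamic table: FIFO eviction when over budget."""
    def __init__(self, max_size=4096):  # 4K is the protocol default
        self.max_size = max_size
        self.entries = deque()  # newest entries at the left
        self.size = 0

    def insert(self, name, value):
        entry_size = len(name) + len(value) + 32
        # Evict the oldest entries until the new one fits (RFC 7541 sec. 4.4).
        while self.entries and self.size + entry_size > self.max_size:
            old_name, old_value = self.entries.pop()
            self.size -= len(old_name) + len(old_value) + 32
        if entry_size <= self.max_size:
            self.entries.appendleft((name, value))
            self.size += entry_size

table = DynamicTable()
for i in range(100):  # a Date value that changes every second
    table.insert("date", f"Tue, 27 Nov 2018 00:01:{i:02d} GMT")
print(len(table.entries))  # 63 -- the table is full of stale Date entries
```

Each 65-byte Date entry is referenced at most once before it goes stale, so a churning header like this fills the table while evicting entries that could actually be reused.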
Making them big just magnifies the effect. At the extreme, a header with a 2K value takes up half the default dynamic table size. However, many implementations won’t even index a header that big; since it’d remove so many other headers, it’s often a better strategy to just send it out on the wire as a literal every single time.
So, when you’re designing a header, you want it to be as close to the ideal as you can make it. This might take your design to a surprising place. For example, if you have a header that has many different values that change, like this:
My-Header: foo=15, bar=true, baz=4, bat=false
If foo is a number between 1 and 20, bar and bat are booleans, and baz is a number between 1 and 10, that means that there are 20 x 2 x 10 x 2 = 800 (!) possible permutations of this header value.
Let’s say that the field name and value have an average of 44 characters; remembering the 32-byte overhead (this is why that detail was important), that’s 76 x 800 = 60,800 bytes of space in the dynamic table to keep all of the different permutations. Ouch!
On the other hand, if you did this:
My-Foo: 15
My-Bar: true
My-Baz: 4
My-Bat: false
That’s 20 + 2 + 10 + 2 = 34 different header fields (remember, we’re working at per-header granularity), each averaging 8 characters plus the 32-byte overhead = 40 bytes, for something like 34 x 40 = 1,360 bytes of space in the dynamic table to hold all possible values.
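The arithmetic for the two designs, using the illustrative averages from the text, works out like this:

```python
# Worked comparison of the combined header vs. the split headers above,
# using the average sizes assumed in the text.
combined_perms = 20 * 2 * 10 * 2             # 800 permutations of one header
combined_bytes = (44 + 32) * combined_perms  # 44-char avg name+value, +32 overhead

split_entries = 20 + 2 + 10 + 2              # 34 distinct per-header values
split_bytes = (8 + 32) * split_entries       # 8-char avg, +32 overhead

print(combined_bytes, split_bytes)  # 60800 vs 1360 bytes of table space
```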
Even if you reduce the size of the individual values (perhaps using a binary encoding), the per-value overhead and the much larger number of permutations in the combined form blow out the space used.
Of course, this is hugely dependent upon the access patterns in your application; seeing all of the possible permutations on a single connection is pathological. Right?
Except it’s not. One of the core concepts of HTTP is intermediation, and that’s what makes things like reverse proxies and CDNs possible. These devices will be mixing traffic from a lot of different clients back into one connection to the origin, with a much greater diversity of traffic.
That connection is likely to have bigger table sizes, but there are still limits. A header that blows out the table will be aggressively managed out of the compression context by a good implementation, and as a result will see poor compression efficiency.
That brings us to another point; allowing many different forms of the header (with things like variable whitespace, or different forms of quoting) will also create unnecessary variations and duplication in the compression context – again affecting your header’s efficiency.
Again, all of these headers will be seen as different by HPACK, and so will waste space in the dynamic table:
My-Header: "foo"
My-Header: foo
My-Header: 'foo'
My-Header: Foo
My-Header:  foo
My-Header: foo,
My-Header: foo;bar
While such variation is unlikely if they’re all generated by the same implementation, when multiple connections are mixed onto one (as is the case for intermediaries), it’s very likely.
You can avoid this by specifying the generation of your header very carefully, recommending a canonical form. One emerging specification that could help here is Structured Headers; it documents serialisation algorithms for various data types in HTTP headers so that you don’t have to.
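One way to picture what a canonical form buys you: if every sender normalises to the same bytes, all of the variants above collapse into a single table entry. The `canonical_my_header` function below is a hypothetical canonicaliser for this toy single-token header, not anything from a real specification:

```python
# Hypothetical canonicaliser for the single-token My-Header example above.
# One agreed serialisation means every sender emits byte-identical values.
def canonical_my_header(token: str) -> str:
    return token.strip().strip("\"'").lower()

variants = ['"foo"', "foo", "'foo'", "Foo", " foo"]
print({canonical_my_header(v) for v in variants})  # all collapse to {'foo'}
```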
It’s worth mentioning that it’s not strictly necessary to split a header like this:
My-Header: foo=15, bar=true, baz=4, bat=false
You could serialise it like this:
My-Header: foo=15
My-Header: bar=true
My-Header: baz=4
My-Header: bat=false
Because of HTTP’s header combination rules, these two forms are semantically equivalent, but the second form doesn’t create a combinatorial explosion in the dynamic table.
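That combination rule (repeated fields with the same name are equivalent to one field whose values are joined with a comma) is easy to demonstrate:

```python
def combine(fields):
    """HTTP's field combination rule: repeated header fields with the same
    name are semantically equivalent to one field joining the values with
    ', ' in order (RFC 7230, section 3.2.2)."""
    return ", ".join(fields)

split_form = ["foo=15", "bar=true", "baz=4", "bat=false"]
print(combine(split_form))  # foo=15, bar=true, baz=4, bat=false
```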
I don’t recommend relying upon this, though. It requires the sender to do this in every case – something that’s easy to forget, or be ignorant of. Also, an intermediary might “helpfully” combine them for you, leading back to the combinatorial explosion.
One final consideration that sometimes comes up is the header’s name itself. Sometimes people try to over-optimise this, believing that shaving off a few characters is going to make a huge difference in performance. More often, people choose needlessly long names for their headers.
The sweet spot is in the middle; your header name should be descriptive enough that people can recognise it (remember, it’s a shared space; act accordingly). Saving a few bytes probably doesn’t matter unless your header is sent a lot, and can’t get into the dynamic table.
For example, a name like FooApp-Response-Description is probably needlessly verbose; we know that “foo” is an application, and if it’s in a response, we know it’s a response header. Something like Foo-Desc would serve just as well.
On the other hand, grabbing Description for your application isn’t a very friendly thing to do; it’s an extremely generic term, and you’re just one application that might use it.
What About QPACK?
HTTP/3 replaces HPACK with QPACK, a variant adapted to run over QUIC. So far, the biggest difference in QPACK is in how it deals with ordering; since QUIC doesn’t guarantee ordering between streams, a number of changes needed to be made to how the compression algorithm keeps state.
Other things have changed too; QPACK has a different static table to start connections with, and has separate references to header names and header values.
From the perspective of a header designer, HPACK and QPACK should operate in about the same way. That said, designing your header to work optimally with the minutiae of HPACK (for example, targeting specific characters in the Huffman coding) is likely to backfire, because the details are likely to change in the long term.
That’s especially true if Structured Headers meets one of its goals; enabling a new encoding of its data types on the wire, opening up the possibility of a new compression scheme at that granularity.
HPACK compression isn’t magical, but it is pretty effective for certain kinds of headers. To get the most out of it with your header, keep it brief (but still human-readable and extensible!) and more importantly, make sure that it doesn’t change unnecessarily. More headers are better than a changing header.
While you’re creating a new header, there are also lots of other things to keep in mind beyond compression, too.
Thanks to Roberto Peon for reviewing this post.