What is the space overhead of Base64 encoding?

By Daniel Lemire

Many Internet formats from email (MIME) to the Web (HTML/CSS/JavaScript) are text-only. If you send an image or executable file by email, it often first gets encoded using base64. The trick behind base64 encoding is that we use 64 different ASCII characters including all letters, upper and lower case, and all numbers.

Not all non-textual documents are shared online using base64 encoding. However, it is quite common. Load up google.com or bing.com and look at the HTML source code: you will base64-encoded images. On my blog, I frequently embed figures using base64: it is convenient for me to have the blog post content be one blob of data.

Base64 is apparently wasteful because we use just 64 different values per byte, whereas a byte can represent 256 different characters. That is, we use bytes (which are 8-bit words) as 6-bit words. There is a waste of 2 bits for each 8 bits of transmission data. To send three bytes of information (3 times 8 is 24 bits), you need to use four bytes (4 times 6 is again 24 bits). Thus the base64 version of a file is 4/3 larger than it might be. So we waste 33% of the bits!

That sounds bad. How can engineers tolerate such wasteful formats?

It is common for web servers to provide the content in compressed form. Compression partially offset the wasteful nature of base64.

To assess the effect of base64 encoding, I picked a set of images used in a recent research paper. There are different compression formats, but an old format is gzip. I encode the images using base64 and then I compress them with gzip. I report the number of bytes. I make the files are available.

File nameSizeBase64 sizeBase64 gzip size
bing.png135518321444
googlelogo.png235731862477
lena_color_512.jpg105764142876108531
mandril_color.jpg247222333970253868
peppers_color.jpg9478128079798

As you can see, the gzip sizes are within 5% of the original sizes. And for larger files, the difference is closer to 2.5%.

Thus you can safely use base64 on the Web without too much fear.

In some instances, base64 encoding might even improve performance, because it avoids the need for distinct server requests. In other instances, base64 can make things worse, since it tends to defeat browser and server caching. Privacy-wise, base64 encoding can have benefits since it hides the content you access in larger encrypted bundles.

Further reading. Faster Base64 Encoding and Decoding using AVX2 Instructions, ACM Transactions on the Web 12 (3), 2018. See also Collaborative Compression by Richard Startin.