Installing system packages in Docker with minimal bloat

By Itamar Turner-Trauring

When you’re building a Docker image for your Python application, you will need to:

  1. Upgrade system packages in order to get the latest security updates and critical bug fixes.
  2. Sometimes, install additional system packages as dependencies for your Python libraries or application, for debugging, or to otherwise help build your image.

Unfortunately, the default options for system package installation with Debian, Ubuntu, CentOS, and RHEL can result in much bigger images than you actually need.

So let’s see how you can install those security updates and dependencies—and still keep your image relatively small.

Why you shouldn’t just install some packages

Let’s see what happens if we just naively do security updates and install one extra package. Here’s our Dockerfile:

FROM python:3.8-slim-buster
# Download latest listing of available packages:
RUN apt-get -y update
# Upgrade already installed packages:
RUN apt-get -y upgrade
# Install a new package:
RUN apt-get -y install syslog-ng

We’ll build this image and compare its size to that of the base image:

$ docker build -t python-with-syslog .
...
$ docker image ls --format "{{ .Size }}" python:3.8-slim-buster
193MB
$ docker image ls --format "{{ .Size }}" python-with-syslog
327MB

Upgrading the system packages and installing syslog-ng increased our image by 134MB, from 193MB to 327MB. But why?

Installing less, and cleaning up

Installing packages adds unnecessary size by:

  1. Installing recommended packages that you may not actually need.
  2. Keeping around cached copies of the package index and downloaded packages, which you don’t need once the installation is done.

To prevent these problems you need to install only the packages you really need, and to clean up unnecessary files once installation is done.

Because Docker images are structured as a series of additive layers, cleanup needs to happen in the same RUN command that installed the packages. Otherwise, the deleted files will be gone in the latest layer, but not from the previous layer, much like deleting a file in your latest Git commit doesn’t delete it from previous commits.
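
To make the layering point concrete, here is a minimal sketch that writes two illustrative Dockerfiles into a scratch directory. The file names Dockerfile.separate and Dockerfile.combined are made up for this example. In the first, cleanup runs in its own RUN, so the package index is still stored in the earlier layer. In the second, install and cleanup share one RUN, so the index never lands in any layer.

```shell
set -eu

# Scratch directory for the two illustrative Dockerfiles.
workdir="$(mktemp -d)"

# Variant 1: cleanup in a separate RUN. The files are hidden in the
# final layer, but the earlier layers still contain them.
cat > "$workdir/Dockerfile.separate" <<'EOF'
FROM python:3.8-slim-buster
RUN apt-get update
RUN apt-get -y install syslog-ng
# Too late: the package index is already stored in the first RUN's layer.
RUN rm -rf /var/lib/apt/lists/*
EOF

# Variant 2: update, install, and cleanup in a single RUN, so the
# deleted files are never committed to a layer.
cat > "$workdir/Dockerfile.combined" <<'EOF'
FROM python:3.8-slim-buster
RUN apt-get update && \
    apt-get -y install syslog-ng && \
    rm -rf /var/lib/apt/lists/*
EOF

# Count the RUN instructions in a Dockerfile; each one produces a layer.
count_runs() { grep -c '^RUN ' "$1"; }

echo "separate: $(count_runs "$workdir/Dockerfile.separate") RUN instructions"
echo "combined: $(count_runs "$workdir/Dockerfile.combined") RUN instructions"
```

If Docker is available, building each file with docker build -f and comparing the results with docker image ls should show the difference in size.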

Let’s see how we do that for the two packaging variants we’re considering here, Debian/Ubuntu and CentOS/RHEL.

Debian, Ubuntu, and the Debian-based Python base image

The debian, ubuntu, and default python official base images all use the apt-get tool to install system packages. So the following will apply to all three.

Unlike before, when we had different RUN commands for each step, we’re going to have a single RUN command that runs a shell script called install-packages.sh:

FROM python:3.8-slim-buster
COPY install-packages.sh .
RUN ./install-packages.sh

Because it’s a single RUN, deleting files inside that script will ensure they never make it into any layer of the image, so they won’t waste any space. Here’s what the script looks like:

#!/bin/bash
# Bash "strict mode", to help catch problems and bugs in the shell
# script. Every bash script you write should include this. See
# http://redsymbol.net/articles/unofficial-bash-strict-mode/ for
# details.
set -euo pipefail
# Tell apt-get we're never going to be able to give manual
# feedback:
export DEBIAN_FRONTEND=noninteractive
# Update the package listing, so we know what packages exist:
apt-get update
# Install security updates:
apt-get -y upgrade
# Install a new package, without unnecessary recommended packages:
apt-get -y install --no-install-recommends syslog-ng
# Delete cached files we don't need anymore:
apt-get clean
rm -rf /var/lib/apt/lists/*
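
One practical detail the Dockerfile above relies on: RUN ./install-packages.sh only works if the script is executable, and COPY keeps the permission bits the file has in the build context. A minimal sketch, using a scratch directory and a stand-in script rather than the real one:

```shell
set -eu

# Scratch build context; in a real project this is your source directory.
ctx="$(mktemp -d)"

# Stand-in for install-packages.sh (the body here is a placeholder).
printf '#!/bin/bash\nset -euo pipefail\necho "installing packages"\n' \
    > "$ctx/install-packages.sh"

# Mark the script executable before building the image.
chmod +x "$ctx/install-packages.sh"

if [ -x "$ctx/install-packages.sh" ]; then
    echo "install-packages.sh is executable"
fi
```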

With these changes, the resulting image is much smaller:

$ docker build -t python-with-syslog-2 .
...
$ docker image ls --format "{{ .Size }}" python-with-syslog-2
238MB

Instead of adding 134MB as it did before, upgrading and installing the package added only 45MB (238MB versus the 193MB base image).
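
The size comparisons here are plain subtractions against the 193MB base image. As a small sketch, the hypothetical helper size_delta_mb below (not part of Docker) does that subtraction for sizes in the form printed by docker image ls, assuming both values are whole numbers of MB:

```shell
# Hypothetical helper: difference between two image sizes written like
# "193MB". Assumes both arguments are whole-number MB values.
size_delta_mb() {
    base="${1%MB}"   # strip the trailing "MB"
    new="${2%MB}"
    echo "$(( new - base ))MB"
}

size_delta_mb 193MB 327MB   # naive Dockerfile: prints 134MB
size_delta_mb 193MB 238MB   # single RUN with cleanup: prints 45MB
```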

Red Hat Enterprise Linux and CentOS 8

With RHEL and CentOS we want to follow a similar procedure: install only the packages we specifically need, and clean up after ourselves.

Here’s our Dockerfile:

FROM centos:8
COPY install-packages.sh .
RUN ./install-packages.sh

And the corresponding install-packages.sh:

#!/bin/bash
# Bash "strict mode", to help catch problems and bugs in the shell
# script. Every bash script you write should include this. See
# http://redsymbol.net/articles/unofficial-bash-strict-mode/ for
# details.
set -euo pipefail
# Install security updates, bug fixes and enhancements only:
dnf -y upgrade-minimal
# Install a new package, without unnecessary recommended packages:
dnf -y install --setopt=install_weak_deps=False python3
# Delete cached files we don't need anymore:
dnf clean all

Even smaller images

Installing only necessary packages and cleaning up after the installer are good starting points, but you can get even smaller images. In particular, if you need to install a compiler, you can use multi-stage builds to ensure the compiler toolchain doesn’t end up in your final image.
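
As a rough sketch of what a multi-stage build can look like, the script below writes an illustrative Dockerfile into a scratch directory. The stage name build, the /venv path, the gcc and libc6-dev packages, and the requirements.txt file are all assumptions made for this example, not something your project necessarily uses. The compiler is installed only in the first stage, and the final stage copies over just the resulting virtualenv.

```shell
set -eu

# Scratch build context; the path and file names are illustrative only.
ctx="$(mktemp -d)"

cat > "$ctx/Dockerfile" <<'EOF'
# Stage 1: has the compiler, used only to build dependencies.
FROM python:3.8-slim-buster AS build
RUN apt-get update && \
    apt-get -y install --no-install-recommends gcc libc6-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN python -m venv /venv && \
    /venv/bin/pip install --no-cache-dir -r requirements.txt

# Stage 2: the final image; only the built virtualenv is copied over.
FROM python:3.8-slim-buster
COPY --from=build /venv /venv
ENV PATH="/venv/bin:$PATH"
EOF

# Print the final stage, i.e. everything from the last FROM onward.
final_stage() {
    awk '/^FROM /{buf=""} {buf = buf $0 "\n"} END{printf "%s", buf}' "$1"
}

final_stage "$ctx/Dockerfile"
```

The final stage printed at the end contains no apt-get install of gcc, which is the point: the toolchain lives only in the discarded build stage.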

And if you don’t want to implement these techniques yourself, they are all included in my Production-Ready Python Containers template.
