Docker builds can be slow, and waiting for a build to finish is probably not how you want to spend your time. So you want faster builds—and caching is a great way to get there.
If the files you’re relying on haven’t changed, the Docker build can reuse previously cached layers for this particular build. And so if you separate out installation of dependencies from installation of your code in your Dockerfile you’ll usually get faster builds: if just your code changes, you won’t have to wait for all dependencies to be installed when you rebuild the Docker image.
In this article I’ll cover:
- An example of how to get slow builds.
- Faster builds by installing requirements first, from a
- Managing your
pipenvin your Docker build.
poetryin your Docker build.
Note: Outside the specific topic under discussion, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.
Want a best-practices Dockerfile and build system? Check out my Production-Ready Python Containers product.
How to get slow builds
A common way of listing your dependencies is to put them in
from setuptools import setup from setuptools import setup from Cython.Build import cythonize setup(name='exampleapp', packages=["exampleapp"], install_requires=["flask", "dateutil"])
Dockerfile might then look like this:
FROM python:3.7 COPY . /tmp/myapp RUN pip install /tmp/myapp CMD flask run exampleapp:app
The problem with this setup is that every time you change the code, that invalidates the
COPY . /tmp/myapp layer in the Docker cache, as well as all subsequent lines in the
And so every time you rebuild the image, you will need to reinstall the dependencies and your code.
Faster builds with requirements.txt
If you separate out your dependencies into a separate file, traditionally named
requirements.txt, you can copy in only that file, and install it earlier.
That way dependency installation can be cached, and packages will need to be reinstalled only if
requirements.txt would look like this:
Dockerfile like this:
FROM python:3.7 COPY requirements.txt /tmp RUN pip install -r requirements.txt COPY . /tmp/myapp RUN pip install /tmp/myapp CMD flask run exampleapp:app
Notice that initially we only copy
requirements.txt, so that changes to the code won’t invalidate the caching at this point.
This will give you faster builds if you have made sure to pull previous builds.
A new problem: reproducible builds
The scheme above has a problem: it always installs the latest version of the dependencies. So if you build the image on a computer that doesn’t have the cache populated, you might get a different set of installed dependencies than what would get built if you did have the Docker cache populated.
In short, your builds aren’t reproducible.
And that can lead to a variety of problems, e.g. random breakage when packages get upgraded without your knowledge, or hard-to-reproduce differences in behavior (“but it worked on my computer!”).
To solve this you want to keep two versions of your dependencies:
- The logical dependencies of your application, i.e. the packages you directly import. In our example,
- The pinned transitive dependencies. That is, all of
flask’s dependencies (and their dependencies, and so on), pinned to a particular version.
The logical dependencies can be used to regenerate the pinned dependencies on the demand. The pinned dependencies are what you use to get reproducible builds.
Reproducible builds using pip-tools
pip-tools is one easy way to do this.
You can store the logical dependencies in
Then run (with matching Python version and ideally operating system):
$ pip-compile requirements.in > requirements.txt
And the resulting
requirements.txt looks like this:
argparse==1.4.0 # via dateutils click==7.0 # via flask dateutils==0.6.6 flask==1.0.3 itsdangerous==1.1.0 # via flask jinja2==2.10.1 # via flask markupsafe==1.1.1 # via jinja2 python-dateutil==2.8.0 # via dateutils pytz==2019.1 # via dateutils six==1.12.0 # via python-dateutil werkzeug==0.15.4 # via flask
You check in
requirements.txt into version control.
Dockerfile requires no changes from the version we showed above.
Fast reproducible Docker builds with pipenv
pipenv is another tool that allows you to maintain logical dependencies (in a
Pipfile) and pinned dependencies (in a
It also does a whole lot more, e.g. virtualenv management.
Much of what it does isn’t relevant to building Docker images, though, so the easy way to use it in your Docker build is to export a
You can do this outside your Docker build, and just commit the resulting file to version control and use the
$ pipenv lock --requirements > requirements.txt
The plus is that your
Dockerfile doesn’t need to know anything about
pipenv. This does require you to remember to regenerate
requirements.txt every time you update
Alternatively, you can do the export in the build itself:
FROM python:3.7 RUN pip install pipenv COPY Pipfile* /tmp RUN cd /tmp && pipenv lock --requirements > requirements.txt RUN pip install -r /tmp/requirements.txt COPY . /tmp/myapp RUN pip install /tmp/myapp CMD flask run exampleapp:app
Note that a better setup, omitted for clarity, would have you install
pipenv in such a way that its dependencies don’t impact your code, e.g. by using a virtualenv for your code. If you want a best-practices Dockerfile and build system, check out my Production-Ready Python Containers product.
Fast reproducible Docker builds with poetry
poetry is another tool that lets you manage logical and pinned dependencies.
Unfortunately I have failed to figure out how to install dependencies separately from the code in the current released version.
So you may be stuck with installing both code and dependencies at the same time if you’re using
Version 1.0 (as of June 2019 in alpha status) will have a
poetry export command that lets you export a
So one potential option is to install it with
pip install --pre poetry and just use the pre-release for creating a
To get fast, reproducible builds for your application:
- Separate dependencies from your
- Separate logical and pinned dependencies (using
pip-toolsis Hynek Schlawack’s recommendation, and I concur).
- Install dependencies separately and earlier in your
Dockerfileto ensure faster builds.