It’s a good question, and until you know the answer, Docker images can seem pretty mysterious. Not only do I want to give you the answer, but I also want to show you how I got there.
From a Dockerfile to an Image
Let’s start at the beginning. Hopefully you’re all familiar with the Dockerfile: the set of instructions Docker follows to build an image for you. Here’s a simple example.
Each of these lines is an instruction to Docker on how to build an image. It will use ubuntu:15.04 as the base and then copy in a Python script. The CMD instruction specifies what to do when you run the container (turn the image into a running process), so it isn’t relevant at this stage.
Let’s run docker build -t my_test_image . and check the output. (The -t flag tags the image so we can refer to it by name later on.)
Looking at the last two lines, we have successfully built a Docker image, which we can refer to by the identifier 174b1e992617. (This value is the first 12 characters of a SHA256 digest derived from the image’s contents.)
We have our final image, but what are the IDs from the individual steps, d1b55fd07600 and 44ab3f1d4cd6? Are they images as well? Yes, they are. If we removed Step 2 (COPY app.py /app) from our Dockerfile, Docker would still successfully build an image (ignoring the fact that the CMD would then fail, because app.py is missing). So at each step of the build process, we have an image.
This tells us that images can be built on top of each other! That makes sense when you consider that the FROM instruction in the Dockerfile simply specifies which image to build on top of.
An image must be structured in a way that allows this, but how? We’re going to pull apart a Docker image to find out.
Exporting an image, and unpacking it
For ease of use, images can be exported to a single file, making it simple for us to take a look inside.
docker save my_test_image > my_test_image
And the exported file is…
A tarball! That is, a single tar archive bundling many files and directories together (docker save doesn’t compress it). Let’s unpack it.
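If you’d rather poke at the archive from code than from the command line, here’s a minimal Python sketch using the standard library’s tarfile module. It checks the first two bytes for the gzip signature and then lists what’s inside. The function name describe_archive is just for illustration, and it assumes you pass in a seekable file opened in binary mode.

```python
import tarfile
from typing import BinaryIO

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def describe_archive(fileobj: BinaryIO) -> tuple[bool, list[str]]:
    """Return (is_gzip_compressed, sorted member names) for a tar archive.

    `fileobj` must be opened in binary mode and be seekable,
    e.g. open("my_test_image", "rb").
    """
    start = fileobj.tell()
    compressed = fileobj.read(2) == GZIP_MAGIC
    fileobj.seek(start)  # rewind so tarfile sees the archive from the start
    # "r:*" lets tarfile detect and handle any compression transparently.
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        names = sorted(archive.getnames())
    return compressed, names
```

Pointed at the my_test_image file, it should report False for compression and list manifest.json alongside the other files we’re about to explore.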
We will start our investigation at manifest.json.
The manifest file is a piece of metadata that describes exactly what’s inside this image. We can see that the image has the tag my_test_image, plus two other fields: Layers and Config.
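The same module can read the manifest without unpacking anything to disk. This is a sketch that assumes the layout described above: a manifest.json at the top level of the archive, holding a list of entries with Config, RepoTags and Layers keys. The helper names read_manifest and summarise are hypothetical.

```python
import json
import tarfile


def read_manifest(archive: tarfile.TarFile) -> list[dict]:
    """Load manifest.json from an opened `docker save` archive.

    The manifest is a JSON list with one entry per saved image; each entry
    names the image's config file ("Config"), its tags ("RepoTags") and its
    layer archives ("Layers", bottom layer first).
    """
    member = archive.extractfile("manifest.json")  # raises KeyError if absent
    if member is None:
        raise ValueError("manifest.json is not a regular file")
    with member:
        return json.load(member)


def summarise(entry: dict) -> str:
    """One-line description of a manifest entry."""
    # RepoTags can be null, e.g. for an image saved by ID rather than by name.
    tags = ", ".join(entry.get("RepoTags") or ["<untagged>"])
    return f"{tags}: config={entry['Config']}, {len(entry['Layers'])} layers"
```

For example, open the export with tarfile.open("my_test_image") as archive, then print summarise(entry) for each entry returned by read_manifest(archive).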
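The first 12 characters of the config JSON file’s name are the same as the image ID we saw from docker build. Coincidence? I think not! The image ID is the SHA256 digest of this config file, and because the config records a digest of every layer, any change to the image’s contents produces a new ID.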
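To check this yourself, hash the config file and compare the result with its name. A minimal sketch, assuming the config file is named after its own SHA256 digest as in the archive we’re exploring; short_id and config_matches_filename are made-up names for this example.

```python
import hashlib


def short_id(config_bytes: bytes) -> str:
    """First 12 hex characters of the config's SHA256 digest (the short image ID)."""
    return hashlib.sha256(config_bytes).hexdigest()[:12]


def config_matches_filename(config_bytes: bytes, filename: str) -> bool:
    """True if the config file is named after its own SHA256 digest.

    Any directory prefix and a trailing ".json" are ignored, so both
    "<digest>.json" and "blobs/sha256/<digest>" style names are accepted.
    """
    stem = filename.rsplit("/", 1)[-1].removesuffix(".json")
    return hashlib.sha256(config_bytes).hexdigest() == stem
```

Read the config file’s raw bytes out of the archive (without re-serialising the JSON, which would change the digest) and short_id should return the same 12 characters docker build printed.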
It’s quite a big JSON file, but looking through it you can see lots of different metadata. In particular, there is metadata about how to turn this image into a running container: the command to run and the environment variables to add.
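Those run-time settings sit under the lowercase config key of that JSON (not to be confused with the manifest’s Config field). Here’s a sketch that pulls out the command, entrypoint, environment and working directory. It assumes the standard image config field names Cmd, Entrypoint, Env and WorkingDir, and run_settings is a hypothetical helper.

```python
import json


def run_settings(config_bytes: bytes) -> dict:
    """Extract the settings used when turning the image into a container.

    They live under the lowercase "config" key of the image config JSON.
    """
    cfg = json.loads(config_bytes).get("config") or {}
    env = {}
    for item in cfg.get("Env") or []:
        key, _, value = item.partition("=")  # entries look like "KEY=value"
        env[key] = value
    return {
        "cmd": cfg.get("Cmd") or [],
        "entrypoint": cfg.get("Entrypoint") or [],
        "env": env,
        "workdir": cfg.get("WorkingDir") or "",
    }
```

Splitting on the first "=" only matters for values that themselves contain an equals sign, which environment variables are allowed to do.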
Images are like Onions
They both have layers. But what’s a layer? I’m going to pick on cac0b96b79417d5163fbd402369f74e3fe4ff8223b655e0b603a8b570bcc76eb because that was the first one in the Layers list.
Another tar file! Let’s unpack it and take a look.
And this is the big secret of Docker images: they are made up of layers, and each layer is a set of changes to the filesystem! There’s quite a lot in this layer: userland binaries in /bin, shared libraries in /usr/lib, almost everything you would see looking around a standard Ubuntu filesystem. So what exactly does each layer contain? It would help to know which layers came from the base image and which were added by us.
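Because a layer is just a tar archive inside the outer tar archive, you can list its contents without extracting either one. A minimal sketch, assuming layer_name is one of the paths from the manifest’s Layers list; layer_paths is a hypothetical helper, and it reads the whole layer into memory, which is fine for exploring but wasteful for very large layers.

```python
import io
import tarfile


def layer_paths(archive: tarfile.TarFile, layer_name: str) -> list[str]:
    """List the paths stored in one layer of a `docker save` archive."""
    member = archive.extractfile(layer_name)  # raises KeyError if absent
    if member is None:
        raise ValueError(f"{layer_name} is not a regular file")
    with member:
        layer_bytes = member.read()
    # The layer is itself a tar archive, so open it as a second tar.
    with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:*") as layer:
        return sorted(layer.getnames())
```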
Using the same process we did earlier, but on ubuntu:15.04, I can see that layers
all come from the ubuntu base image, i.e. from the FROM ubuntu:15.04 instruction. Knowing this, I predict that the topmost layer of our my_test_image image, 6c91b695f2ed98362f511f2490c16dae0dcf8119bcfe2fe9af50305e2173f373, should come from the instruction COPY app.py /app/.
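Comparing layer names works within a single archive, but the names in two separate docker save archives aren’t guaranteed to line up. A sturdier check, sketched here under that assumption, compares the rootfs.diff_ids lists from each image’s config JSON: these are SHA256 digests of each layer’s uncompressed contents, listed bottom layer first, so a base image’s list should be a prefix of the list for any image built FROM it. The helper names are hypothetical.

```python
import json


def diff_ids(config_bytes: bytes) -> list[str]:
    """Layer content digests from an image config, bottom layer first."""
    return json.loads(config_bytes)["rootfs"]["diff_ids"]


def added_layers(image_config: bytes, base_config: bytes) -> list[str]:
    """Digests of the layers an image adds on top of its base image."""
    mine, base = diff_ids(image_config), diff_ids(base_config)
    # An image built FROM the base starts with exactly the base's layers.
    if mine[:len(base)] != base:
        raise ValueError("base image layers are not a prefix of this image's layers")
    return mine[len(base):]
```

The diff_ids are listed in the same order as the manifest’s Layers, so the position of each added digest tells you which layer tar file holds it.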
It is, and all that’s inside is the change that we made to the filesystem, which was just adding the app.py file.
To see it all come together visually, our image looks like this:
Doing that manually was quite a bit of effort, but it’s rewarding to do it at least once. If you want to analyse your images in future, you can use the open-source tool dive!
How does this get turned into a running container?
Now that we understand what a Docker image is, how does Docker turn this into a running container?
Each container has its own view of the filesystem. Docker takes all of the layers inside the image and stacks them on top of each other to present a single, merged view of the filesystem. This technique is called union mounting. Docker supports several union mount filesystems on Linux, the main ones being OverlayFS (used by the overlay2 storage driver, today’s default) and AUFS (used by older Docker releases).
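Here’s a toy, in-memory model of the merged view a union mount presents, with each layer represented as a dict mapping paths to file contents (bottom layer first). Higher layers hide lower files at the same path. It also handles whiteouts: a layer records that it deleted a file from a lower layer by including a marker file named .wh.<name>, a convention from the OCI image layer format. Real storage drivers do this merging in the kernel; this sketch only illustrates the idea, and it does not model opaque directories.

```python
from typing import Mapping, Sequence

WHITEOUT_PREFIX = ".wh."        # ".wh.<name>" marks <name> as deleted
OPAQUE_MARKER = ".wh..wh..opq"  # opaque-directory marker (not modelled here)


def _split(path: str) -> tuple[str, str]:
    """Split "a/b/c" into ("a/b", "c"); a bare name gives ("", name)."""
    directory, _, name = path.rpartition("/")
    return directory, name


def union_view(layers: Sequence[Mapping[str, bytes]]) -> dict[str, bytes]:
    """Merge layers (bottom layer first) into a single view of the filesystem."""
    view: dict[str, bytes] = {}
    for layer in layers:
        # 1) Whiteouts in this layer hide paths coming from the layers below it.
        for path in layer:
            directory, name = _split(path)
            if not name.startswith(WHITEOUT_PREFIX) or name == OPAQUE_MARKER:
                continue
            hidden = name[len(WHITEOUT_PREFIX):]
            target = f"{directory}/{hidden}" if directory else hidden
            # Deleting a directory also hides everything beneath it.
            doomed = [p for p in view if p == target or p.startswith(target + "/")]
            for p in doomed:
                del view[p]
        # 2) Regular files in this layer shadow lower files at the same path.
        for path, content in layer.items():
            if not _split(path)[1].startswith(WHITEOUT_PREFIX):
                view[path] = content
    return view
```

Applying a layer’s whiteouts before its regular files means a layer can delete a directory from below and recreate it with new contents in the same step.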
But that’s not all. Containers are meant to be ephemeral: changes a container makes to its filesystem should never end up in the image, and should go away along with the container. One way to achieve this would be to copy the entire image somewhere else, so that changes cannot affect the original files. That is not very efficient, so instead Docker adds a thin read/write layer to the very top of the container’s filesystem, where all changes are made. If you need to change a file that lives in a layer below, the file is first copied up into the top layer and modified there. This is called copy-on-write. When the container is removed, this top read/write layer is discarded; merely stopping the container keeps it, which is why a stopped container can be started again with its changes intact.
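Copy-on-write can be modelled in the same toy fashion. This class (its name and methods are hypothetical) keeps the image’s merged view read-only and sends every change to a separate writable layer. Modifying an existing file first copies it up, and deleting one records the deletion in the writable layer, much like a whiteout. Discarding the writable layer leaves the image exactly as it was.

```python
from typing import Mapping


class ContainerFS:
    """Toy model of a container filesystem: a read-only image view with a
    thin writable layer on top. Paths map to file contents as bytes."""

    def __init__(self, image_view: Mapping[str, bytes]) -> None:
        self._lower = image_view            # the image: never modified
        self._upper: dict[str, bytes] = {}  # the container's writable layer
        self._deleted: set[str] = set()     # deletions recorded in that layer

    def read(self, path: str) -> bytes:
        if path in self._deleted:
            raise FileNotFoundError(path)
        if path in self._upper:  # the writable layer wins
            return self._upper[path]
        if path in self._lower:
            return self._lower[path]
        raise FileNotFoundError(path)

    def write(self, path: str, data: bytes) -> None:
        """Create or overwrite a file; only the writable layer changes."""
        self._deleted.discard(path)
        self._upper[path] = data

    def append(self, path: str, data: bytes) -> None:
        """Modify an existing file: copy it up from the image, then change the copy."""
        self._upper[path] = self.read(path) + data

    def delete(self, path: str) -> None:
        self.read(path)  # raises FileNotFoundError if the file doesn't exist
        self._upper.pop(path, None)
        self._deleted.add(path)

    def remove(self) -> None:
        """Discard the writable layer, as happens when the container is removed."""
        self._upper.clear()
        self._deleted.clear()
```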
The full process of starting a container is out of scope for this article. Beyond the filesystem, the image isn’t used for much except its metadata, which configures some of the later steps. For completeness, to make a running container we need to use namespaces to control what the process can see (filesystem, processes, network, users, etc.), cgroups to control what resources it can use (memory, CPU, network, etc.), and security features to control what it can do (capabilities, AppArmor, SELinux, seccomp).