One important part of running your container in production is locking it down, to reduce the chances of an attacker using it as a starting point to exploit your whole system. Containers are inherently less isolated than virtual machines, and so more effort is needed to secure them.
Doing this is actually pretty straightforward:
- Don’t run your container as root.
- Run your container with fewer capabilities.
Let’s see why and how.
Note: Outside the specific topic under discussion, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.
Don’t run as root
There are two reasons to avoid running as root, both security related.
First, it means your running process will have fewer privileges, so if your process is somehow remotely compromised, the attacker will have a harder time escaping the container.
For example, a CVE in February 2019 that allowed escalation to root on the host was explicitly preventable by “a low privileged user inside the container”.
Second, and this is a more subtle point, running as a non-root user means you won’t try to take actions that require extra permissions.
And that means you can run your container with fewer “capabilities”, making it even more secure.
Let’s see what this means.
What exactly is a capability?
“Capabilities” in this context is a technical term: Linux capabilities give processes the ability to do some of the many privileged operations only root can do by default.
For example, CAP_CHOWN allows a process to “make arbitrary changes to file UIDs and GIDs”.
By default Docker grants a whole bunch of capabilities to a container, but not all of them; running as root in a container isn’t quite as powerful as running as root normally.
I’ve created a little container that runs the getpcaps program, which prints out a process’ capabilities:
    FROM ubuntu:18.04
    RUN apt-get update && apt-get install -y libcap2-bin inetutils-ping
    CMD ["/sbin/getpcaps", "1"]
And as you can see, a container run as root has many capabilities:
    $ docker run --rm getpcaps
    Capabilities for '1': = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+eip
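If getpcaps isn’t installed in an image, the kernel exposes the same information as hex bitmasks in /proc/<pid>/status (the CapInh, CapPrm and CapEff fields are the inheritable, permitted and effective sets). Here is a minimal sketch for decoding such a mask. The decode_caps function name is my own invention, not part of libcap or Docker, and it only knows the 14 capabilities listed above.

```shell
#!/bin/sh
# decode_caps is a hypothetical helper, not part of libcap or Docker.
# It turns a hex capability mask, as found in /proc/<pid>/status,
# into a comma-separated list of names. Only the 14 capabilities from
# the listing above are mapped; other bits are silently ignored.
decode_caps() {
    mask=$((0x$1))
    names=""
    # Each entry is <bit number>:<name>, per <linux/capability.h>.
    for entry in 0:cap_chown 1:cap_dac_override 3:cap_fowner 4:cap_fsetid \
                 5:cap_kill 6:cap_setgid 7:cap_setuid 8:cap_setpcap \
                 10:cap_net_bind_service 13:cap_net_raw 18:cap_sys_chroot \
                 27:cap_mknod 29:cap_audit_write 31:cap_setfcap; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            names="${names:+$names,}$name"
        fi
    done
    printf '%s\n' "$names"
}

# Print the raw masks for the current process (Linux only).
grep '^Cap' /proc/self/status 2>/dev/null || true
```

You would pass it one of the hex values printed by the grep, e.g. `decode_caps 00000000a80425fb`; that particular mask has exactly the 14 bits above set.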
Why you also need to drop capabilities
Now, you would think that running as a non-root user would lose these capabilities, and that is the case… but there are caveats:
    $ docker run --rm --user 1000 getpcaps
    Capabilities for '1': = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+i
Notice that the non-root user (with uid 1000) has the same list of capabilities, but with “+i” (inheritable) at the end instead of “+eip” (effective, inheritable, permitted).
That means the process doesn’t have access to these capabilities by default, but a child process can get them back by running an executable that can escalate permissions.
For example, ping is typically either setuid (the process becomes root when run) or, on more secure systems, set to give its process the CAP_NET_RAW capability. In either case the subprocess has more capabilities than the parent process:
    $ docker run --rm --user 1000 -it getpcaps /bin/bash
    I have no name!@9ce0e2e21c21:/$ ping 192.168.7.1
    PING 192.168.7.1 (192.168.7.1): 56 data bytes
    64 bytes from 192.168.7.1: icmp_seq=0 ttl=63 time=0.675 ms
    64 bytes from 192.168.7.1: icmp_seq=1 ttl=63 time=25.509 ms
Now, imagine that ping had a bug that let the parent process take it over and inject arbitrary code. At that point the user would have regained those lost capabilities.
So in addition to not running as root, you also want to explicitly drop capabilities completely, so they can’t be “inherited” by launching more capable executables.
In some environments you won’t have this level of control, but if you’re using Docker directly, or using Kubernetes, you can explicitly add or drop capabilities.
For example, for many programs we can drop all capabilities:
    $ docker run --rm --user 1000 -it --cap-drop ALL getpcaps /bin/bash
    I have no name!@aacfefe4cc3a:/$ ping 126.96.36.199
    ping: Lacking privilege for raw socket.
With all capabilities dropped, running the ping binary is insufficient to get those extra capabilities.
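If your program genuinely needs one capability, you can still drop everything and add back only that one. Here is a sketch using Docker’s real --cap-drop and --cap-add flags; the run_minimal wrapper name and the choice of NET_RAW are just illustrative assumptions, so substitute whichever capability your program actually requires.

```shell
#!/bin/sh
# run_minimal is a hypothetical wrapper name; the docker flags are real.
# It starts a container as uid 1000 with every capability dropped,
# then adds back a single one (NET_RAW here, purely as an example).
run_minimal() {
    docker run --rm --user 1000 --cap-drop ALL --cap-add NET_RAW "$@"
}

# Example usage, assuming the getpcaps image built earlier:
#   run_minimal -it getpcaps /bin/bash
```

The point of the design is that the list of capabilities becomes an explicit allowlist: anything you didn’t add back is unavailable, no matter what executables the container launches.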
The correct way to run as a non-root user
One way you can run your container as a non-root user is to use su or some variant to change users.
The problem is that you start out running as root, and then execute an operation (changing user IDs) that requires a capability.
So you can’t drop all capabilities if you do that.
What you want is your container running as a non-root user from the start. You can do that in your runtime configuration, but then you have to remember to do that.
So an even better solution is adding a new user when building the image, and using the Dockerfile USER command to change the user you run as.
You won’t be able to bind to ports <1024, but that’s a good thing: it’s another capability (CAP_NET_BIND_SERVICE) you don’t need.
And since your container is pretty much always behind a proxy of some sort, that’s fine: the external proxy can listen on port 443 for you.
Here’s a simple Dockerfile demonstrating how this works:
    FROM ubuntu:18.04
    RUN useradd --create-home appuser
    WORKDIR /home/appuser
    USER appuser
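To check that an image built from a Dockerfile like this one really starts as a non-root user, you can ask it for its uid with all capabilities dropped. This is only a sketch: check_non_root is a made-up helper name, and it assumes the image contains the id command (true for the ubuntu base image used here).

```shell
#!/bin/sh
# check_non_root is a hypothetical helper, not a Docker feature.
# It runs `id -u` inside the given image with all capabilities dropped
# and fails if the container's default user is root (uid 0).
check_non_root() {
    uid=$(docker run --rm --cap-drop ALL "$1" id -u) || return 2
    if [ "$uid" = "0" ]; then
        echo "image $1 runs as root" >&2
        return 1
    fi
    echo "image $1 runs as uid $uid"
}

# Example usage, assuming you tagged the image "nonroot-demo":
#   docker build -t nonroot-demo .
#   check_non_root nonroot-demo
```

A check like this could run in CI after each build, so that a Dockerfile change that accidentally removes the USER line gets caught before deployment.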
To have a more secure container:
- Run as a non-root user, using the Dockerfile’s USER command.
- Drop as many Linux capabilities as you can (ideally all of them) when you run your container.
To learn more about the subject, check out this site.