Michael Crosby is one of the most influential developers working on Docker containers today, helping to lead development of containerd as well as serving as the Open Container Initiative (OCI) Technical Oversight Chair. At DockerCon 19, Crosby led a standing-room-only session, outlining the past, present and — more importantly — the future of Docker as a container technology. The early history of Docker is closely tied with Linux and, as it turns out, so too is Docker's future.
Crosby reminded attendees that Docker started out using LXC as its base back in 2013, but it has moved beyond that over the past six years, first with the Docker-led libcontainer effort and more recently with the multi-stakeholder OCI effort at the Linux Foundation, which has developed an open specification for a container runtime. That specification is implemented by the runc container runtime, which is at the core of the open-source containerd project that Crosby helps to lead. Containerd is a hosted project at the Cloud Native Computing Foundation (CNCF) and is one of only a handful of projects that, like Kubernetes, have "graduated", putting it in the top tier of the CNCF hierarchy in terms of project stability and maturity.
Docker has both a slow-moving enterprise edition and a more rapidly released community edition. At DockerCon 19, Docker Enterprise 3.0 was announced, based on the Docker Community Edition (CE) 18.09 milestone. Docker developers are currently working to finalize the next major release of Docker CE, version 19.03.
The time-based nomenclature for Docker CE releases would imply that 19.03 should have been a March release, but that's not the case. Docker has been somewhat delayed with its numbered release cycle, with recent release numbers not matching up with actual general-availability dates. For example, the current Docker CE 18.09 milestone became generally available in November 2018, not September 2018 as "18.09" would seem to imply. The version number is, however, more closely aligned with the feature-freeze date for a release: the GitHub repository for Docker CE notes that the feature freeze for the 19.03 release did not occur until March 22. The beta 4 release is currently scheduled for May 13, with the final general-availability date listed as "to be determined" sometime in May 2019.
Crosby said that among the big new features set to land in Docker CE 19.03 is full support for NVIDIA GPUs, marking the first time that Docker has had seamless, integrated GPU support. Crosby added that NVIDIA GPU support will enable container workloads to take full advantage of the additional processing power offered by those GPUs, which is often needed for artificial intelligence and machine learning use cases.
Containerd is also getting a boost, advancing to version 1.2 inside Docker CE. Containerd 1.2 benefits from multiple bug fixes and performance gains. Among the new capabilities that have landed in this release is an updated runtime that integrates a gRPC interface that is intended to make it easier to manage containers. Overall, Crosby commented that many of the common foundational elements of Docker have remained the same over time.
"Even though we've had kind of the same primitives from back in 2013 in Docker, they've been optimized and matured," Crosby said.
The future of Docker
Docker containers were originally all about making the best possible use of Linux features. Just as Docker containers started out based on a collection of Linux kernel features, the future of Docker is about making the best use of newer kernel features. "Containers are made up of various kernel features, things like cgroups, namespaces, LSMs, and seccomp," he said. "We have to tie all those things together to create what we know of now as a container."
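Those primitives are visible without any container runtime at all: every Linux process already belongs to a set of kernel namespaces, exposed as symbolic links under /proc/self/ns. A minimal, Linux-only sketch of inspecting them:

```python
# List the kernel namespaces the current process belongs to by reading
# /proc/self/ns; each entry is a symlink whose target encodes the
# namespace type and its inode number, e.g. "pid:[4026531836]".
import os

def current_namespaces():
    ns_dir = "/proc/self/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}

if __name__ == "__main__":
    for name, target in current_namespaces().items():
        print(f"{name}: {target}")
```

A container runtime such as runc creates fresh namespaces of these types (plus cgroups, an LSM profile, and a seccomp filter) and ties them together around the containerized process.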
Looking forward to what's next for containers and Docker, Crosby said that it's all about dealing with the different requirements that have emerged in recent years. Among those requirements is the need to take advantage of modern kernel features in Linux 5.0 and beyond, as well as dealing with different types of new workloads, including stateful workloads, which require a degree of persistence that is not present in stateless workloads. Edge workloads for deployments at the edge of the network, rather than just within a core cloud, are another emerging use case. Internet of Things (IoT) and embedded workloads in small footprint devices and industrial settings are also an important use case for Docker in 2019.
One of the Linux kernel features that Docker will take full advantage of in the future is eBPF, which will someday be usable to write seccomp filters. Crosby explained that seccomp and BPF allow for flexible system call interception within the kernel, which opens the door for new control and security opportunities for containers.
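Today's seccomp filters are written in classic BPF rather than eBPF. As a rough illustration of the kind of interception Crosby described, the sketch below (an assumption-laden demo, not Docker's actual filter) installs a tiny filter that makes one system call fail with EPERM; it assumes an x86_64 Linux host, since syscall numbers are architecture-specific, and omits the architecture check a real filter needs:

```python
# Install a classic-BPF seccomp filter that rejects the uname(2)
# system call with EPERM while allowing everything else.
# x86_64-only demo: a production filter must also check seccomp_data.arch.
import ctypes
import struct

PR_SET_NO_NEW_PRIVS = 38
PR_SET_SECCOMP = 22
SECCOMP_MODE_FILTER = 2
SYS_UNAME = 63                     # x86_64 syscall number

BPF_LD_W_ABS = 0x20                # load 32-bit word at fixed offset
BPF_JEQ_K = 0x15                   # jump if equal to constant
BPF_RET_K = 0x06                   # return constant
SECCOMP_RET_ALLOW = 0x7fff0000
SECCOMP_RET_ERRNO = 0x00050000
EPERM = 1

def insn(code, jt, jf, k):
    # struct sock_filter: u16 code, u8 jt, u8 jf, u32 k (8 bytes)
    return struct.pack("HBBI", code, jt, jf, k)

prog = b"".join([
    insn(BPF_LD_W_ABS, 0, 0, 0),                       # load syscall nr
    insn(BPF_JEQ_K, 0, 1, SYS_UNAME),                  # uname? fall through
    insn(BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | EPERM),  # reject with EPERM
    insn(BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW),          # allow the rest
])

class SockFprog(ctypes.Structure):
    _fields_ = [("len", ctypes.c_ushort), ("filter", ctypes.c_char_p)]

libc = ctypes.CDLL(None, use_errno=True)
fprog = SockFprog(len(prog) // 8, prog)
assert libc.prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == 0
assert libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER,
                  ctypes.byref(fprog)) == 0

buf = ctypes.create_string_buffer(390)   # room for struct utsname
ret = libc.syscall(SYS_UNAME, buf)
err = ctypes.get_errno()
print("uname returned", ret, "errno", err)
```

Docker's real default profile works the same way at a much larger scale, whitelisting several hundred system calls; moving from classic BPF to eBPF would allow richer filters, such as ones that inspect syscall arguments through pointers.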
Control groups (cgroups) v2 is another Linux feature that Docker will soon benefit from. Cgroups v2 has been in the kernel since the Linux 4.5 release, though it wasn't immediately adopted as a supported technology by Docker. The project isn't alone in not immediately supporting cgroups v2; Red Hat's Fedora community Linux distribution also has not integrated cgroups v2, though it plans to do so for the Fedora 31 release currently scheduled for November. Crosby said that cgroups v2 will give Docker better resource-isolation and management capabilities.
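One practical consequence of the split is that a runtime has to detect which hierarchy a host is running. A minimal sketch of one common probe, checking for the cgroup.controllers file that only the unified v2 hierarchy exposes at the cgroup root:

```python
# Probe for the unified cgroup-v2 hierarchy: the cgroup.controllers
# file exists at the cgroup mount root only on v2; on a pure v1
# system (or a non-Linux host) it is absent.
import os

CGROUP_ROOT = "/sys/fs/cgroup"

def cgroup_v2_controllers():
    """Return the list of available v2 controllers, or None if v2 is absent."""
    path = os.path.join(CGROUP_ROOT, "cgroup.controllers")
    try:
        with open(path) as f:
            return f.read().split()   # e.g. ["cpu", "io", "memory", "pids"]
    except (FileNotFoundError, NotADirectoryError):
        return None

print("cgroup v2 controllers:", cgroup_v2_controllers())
```

On a v2 host, resource limits then become plain file writes inside a cgroup directory (for example, writing a byte count to memory.max), which is part of what makes the unified hierarchy easier for runtimes to manage.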
Enhanced user namespace support is also on the roadmap for Docker as part of a broader effort for rootless containers; it will help to improve security by not over-provisioning permissions by default to running containers. The idea of running rootless Docker containers with user namespaces is not a new one, but it's one that is soon to be a technical reality. "Finally, after all these years, user namespaces are in a place where we can really build on top of them and enable unprivileged containers," he said.
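The core trick behind rootless containers can be shown in a few lines: a process unshares into a new user namespace and writes a mapping that makes its own unprivileged UID appear as root inside. The sketch below is a simplified illustration of that pattern, not Docker's implementation, and it returns None on hosts where unprivileged user namespaces are disabled:

```python
# Enter a new user namespace and map the current user to root inside
# it -- the building block of rootless containers. May legitimately
# fail (returning None) where unprivileged user namespaces are
# disabled or filtered by seccomp.
import ctypes
import os

CLONE_NEWUSER = 0x10000000

def become_rootless_root():
    libc = ctypes.CDLL(None, use_errno=True)
    uid, gid = os.getuid(), os.getgid()
    if libc.unshare(CLONE_NEWUSER) != 0:
        return None                       # unprivileged userns unavailable
    try:
        # Denying setgroups is required before writing gid_map (>= 3.19).
        with open("/proc/self/setgroups", "w") as f:
            f.write("deny")
        with open("/proc/self/uid_map", "w") as f:
            f.write(f"0 {uid} 1")         # outer uid appears as 0 inside
        with open("/proc/self/gid_map", "w") as f:
            f.write(f"0 {gid} 1")
    except OSError:
        return None
    return os.getuid()                    # 0 inside the new namespace

print("uid after unshare:", become_rootless_root())
```

Inside such a namespace the process holds full capabilities over namespaced resources, but the kernel still treats it as the original unprivileged user for everything outside, which is why over-provisioned permissions stop being the default.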
More kernel security support is also headed to Docker in the future. Crosby said that SELinux and AppArmor are no longer the only Linux Security Modules (LSMs) that developers want. Among the new and emerging LSMs that Docker developers are working to support in the future is Landlock. Crosby added that developers will also have the ability to write their own custom LSMs with eBPF. Additionally, he highlighted the emergence of seccomp BPF.
Making containers more stateful
One of the areas that Crosby is most interested in improving is the stateful capabilities of Docker, which in his view are currently relatively limited. Better stateful support would include backup, restore, clone, and migrate operations for individual containers. Crosby explained that stateful management today in Docker typically relies on storage volumes and not the actual containers themselves.
"We kind of understand images now as being portable, but I also want to treat containers as an object that can be moved from one machine to another," Crosby said. "We want to make it such that the RW [read/write] layer can be moved along with the container, without having to rely on storage volumes." Crosby added that he also wants to make sure that not only the container's filesystem data is linked, but also the container configuration, including user-level data and networking information.
Rethinking container image delivery
Container images today are mostly delivered via container registries, like Docker Hub for public access, or an internal registry deployment within an organization. Crosby explained that Docker images are identified with a name, which is basically a pointer to content in a given container registry. Every container image comes down to a digest, which is a content-addressed hash of the JSON files and layers contained in the image. Rather than relying on a centralized registry to distribute images, what Crosby and Docker are now thinking about is an approach whereby container images can also be accessed and shared via some form of peer-to-peer (P2P) transfer approach across nodes.
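Content addressing is what makes that registry-optional model plausible: a digest is simply the SHA-256 hash of the exact manifest bytes, so any node holding matching bytes can serve them and the receiver can verify the content independently. A small sketch, using an illustrative (not real) manifest:

```python
# Compute an OCI-style content address for an image manifest.
# The manifest below is illustrative only; real manifests reference
# actual layer blobs by their own sha256 digests.
import hashlib
import json

manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "layers": [{"size": 2048, "digest": "sha256:..."}],  # placeholder
}

# The digest covers the literal serialized bytes, so serialization
# must be byte-stable for the address to be reproducible.
blob = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
digest = "sha256:" + hashlib.sha256(blob).hexdigest()
print(digest)
```

Because the name-to-digest mapping is the only part that needs trust, a registry can remain the naming authority while the addressed blobs travel over any transport, including peer-to-peer.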
Crosby explained that a registry would still be needed to handle the naming of images, but the content address blobs could be transferred from one machine to another without the need to directly interact with the registry. In the P2P model for image delivery, a registry could send a container image to one node, and then users could share and distribute images using something like BitTorrent sync. Crosby said that, while container development has matured a whole lot since 2013, there is still work to be done. "From where we've been over the past few years to where we are now, I think we'll see a lot of the same type of things and we'll still focus on stability and performance," he said.