Why Is Storage On Kubernetes So Hard?


Container orchestration tools like Kubernetes are revolutionizing the way applications are being developed and deployed. With the rise of the microservices architecture and decoupling of infrastructure from application logic from the developer’s point of view, developers are becoming more focused on building software and delivering value.

Kubernetes abstracts away the physical machines it is managing. With Kubernetes, you can describe the amount of memory and compute power you want, and have it available without worrying about the underlying infrastructure.

Because it manages container images, Kubernetes also makes applications portable. Once an application is developed with a containerized architecture on Kubernetes, it can be deployed anywhere – public cloud, hybrid, on-prem – without any change to the underlying code.

While Kubernetes is extremely useful in many aspects, like scalability, portability, and management, it does not handle storing state out of the box. Almost all production applications are stateful, i.e. they require some sort of external storage.

A Kubernetes architecture is very dynamic. Containers are being created and destroyed, depending on the load and on the specifications of the developers. Pods and containers can self-heal and replicate. They are, in essence, ephemeral.

However, a persistent storage solution cannot afford this dynamic behavior: persistent storage cannot be created and destroyed on the fly along with the pods that use it.

Stateful applications face challenges in terms of portability when they need to be deployed on another infrastructure, perhaps another cloud provider, on-prem, or on a hybrid model, because persistent storage solutions can be tied to a specific cloud provider.

Moreover, the storage landscape for cloud native applications is not easy to understand. The Kubernetes storage lingo can be confusing, with many terms that have intricate meanings and subtle changes. Additionally, there are many options between native Kubernetes, open-source frameworks, and managed or paid services that developers must consider before reaching a decision.

You can see below the cloud-native storage solutions listed in CNCF’s landscape:

Maybe the first idea that comes to mind is deploying a database in Kubernetes: pick out a database solution that fits your needs, containerize it to run on local disk, and deploy it in your cluster as just another workload. However, due to the inherent properties of databases, this does not work well.

Containers are built with statelessness as a principle, which makes spinning them up or down easy. Since there is no data to save or migrate, the cluster avoids the usually intensive work of disk reads and writes.

With a database, state needs to be preserved. A containerized database cannot simply be destroyed and respawned like a stateless workload: if its pod migrates or is rescheduled, the data has to follow it, and the physics of data storage come into play. Ideally, the containers using the data would be co-located with the database.

This is not to say deploying databases in containers is a bad idea – in some use cases, this approach can be quite adequate. In a test environment, or for a task that does not require production-level amounts of data, databases in clusters can make sense due to the small scale of the data being held.

In production, developers usually rely on external storage.

How does Kubernetes communicate with storage? Through control plane interfaces, which link Kubernetes with external storage. These external storage solutions linked to Kubernetes are called volume plugins. Volume plugins abstract the underlying storage and grant storage portability.

Previously, volume plugins were built, linked, compiled, and shipped with the core Kubernetes codebase. This greatly limited the flexibility of the developer, and brought on additional maintenance costs. Adding new storage options required changes in the Kubernetes codebase.

With the introduction of CSI and FlexVolume, volume plugins can be deployed on a cluster without changes to the core codebase.

Native Kubernetes & Storage

How does native Kubernetes handle storage? Out of the box, it offers several ways to manage storage: ephemeral options, and persistent storage in the form of Persistent Volumes, Persistent Volume Claims, Storage Classes, and StatefulSets. This can be quite confusing.
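As a quick illustration of the ephemeral option, the following sketch (names and image are illustrative) declares a pod with an emptyDir volume, which lives and dies with the pod:

```yaml
# Illustrative pod with ephemeral scratch storage.
# The emptyDir volume is deleted when the pod is removed.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-pod
spec:
  containers:
    - name: app
      image: busybox        # illustrative image
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}
```

Anything written to /scratch here survives container restarts within the pod, but not the pod's deletion – which is exactly why stateful applications need the persistent options described below.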

Persistent Volumes (PV) are storage units that have been provisioned by an administrator. They are independent of any single pod, breaking them free from the ephemeral life cycle of pods.

Persistent Volume Claims (PVCs), on the other hand, are requests for that storage, i.e. for PVs. A PVC binds to a matching PV, and the claimed storage can then be mounted into a pod and used by its containers.
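As a rough sketch, a statically provisioned PV and a PVC that claims it might look like this (the names, size, and hostPath backing are purely illustrative):

```yaml
# Hypothetical 10Gi PV backed by a hostPath (for illustration only;
# real clusters would use a network or cloud-backed volume).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data
---
# PVC requesting storage; it binds to a PV that satisfies the request.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

A pod would then reference example-pvc in its volumes section to mount the claimed storage.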

There are two ways of provisioning storage: statically or dynamically.

With static provisioning, an administrator provisions PVs that they think pods might require before the actual requests are made; pods are then manually bound to specific PVs through explicit PVCs.

In practice, statically defined PVs are not compatible with the portable structure of Kubernetes, since the storage that is being used can be environment-dependent, such as AWS EBS or GCE Persistent Disk. Manual binding requires changes in the YAML file to point to the vendor-specific storage solutions.

Static provisioning also goes against the mindset of Kubernetes in terms of how developers think about resources: CPU and memory are not allocated beforehand and bound to pods or containers. They are dynamically granted.

Dynamic provisioning is done with Storage Classes. Cluster administrators do not need to create the PVs manually beforehand. Instead, they define multiple profiles of storage, much like templates. When a developer creates a PVC, a volume matching one of these templates is provisioned at the time of the request and attached to the pod, depending on the requirements of the claim.
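A minimal sketch of this flow, assuming an AWS environment with the in-tree EBS provisioner (the class name and parameters are illustrative):

```yaml
# Hypothetical StorageClass: a "template" the admin defines once.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
---
# A PVC referencing the class; an EBS volume is provisioned
# dynamically when this claim is created.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi
```

Note how the developer's claim only names a storage profile; the vendor-specific details stay in the StorageClass, which is what keeps the workload manifests portable.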

This is a very broad overview of how external storage is generally handled with native Kubernetes. There are many other options to consider, however.

CSI – Container Storage Interface

Before moving forward, I want to introduce the Container Storage Interface (CSI). CSI is a unifying effort created by the CNCF Storage Working Group, aimed at defining a standard container storage interface that enables storage drivers to work on any container orchestrator.

CSI specifications have already been adopted in Kubernetes, and numerous driver plugins are available to be deployed on a Kubernetes cluster. Developers can access storage exposed by a CSI-compatible volume driver with the csi volume type in Kubernetes.
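In practice, a CSI driver is usually consumed through a StorageClass whose provisioner field names the driver. A hedged sketch, with an entirely hypothetical driver name and parameters:

```yaml
# Hypothetical StorageClass backed by a CSI driver.
# "csi.example.com" stands in for the driver's registered name;
# the parameters are driver-specific.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-example
provisioner: csi.example.com
parameters:
  type: ssd
```

PVCs referencing this class are then served by the CSI driver, without that driver ever being compiled into Kubernetes itself.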

With the introduction of CSI, storage can be treated as another workload to be containerized and deployed on a Kubernetes cluster.

For more information, listen to our episode on CSI with Jie Yu.

Open-Source Projects

There’s a significant upsurge of tools and projects around cloud-native technologies. As one of the most prominent problems in production, dealing with storage on a cloud-native architecture has a fair share of open-source projects dedicated to solving it.

The most popular projects regarding storage are Ceph and Rook.

Ceph is a dynamically managed, horizontally scalable, distributed storage cluster. It provides a logical abstraction over the storage resources, and is designed to have no single point of failure, to be self-managing, and to be software-based. Ceph provides block, object, and file system interfaces into the same storage cluster simultaneously.

Ceph’s architecture is complicated, with a multitude of underlying technologies such as RADOS, librados, RADOSGW, RBD, its CRUSH algorithm, and components like monitors, OSDs, and MDS. Without delving into that architecture, the key takeaway is that Ceph is a distributed storage cluster that makes scaling much easier, eliminates single points of failure without sacrificing performance, and provides unified storage with object, block, and file access.

Naturally, Ceph has been adapted into the cloud-native environment. There are numerous ways you can deploy a Ceph cluster, such as with Ansible. You can deploy a Ceph cluster and have an interface into it from your Kubernetes cluster, using CSI and PVCs.

Ceph architecture.

Another interesting, and quite popular project is Rook, a tool that aims to converge Kubernetes and Ceph – to bring compute and storage together in one cluster.

Rook is a cloud-native storage orchestrator. It extends Kubernetes. Rook essentially allows putting Ceph into containers, and provides cluster management logic for running Ceph reliably on Kubernetes. Rook automates deployment, bootstrapping, configuration, scaling, rebalancing, i.e. the jobs that a cluster admin would do.

Rook allows deploying a Ceph cluster from a YAML file, just like any other Kubernetes resource. This file serves as the higher-level declaration of what the cluster admin wants in the cluster. Rook spins the cluster up and starts actively monitoring it. Acting as an operator or controller, Rook makes sure the desired state declared in the YAML file is upheld: it runs a reconciliation loop that observes the current state and acts on the differences it detects.
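An abbreviated sketch of such a declaration, based on Rook's CephCluster custom resource (fields vary between Rook versions, and the image tag and counts here are illustrative):

```yaml
# Hypothetical, abbreviated CephCluster declaration for Rook.
# Rook's operator reconciles the cluster toward this desired state.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14      # illustrative Ceph image
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3                  # three monitors for quorum
  storage:
    useAllNodes: true         # let Rook use every node...
    useAllDevices: true       # ...and every available device
```

Applying this manifest is all the admin declares; deployment, bootstrapping, and ongoing management of the Ceph daemons are handled by the operator.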

Rook does not have its own persistent state, and does not need to be managed. It’s truly built according to the principles of Kubernetes.

Rook, bringing Ceph and Kubernetes together, is one of the most popular cloud-native storage solutions, with almost 4,000 GitHub stars, 16.3M downloads, and more than 100 contributors.

Rook was the first storage project accepted into the CNCF, and it has recently been promoted to the incubation stage.

For more information about Rook, listen to our episode on Kubernetes Storage with Bassam Tabara.

For any problem in an application, it’s important to identify the requirements, and design the system or pick the tools accordingly. Storage in the cloud-native environment is no different. While the problem is quite complicated, there are numerous tools and approaches out there. As the cloud-native world progresses, new solutions will also undoubtedly emerge.