Kubernetes Intro – Part 10 – Persistent Storage in a Managed Cluster

Wow, this is part 10 in my series on how to get started with Kubernetes! I am obviously having a lot of fun with the topic, and it’s really nice to be able to experiment with the technology, as it is not only the basis for 5G core networks, but is also massively transforming all other parts of telecommunication networks. Today’s topic: How to store data persistently in a managed Kubernetes cluster (with a practical example, of course).

The Docker Theory

Docker, and on a larger scale Kubernetes, are all about managing containers, and everything inside a container is ephemeral. That means that when a container goes away, so does all data inside it. So if a database runs in a container and creates database files, or a web server in a container accepts files that it then stores, these files need to be stored outside of the container. In Docker on a single host, this is done by creating a mapping between a path on the host’s file system and a path inside the container. When the container is deleted or crashes, it can be restarted and the data is still there, as it was stored outside.
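
A minimal sketch of such a bind mount, with a made-up host path and a placeholder password: the -v flag maps the host directory /srv/db-data to the path inside the container where MariaDB keeps its database files, so the data survives removal of the container.

docker run -d --name my-database \
  -e MARIADB_ROOT_PASSWORD=changeme \
  -v /srv/db-data:/var/lib/mysql \
  mariadb:latest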

The Kubernetes Theory

In a Kubernetes cluster, the same is true, but the implementation is different. Here, it does not make sense to just use local storage on one of the worker nodes (a bare metal server or virtual machine on which the pods/containers are executed) and bind it into the container. This is because the next time a container (pod) is started, it might run on a different worker node, where the mapping would not be available. To make persistent storage independent of worker nodes, Kubernetes uses a concept referred to as Persistent Volumes (PVs) and Persistent Volume Claims (PVCs). I’ve already explored the concept in more detail in episode 4 of this series. There, I used a local Minikube installation, which creates the persistent volumes in the local temp directory. ‘Real’ Kubernetes clusters with many worker nodes need to handle this differently, however. But just how is it done?
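
As a quick refresher, a Persistent Volume Claim is simply a request for a certain amount of storage with a certain access mode; the name and size below are only examples:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi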

The answer is straightforward: In managed Kubernetes clusters, persistent volumes are ‘translated’ into block storage volumes. When creating a new Kubernetes deployment that includes a PersistentVolumeClaim configuration, the managed Kubernetes cluster translates this into the creation of a new block storage volume, which is then mapped into the container. These block storage volumes are not Kubernetes specific at all; they are also used to add storage to virtual machines. The important detail: Such block storage is not local to a virtual machine or a worker node. Instead, it is allocated on a storage server and accessed over the network. The software in the container doesn’t care; it just writes data to a directory path.
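
Which kind of block storage a claim is translated into is controlled by a StorageClass. Managed clusters usually define a default class, so a claim like the one above works without any changes; if the hosting company offers several classes, one can be selected explicitly in the claim. The class name below is just a placeholder, ‘kubectl get storageclass’ shows what is actually available:

spec:
  storageClassName: block-storage    # placeholder, see 'kubectl get storageclass'
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi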

The Practical Example in a Managed Kubernetes Cluster

The important thing from a Kubernetes app perspective: The translation of Kubernetes persistent volume claims into block storage volumes is transparent to Kubernetes deployments, and existing yaml files do not have to be changed when moving from one Kubernetes hosting company to another. So let’s give this a try in practice: In part 4, I deployed a WordPress blog in a local Minikube cluster that uses two persistent volume claims, one for the database and one for WordPress file storage. The same yaml files and the same kubectl instruction can be used to deploy WordPress to a hosted Kubernetes cluster. Instead of ending up in the local temp directory, block storage volumes are created on the fly for the persistent volume claims. The only change I made to the wordpress-deployment.yaml file was to change the service ‘type’ from ‘LoadBalancer’ to ‘ClusterIP’:

spec:
  type: ClusterIP

This has nothing to do with data persistence, but it enables the use of the Ingress load balancer I installed in my managed Kubernetes cluster in part 8 of the series. After making this small change, the WordPress blog can be deployed into the managed cluster with the following command:

kubectl create -k .
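
The -k option makes kubectl look for a kustomization.yaml file in the current directory, which ties the individual yaml files together. Assuming the yaml files are named as in the upstream Kubernetes WordPress example, it could look like this (the password is, of course, a placeholder):

secretGenerator:
- name: mysql-pass
  literals:
  - password=changeme
resources:
  - mysql-deployment.yaml
  - wordpress-deployment.yaml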

To make WordPress accessible to the outside world, a forwarding rule needs to be added to the Ingress rules in the my-new-ingress.yaml file:

  - host: wp.example.com
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: wordpress
            port:
              number: 80

The additional rule is applied with the following command:

kubectl apply -f my-new-ingress.yaml

And that’s it: WordPress is then available via http://wp.example.com, after linking this domain name to the external IP address of the cluster in the local /etc/hosts file, of course.
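
The /etc/hosts entry is a single line that maps the test domain to the load balancer’s external IP address (the address below is just a placeholder):

203.0.113.25    wp.example.com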

How Persistent is the Persistent Data?

Beware that persistence is a bit of a tricky thing. If you delete the WordPress deployment (e.g. with kubectl delete -k .), the persistent volumes (i.e. the block storage volumes behind them) are also deleted and your data is gone. In other words: As soon as persistence comes into the game, updates and changes to a deployment must be made in a different way than by deleting and re-creating the complete project.
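
A quick way to see how a cluster handles this is to look at the reclaim policy of the persistent volumes; dynamically provisioned volumes are typically set to ‘Delete’. And instead of deleting and re-creating everything, changed yaml files can simply be applied on top of the running deployment:

# show the persistent volumes and their reclaim policy (typically 'Delete')
kubectl get pv

# apply changed yaml files to the running deployment instead of re-creating it
kubectl apply -k .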