Kubernetes Intro – Part 6 – Managing All Those Clusters

Yes, this is already part 6 of my ongoing Kubernetes intro series. In part 5, I moved ever deeper into the cloud by looking at how to create managed Kubernetes clusters in Amazon’s and Linode’s clouds. Containers and Kubernetes are all about scale, so one might wake up one day with many Kubernetes clusters to manage. And you might have guessed it: managing them must be automated as well if the infrastructure is to scale further. There are quite a few tools available for managing all the Kubernetes clusters of an organization, and today I will have a look at three of them: Cluster API, Flux CD and Argo CD.

So just to recap the overall story so far: Instead of exposing a bare metal server or a virtual machine directly to a piece of software, containers were invented to isolate the different pieces of software running on the same machine from each other. At some point, one wants to run more containers on a single server than can be managed by hand, or even spread containers over many servers. This is where Kubernetes comes in: it manages the containers and the servers they run on. Running one single, huge Kubernetes cluster for everyone in an organization does not scale very well either. Hence, sooner or later, an organization will decide to run several, perhaps even hundreds or thousands of Kubernetes clusters for different purposes. To get to these scales, the clusters themselves need to be centrally managed.

Cluster API

In practice, ‘centrally managed’ means having the means to create, update, manage, and destroy individual Kubernetes clusters, including their worker nodes. A software package that can do this is Cluster API. In principle it works as follows: Software in a central “Kubernetes Management Cluster” controls many “Kubernetes Workload Clusters”. A workload cluster is a full Kubernetes cluster all on its own, with control plane nodes and worker nodes. All of them are initially deployed with Cluster API and can also be destroyed again with it. Cluster API has plugins, called providers, to get nodes for workload clusters from different infrastructure providers (e.g. Linode, Amazon, Azure, etc.), or to make use of on-site bare metal servers or virtual machines (e.g. provided by an on-site, self-owned OpenStack installation).
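To make this a bit more concrete: on the management cluster, each workload cluster is described by a ‘Cluster’ resource that points to a control plane description and to a provider-specific infrastructure description. The following is only a minimal sketch of how such a description could be generated, not a complete or authoritative setup. The function name, the cluster name and the namespace are made up for illustration, and the API versions and kinds shown are my assumptions based on the v1beta1 Cluster API releases, so check them against the documentation of the provider you actually use. The output is JSON, which is also valid yaml.

```python
import json


def workload_cluster_manifest(name, namespace, infra_api_version, infra_kind,
                              pod_cidr="192.168.0.0/16"):
    """Build a Cluster API 'Cluster' object as a plain dict.

    Hypothetical helper for illustration only. The infrastructure
    apiVersion/kind are passed in because they differ per provider.
    """
    return {
        # Assumption: the v1beta1 API group/version of Cluster API.
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind": "Cluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "clusterNetwork": {"pods": {"cidrBlocks": [pod_cidr]}},
            # Reference to the object describing the control plane nodes.
            "controlPlaneRef": {
                "apiVersion": "controlplane.cluster.x-k8s.io/v1beta1",
                "kind": "KubeadmControlPlane",
                "name": f"{name}-control-plane",
            },
            # Reference to the provider-specific infrastructure object.
            "infrastructureRef": {
                "apiVersion": infra_api_version,
                "kind": infra_kind,
                "name": name,
            },
        },
    }


# Example values are assumptions, not taken from a real deployment.
manifest = workload_cluster_manifest(
    name="team-a",
    namespace="clusters",
    infra_api_version="infrastructure.cluster.x-k8s.io/v1beta2",
    infra_kind="AWSCluster",
)
print(json.dumps(manifest, indent=2))
```

The referenced control plane and infrastructure objects have to be created as well. In practice, such files are usually generated from provider templates rather than written by hand.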

Flux CD and Argo CD (Continuous Delivery)

When it comes to containers and clusters, pretty much everything about their configuration and content is described in text-based yaml files. That means that over time, there are a lot of yaml files all around, managed by different people, and it gets hard to keep track of where each yaml file is, what its purpose is, and what it is doing. That sounds like a familiar problem from the software development domain, and the answer there was to establish central code repositories to which all contributors to a project can push their changes. The most popular tool for this is ‘git’. The same idea is now also used to manage all configuration files for a cluster, or even for all clusters of a company, in a single git repository. Welcome to GitOps, a flavor of Infrastructure as Code.

To automate changes, e.g. to add worker nodes, or to update the software that is run in a container on some cluster somewhere, a yaml file is changed and the updated file is pushed to the git repository. Applying that change then works automatically, because a GitOps agent, typically running as a pod in the cluster itself, monitors the central repository. Once it notices that one of the yaml files that applies to the cluster has changed, it applies that file to the cluster, much as running ‘kubectl apply’ by hand would. In other words, users of a cluster no longer use the kubectl command line program to modify the cluster or the pods running on it. Instead, they simply push modified yaml files to the git repository. In the wild, there seem to be at least two popular software products available to do this job: Flux CD and Argo CD. You could also call them “GitOps tools for Kubernetes”.
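The ‘watch the repository and apply what has changed’ idea can be sketched in a few lines. This is a deliberately simplified model and not how Flux CD or Argo CD are actually implemented: the repository is just a dictionary of file paths and contents, the class and function names are made up, and the ‘apply’ step is a placeholder callback where a real tool would talk to the Kubernetes API. Deleted files are not handled.

```python
import hashlib


def digest(text):
    """Fingerprint of a file's content, used to detect changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Reconciler:
    """Toy GitOps agent: applies only files that changed since the last run."""

    def __init__(self, apply):
        self._apply = apply   # callback(path, content), stands in for the real apply
        self._applied = {}    # path -> digest of the content applied last

    def reconcile(self, repo_files):
        """Compare the repository with what was applied; return changed paths."""
        changed = []
        for path in sorted(repo_files):
            current = digest(repo_files[path])
            if self._applied.get(path) != current:
                self._apply(path, repo_files[path])
                self._applied[path] = current
                changed.append(path)
        return changed


# Hypothetical repository content for illustration.
applied_log = []
agent = Reconciler(lambda path, content: applied_log.append(path))
repo = {"apps/web.yaml": "replicas: 2", "apps/db.yaml": "replicas: 1"}

first = agent.reconcile(repo)    # first run: every file is new
second = agent.reconcile(repo)   # nothing changed in the repository
repo["apps/web.yaml"] = "replicas: 3"
third = agent.reconcile(repo)    # only the modified file is applied again
print(first, second, third)
```

A real agent would run such a comparison periodically or when notified of a new commit, and would also compare against the live state of the cluster, not only against what it applied last.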

So here we go, this is how a large number of clusters can be managed in a company: a single cluster to control all other clusters in the company, and a single git repository to store all the yaml files that describe everything, from the setup of each cluster down to what is running in every single pod.