Raising the Shields – 12 Years Later – Part 3 – Maintenance

So yes, I have quite a bit of physical infrastructure in various places and quite a lot of services running on it. So how can all of this be maintained with as little effort as possible? After all, doing software upgrades is not an exciting or entertaining task, so the process has to be automated as much as possible, without losing visibility into whether updates have gone smoothly or not.

A decade ago, when I only had a single Raspberry Pi in place, an apt update && apt upgrade did a good job. But with around 20 virtual machines these days and about the same number of containers running on 4 distributed servers, this is no longer a viable approach.

Once Raspberry Pis and virtual machines started piling up, I started to use the ‘Terminator’ terminal for updates, which can multiplex several terminals and ssh sessions and send all input to all shells at once. Two windows with 4 shells each, and I could update 8 machines with apt simultaneously. With a bit of scripting, the 8 ssh sessions opened automatically, so there was little to do other than start the update and watch things happening 8 times in parallel. A nice approach, but it only scales so far. It was also not the right approach for updating the containers that, at some point, started to pile up on my servers as well.
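Terminator layouts are a bit fiddly to script, so here is a minimal sketch of the same broadcast idea using tmux with synchronize-panes instead; the host names are placeholders, and my actual setup used Terminator’s own layout and broadcast features:

```bash
#!/usr/bin/env bash
# Sketch only: open one pane per host, tile them, and mirror every keystroke
# to all panes. Stand-in for Terminator's broadcast feature; hosts are made up.
HOSTS=(pi1 pi2 vm1 vm2 vm3 vm4 vm5 vm6)

tmux new-session -d -s updates "ssh ${HOSTS[0]}"
for host in "${HOSTS[@]:1}"; do
    tmux split-window -t updates "ssh $host"
    tmux select-layout -t updates tiled
done
tmux set-window-option -t updates synchronize-panes on
tmux attach -t updates
```

Typing apt update && apt upgrade once then runs it in all eight ssh sessions at the same time.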

A further step up in my upgrade process was to use Ansible and a number of built-in modules to run Ubuntu and docker-compose upgrades. One Ansible script for each type of update, each producing a nice summary at the end of which updates were successful and which had failed. A nice bonus: the scripts could also check whether there was enough space left on the system drives before the updates were started. This is particularly helpful for virtual machines, which usually have smaller system partitions and at some point run out of space due to system logs piling up.
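To give a rough idea of what these scripts do, here is a much-simplified sketch using Ansible ad-hoc commands; the real setup uses proper playbooks with nicer reporting, and the inventory file name is just a placeholder:

```bash
#!/usr/bin/env bash
# Sketch of the two essential steps: look at free space on the system partition,
# then refresh the package cache and apply all pending upgrades on every VM.
INVENTORY=hosts.ini   # placeholder inventory file

# Quick check how full the root partitions are before touching anything
ansible all -i "$INVENTORY" -m shell -a "df -h /" --become

# Update the apt cache and install all available upgrades
ansible all -i "$INVENTORY" -m apt -a "update_cache=yes upgrade=dist" --become
```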

This was as good as it got. After upgrading to a new Ubuntu version, however, I started to get into trouble with Ansible, as the Ansible docker-compose module I used to upgrade my containers would no longer work reliably. I tried to debug this for a while, but didn’t get anywhere in the time I thought it was worth investing. At some point, I threw the Ansible docker-compose update script away and reverted to a shell script that logs into the virtual machines that run containers and executes a script there to run a docker-compose update in the remote shell. Not pretty, but it does its job. Eventually, I want to return to Ansible or something similar for the container updates, but I don’t want to debug Ansible modules. They are supposed to work out of the box. If they don’t, well, then they are not fit for purpose.
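In essence, that shell script does something like the following; the host names and the directory that holds the compose projects are placeholders for this example:

```bash
#!/usr/bin/env bash
# Sketch: log into each VM that runs containers and update every compose project there.
# (VM names and the project directory are made up for this example.)
VMS=(docker-vm1 docker-vm2 docker-vm3)
PROJECT_DIR=/opt/compose-projects

for vm in "${VMS[@]}"; do
    echo "### Updating containers on $vm"
    if ssh "$vm" "cd $PROJECT_DIR && for d in */; do (cd \"\$d\" && docker-compose pull && docker-compose up -d); done"; then
        echo "### $vm done"
    else
        echo "!!! Update failed on $vm"
    fi
done
```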

And so here we go: my upgrade procedure today consists of manual apt update && apt upgrade commands on each physical server, a single Ansible script to update all virtual machines on all servers in the correct order, and a single bash script to update the containerized projects across all virtual machines. This could of course be automated further, but I actually do want to have a close look after each step of the upgrade procedure to make sure things have been executed correctly before starting the next script. Typically, I run the scripts once or twice a week during a break with a cup of coffee in my hand, as there is little manual interaction required.
