Bare Metal in the Cloud – Part 1 – Let’s Rent Hardware

So far, the public cloud has pretty much been ‘virtual machines’ for me, with varying numbers of CPUs, memory and storage, depending on the application. Recently, however, I’ve been looking for a way to run copies of the virtual machines (VMs) and Docker containers of my cloud at home in a data center for redundancy purposes. Renting separate VMs for this would have been too expensive and inflexible. So, I decided to rent a ‘bare metal’ server in the cloud and build a redundant copy of my cloud at home on top of it. This is what is referred to as a hybrid cloud, and this blog series takes a closer look at my adventures in this area.

The General Setup

The basic idea is to rent a bare metal server in the cloud, install Ubuntu 20.04 as the host operating system and, like at home, use KVM/Qemu for virtualization. This way, I can use copies of the virtual machines running at home by simply uploading them and changing their IP addresses. As I just need warm-standby redundancy and can live with losing a few hours’ worth of changes, it is sufficient for me to synchronize the data on the backup VMs in the data center with rsync at regular intervals. Should my primary site, i.e. my private cloud at home, ever become catastrophically unavailable, the backup VMs in the data center can take over.
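
To give an idea of what this looks like in practice, here is a minimal sketch of such a periodic sync as a cron job; host name, user and paths are placeholders, not my actual setup:

# pull the latest data from the primary site at home every 4 hours
0 */4 * * * rsync -az --delete -e ssh backup@home.example.org:/data/ /data/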

Let’s Go Shopping

Pretty much every cloud hosting company that offers virtual machines also has bare metal servers for rent. There is a wide price difference for similar servers, so some effort has to be spent on finding a suitable offer. As I wanted my server to be located in central or northern Europe to keep round trip delay times low, and because I prefer smaller companies to global behemoths, I had a closer look at what OVH, 1und1 and Hetzner had on offer. In the end, I decided to go for a Ryzen 5 3600 based server with 64 GB of RAM and two 500 GB SSDs in Hetzner’s data center in Finland. When I rented the server, there was no one-time setup fee, and the monthly cost was €40.46 including taxes. That’s 6 euros less than the same server would cost per month in one of Hetzner’s data centers in Germany, perhaps due to lower electricity prices in Finland. While the round trip time to a server in Hetzner’s German data centers is around 15 milliseconds, it takes around 45 milliseconds to their data center near Helsinki and back. That’s quite a bit, but I would argue that the lower latency is not worth an extra 6 euros a month for a redundancy site.
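
For reference, such round trip times are easy to check before ordering by pinging a test address in the respective data center:

# send 10 pings and look at the average round trip time in the summary
ping -c 10 <test address in the data center>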

The Setup Process

Like virtual machines, bare metal servers are available instantly if no special modifications like additional RAM or more HDD or SSD drives are required. By default, the two 500 GB SSDs are set up in a RAID-1 configuration, i.e. the total storage capacity would be 500 GB, and a failure of one SSD would not lead to data loss. However, I needed at least 1 TB of storage, so I chose to run them without redundancy. I’d obviously stick to RAID-1 if this setup were my primary site, but for a secondary site that I would immediately start to duplicate elsewhere if the primary site ever failed for a longer time, I can live with the additional risk.

Installing the Host Operating System

As it is a bare metal server, Hetzner not only offers standard Linux operating system images that install automatically on the server after purchase, but also a rescue system that runs entirely in memory. This makes it possible to partition the drives with fdisk as needed before the host operating system is installed. A nice touch: as part of the order process, a public SSH key can be supplied, which the rescue system then uses to grant access.

Once connected to the rescue system via SSH, there are several options for installing the host OS, including the use of an image on a mountable share at a remote location on the Internet. However, since I just needed an off-the-shelf Ubuntu 20.04, I chose to use a local standard image and Hetzner’s install script for customization. The script and its configuration file are part of the rescue system and can be modified before execution. This way, it’s possible to create partitions manually instead of using the default RAID-1 configuration.

For my purposes, I chose to use fdisk to create a small partition for /boot and a 60 GB ext4 partition for the operating system. The remaining space on the first SSD, as well as the complete second SSD, remained unallocated. I then modified the partition information in the install script’s configuration file and started the installation process. This turned out to be pretty painless and only took a minute or two to complete. Once done, the server is rebooted and ready. The script also copies the public SSH key given during the order process into the host OS, so it’s immediately accessible in a secure manner.
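
For illustration, a partition setup like this looks roughly as follows in the configuration file of Hetzner’s install script (installimage); I’m sketching this from memory, so treat the exact directives and device names as assumptions:

DRIVE1 /dev/nvme0n1       # install only to the first SSD
SWRAID 0                  # disable the default software RAID-1
PART /boot ext3 512M      # small boot partition
PART /     ext4 60G       # 60 GB root partition, the rest stays unallocated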

A Single Data Partition Spanning the Two SSDs

Now that the host operating system was up and running, the next step was to set up the remaining SSD space as a data partition. As I wanted a single partition to span the free space on both drives, ext4 was not an option. So I chose to go for ZFS, which I also use at home for this purpose. There’s a rumor that the ZFS version included in Ubuntu 20.04 is not very performant when it comes to encryption, so I chose to encrypt the two partitions with LUKS and put ZFS on top of them. That’s somewhat more complicated than using ZFS’s built-in encryption, but I wanted to have the best possible performance, and running the data partition without encryption was definitely not an option.
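
In case you want to do something similar, here is a minimal sketch of the approach; the partition names are assumptions, and the pool is deliberately created without any redundancy:

# encrypt the two data partitions with LUKS (partition names assumed)
cryptsetup luksFormat /dev/nvme0n1p3
cryptsetup luksFormat /dev/nvme1n1p1
cryptsetup open /dev/nvme0n1p3 crypt-data1
cryptsetup open /dev/nvme1n1p1 crypt-data2

# create a single ZFS pool that spans both encrypted devices
zpool create data /dev/mapper/crypt-data1 /dev/mapper/crypt-data2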

A Look at the Hardware

Since this is a bare metal server, the host operating system has direct access to the hardware, so one can have a closer look at the server. Here are the details /proc/cpuinfo reveals about the CPU:

[...]
vendor_id : AuthenticAMD
cpu family : 23
model : 113
model name : AMD Ryzen 5 3600 6-Core Processor
[...]

This processor with 6 cores and 12 threads was released in 2019, which means that it’s currently two generations behind the latest hardware. That’s good enough for me, and it still compares very well to current generation processors, as I will show in one of the next posts.

I then had a look at the SSDs, which revealed a mixed picture.
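
The output below (shortened) comes from smartctl; note that the device name is an assumption on my part:

smartctl -a /dev/nvme0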

=== START OF INFORMATION SECTION ===                                                                
Model Number:                       SAMSUNG MZVLB512HBJQ-00000
Serial Number:                      S4....
Firmware Version:                   EXF7201Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x.....
Total NVM Capacity:                 512.110.190.592 [512 GB]
Unallocated NVM Capacity:           0
Controller ID:                      4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512.110.190.592 [512 GB]
Namespace 1 Utilization:            4.565.708.800 [4,56 GB]
Namespace 1 Formatted LBA Size:     512

[...]

 
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        27 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    52%
Data Units Read:                    708.508.674 [362 TB]
Data Units Written:                 586.594.645 [300 TB]
Host Read Commands:                 1.272.517.019
Host Write Commands:                1.762.386.687
Controller Busy Time:               11.040
Power Cycles:                       10
Power On Hours:                     1.157
Unsafe Shutdowns:                   0
Media and Data Integrity Errors:    0
Error Information Log Entries:      70
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               27 Celsius
Temperature Sensor 2:               23 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

On the bright side, the two SSDs are from Samsung, which suggests a certain level of quality. However, the SSDs are far from unused: the first SSD in the system has already had 300 TB written to it, and the second SSD even 560 TB. That’s quite an amount, and I feel a bit uneasy about it. Perhaps a RAID-1 configuration would have been better after all? But that wouldn’t have left me with enough capacity to use. Be that as it may, the system is only there for redundancy, so I hope the decision I’ve taken will not bite me later. Only time will tell.
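
As a side note on where the terabyte values in brackets come from: the NVMe ‘Data Units Written’ counter counts units of 1000 512-byte blocks, i.e. 512,000 bytes each, so the total can be verified with a quick calculation:

# 586,594,645 data units x 512,000 bytes each = ~300 TB
echo $((586594645 * 512000 / 1000000000000))   # prints 300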

KVM/Qemu Installation

Once the encrypted ZFS filesystem was up and running, the next step was to install KVM/Qemu. Fortunately, I wrote down the steps when I first set this up on my cloud server at home a few years ago, so KVM was up and running in no time. I then transferred copies of the VM images I run on my cloud server at home, which took some time: 40 Mbit/s in the uplink at home is not too shabby, but VM image files are rather large. Once on the encrypted data partition in the cloud, the images can be used in KVM just like at home, and after changing their IP addresses and adding NAT port forwardings for SSH and the TCP ports used by the services running on them, they were ready ‘for production’. I’ll have a closer look at how that is done in the next post on the topic.
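
For reference, the basic KVM/Qemu installation on Ubuntu 20.04 boils down to a handful of packages; this is a minimal sketch rather than my exact command history:

# install KVM/Qemu and the libvirt management tools
sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients virtinst

# quick check that the hypervisor and libvirt are up
virsh list --all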

Summary

Getting a bare metal server up and running in the cloud was much easier than I anticipated. Still, I learnt a lot of new things in the process, and I will certainly use those 64 GB of RAM and 6 Ryzen cores for more than just redundancy spare duty. So rest assured that there will be exciting follow-up posts on topics such as NAT port forwarding to use a single IP address for several virtual machines, getting and using additional public IP addresses for individual VMs, and a look at the bare metal and virtualized performance of the server compared to other hardware.