Scaleway Bare Metal: Removing the RAID

One of the reasons why I am considering moving my services from a bare metal server in a Hetzner data center in Finland to a bare metal server in a Scaleway data center in Paris is that Scaleway offers me twice the SSD disk space at about two-thirds of the price, if I’m willing to compromise on CPU and disk performance. Instead of two 512 GB SSDs, they offer two 1 TB SSDs on their entry-level servers. One thing I needed to do, however, was get a configuration that doesn’t use RAID-1, i.e. data duplication across the two drives. It turned out that this was trickier than anticipated.

For good reasons, both Hetzner and Scaleway configure their bare metal servers with software RAID-1 for redundancy. If one of the SSDs fails, the server continues to run and no data is lost. As I synchronize my data between different servers anyway, I’d rather have a configuration without RAID and trade the extra risk for twice the disk space.

When I started renting the server in the Hetzner data center two years ago, I installed Ubuntu via a virtual console and was hence in control of the complete installation process. It was thus easy to create a 60 GB system partition on one SSD, leaving the rest of that SSD as well as the entire second SSD empty. I then used the empty space to create partitions for ZFS and built a single file system that spans both SSDs. So far so good.
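In rough strokes, such a layout looks like this (hypothetical device names, partition numbers, and pool name, not the exact commands I used back then):

sudo parted /dev/sda -- mkpart zfs 60GiB 100%   # rest of SSD 1 for ZFS
sudo parted /dev/sdb -- mkpart zfs 0% 100%      # all of SSD 2 for ZFS
sudo zpool create tank /dev/sda2 /dev/sdb1      # one pool spanning both SSDs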

When renting a bare metal server at Scaleway, the installation process is unfortunately not quite as flexible. Here, the simple option is to let Scaleway install Ubuntu with RAID-1 and hence have around 800 GB available for data storage. That’s not what I wanted. There is also the option to get an empty server and to use Dell’s iDRAC 6 system console. Due to the decade-old hardware, iDRAC 6 requires Java in the browser and some ugly hacks to actually get to a virtual screen and keyboard. Also, installing the operating system from an Ubuntu system image on my notebook, mounted on the remote server over a slow DSL line, did not sound very appealing. I tried, but I have to admit I gave up after an hour. The hacks required to get iDRAC 6 working in 2024 are just not worth it.

I then moved to plan B, which was to let Scaleway install Ubuntu 22.04 with a RAID-1 configuration. Once the system was running, my plan was to remove the RAID configuration and shrink the system partition to 60 GB. I would then use the freed space and the space on the second physical disk with ZFS, which can span a file system across both drives. Part 1 of the plan worked well; only a few commands were required to remove the RAID and free the second SSD. Removing the RAID from the first SSD and then shrinking the system partition to 60 GB was a much bigger challenge. After several broken server installs due to the system partition becoming inaccessible, I changed tactics and went for plan C.
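For completeness, shrinking the RAID-backed system partition would roughly have involved the following steps from a rescue system. This is a generic sketch of the standard procedure, not a recipe I can vouch for; some variation of it is what kept leaving my installs unbootable:

# boot into a rescue system so the file system is not mounted
sudo e2fsck -f /dev/md1                   # required before shrinking
sudo resize2fs /dev/md1 55G               # shrink the fs below the target
sudo mdadm --grow /dev/md1 --size=58G     # then shrink the md array itself
# finally shrink the sda4 partition and grow the fs back to fill it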

Plan C was to use Scaleway’s simple option to get a system with Ubuntu 24.04 and RAID-1, as in the previous approach. After removing the RAID, however, I would not shrink the system partition. Instead, I would create an 800 GB image file on it, which I could then use as a virtual drive in combination with the second physical SSD and ZFS for a single file system. Yes, that sounds like a really bad hack. Instead of ZFS going directly to a physical device, it has to go through a virtual disk file and the RAID code for part of the storage space. But it works and is fast enough for my purposes. And instead of having just 900 GB available for my virtual machines and containers on the system partition and living in constant fear of the system partition overflowing, I now have 1.7 TB of storage space in one ZFS file system across two drives. And should I manage to overflow the ZFS data partition, the system partition is not impacted. I’m not proud of the solution, but it works. Hm, yeah, maybe I am sort of proud of it, anyway.

And here are the commands to create the setup yourself in case you come across a similar challenge at some point:

### Check which SSDs and partitions are currently used
### for the RAID

cat /proc/mdstat

md1 : active raid1 sdb4[1] sda4[0]
      975489024 blocks super 1.2 [2/2] [UU]
      bitmap: 4/8 pages [16KB], 65536KB chunk

--> sda4 and sdb4 are the RAID for the system partition (900 GB...)
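### Optionally, cross-check the disk layout (device and
### partition names may differ on your machine)

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT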

### Remove sdb4 from the RAID-1

sudo mdadm /dev/md1 --fail /dev/sdb4
sudo mdadm /dev/md1 --remove /dev/sdb4
sudo mdadm --zero-superblock /dev/sdb4
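
### md1 now runs degraded on sda4 alone ([2/1] [U_]). If the
### degraded state bothers you, mdadm can reshape it into a
### single-device array (optional; the setup works without it):

cat /proc/mdstat
sudo mdadm --grow /dev/md1 --raid-devices=1 --force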

### Install ZFS

sudo apt update && sudo apt upgrade
sudo apt install zfsutils-linux
sudo zpool create -o ashift=13 -f zfs-pool-1 /dev/sdb4
sudo zfs create -o encryption=on -o keyformat=passphrase zfs-pool-1/data
sudo chown -R ubuntu:ubuntu /zfs-pool-1/data
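
### Sanity check: the pool and the encrypted dataset exist

sudo zpool status zfs-pool-1
sudo zfs get encryption,keystatus,mountpoint zfs-pool-1/data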


### On the system partition that has lots of space, create
### a .img file for ZFS and then extend the pool
###
### fallocate is quick and 800G leaves 66G free for the system.
### Total ZFS capacity is 1.7 TB!

cd; mkdir zfs-virtual-disks; cd zfs-virtual-disks
fallocate -l 800G zfs-disk1.img
sudo zpool add -o ashift=13 -f zfs-pool-1 /home/ubuntu/zfs-virtual-disks/zfs-disk1.img
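
### Verify that the pool now spans both vdevs, the physical
### partition and the file-backed one

sudo zpool list -v zfs-pool-1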

### Now reboot to verify that everything works as intended

sudo reboot

### After reboot: enter the passphrase and mount the dataset

sudo zfs load-key -r zfs-pool-1/data
sudo zfs mount zfs-pool-1/data
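
### Final check: both vdevs should be ONLINE and the full
### capacity visible

sudo zpool status zfs-pool-1
df -h /zfs-pool-1/data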