Real World Performance – Part 2 – The RAM Disk

In the first part of this “Real World Performance” series, I looked at the performance of the 2 TB Crucial P2 NVMe SSD I bought for my new notebook. While it does its job, its major weakness is that continuous write throughput is only 100 MB/s. So I bought a top-of-the-line Samsung 970 Evo Plus drive that, according to the specs, should offer much higher continuous write performance.

To test the SSD in a “real world” scenario, i.e. by writing huge files to it, I needed a way to feed data to the drive as fast as possible. Writing zero bytes would be straightforward, but I wanted random data to rule out any kind of compression anywhere in the software chain. The solution: a file with random data on a RAM disk.

Creating a RAM disk on Linux works as follows (run as root, or prefix each command with sudo):

mkdir /mnt/ramdisk
mount -t tmpfs -o rw,size=4G tmpfs /mnt/ramdisk
chown -R martin:martin /mnt/ramdisk/

And the following command creates a 3 GB file with random content on the RAM disk:

dd if=/dev/urandom of=/mnt/ramdisk/3GB.bin bs=64M count=48 iflag=fullblock
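
As a quick sanity check on the numbers in the dd command, here is a small sketch of the arithmetic. The variable names are mine, chosen for illustration; the values come straight from the commands above. With 64 MiB blocks and a count of 48, the file comes out at 3072 MiB, i.e. 3 GiB, which leaves 1 GiB of headroom on the 4G tmpfs.

```shell
# Values taken from the dd command: bs=64M (64 MiB blocks), count=48.
block_mib=64
count=48
ramdisk_mib=$((4 * 1024))          # size=4G from the mount command

file_mib=$((block_mib * count))
file_bytes=$((file_mib * 1024 * 1024))

echo "File size: $file_mib MiB ($file_bytes bytes)"
echo "Left on the RAM disk: $((ramdisk_mib - file_mib)) MiB"
```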

Creating 3 GB worth of random data takes quite some time, so I do it once up front and keep it out of the speed measurement. Next, I wrote a short bash script that uses the ‘cp’ command to copy the file over and over to /dev/null or a real device:

for i in {1..100}
do
 echo $i
 cp /mnt/ramdisk/3GB.bin /dev/null
 #cp /mnt/ramdisk/3GB.bin /x-temp-delete/3GB-$i.bin
done
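
The loop above only prints the iteration counter, so the throughput has to be derived separately. One possible way to do that, as a minimal sketch: time a single cp run and convert bytes and seconds into MB/s. Both helper names (rate_mb_per_s, timed_copy) are hypothetical, and timed_copy assumes GNU date (for %N) and GNU stat (for -c %s), which is what most Linux distributions ship.

```shell
# Hypothetical helper: convert bytes copied and elapsed seconds into MB/s.
rate_mb_per_s() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

# Hypothetical helper: time one cp run and print its rate in MB/s.
# Assumes GNU date (%N = nanoseconds) and GNU stat (-c %s = size in bytes).
timed_copy() {
    src=$1
    dst=$2
    size=$(stat -c %s "$src")
    start=$(date +%s.%N)
    cp "$src" "$dst"
    end=$(date +%s.%N)
    elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { print e - s }')
    rate_mb_per_s "$size" "$elapsed"
}

# Example:
# timed_copy /mnt/ramdisk/3GB.bin /dev/null
```

Timing each copy individually rather than the whole loop makes it easier to spot a rate that changes over the course of a long run.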

It’s a crude script but it does its job. In a first run, I used the script to write 300 GB of data to nowhere, i.e. /dev/null, so I could get an idea of how fast the CPU and the cp command can actually read from RAM.

The interesting result: Running several instances of the script simultaneously results in a much higher RAM disk read rate. In other words, a single cp process is too slow to take full advantage of the speed the RAM offers. Here are the results:

  • 1 instance / thread: 10 GB/s
  • 4 instances / threads: 26 GB/s
  • 5 instances / threads: 28 GB/s (which is just a tad more than half the theoretical 51.2 GB/s of dual-channel DDR4-3200 RAM as per Wikipedia)
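
The comparison with the theoretical memory bandwidth can be reproduced with a few lines of shell arithmetic. This is a sketch under the usual assumptions for DDR4-3200: 3200 mega-transfers per second and a 64-bit (8 byte) data bus per channel. The variable names are mine.

```shell
# DDR4-3200: 3200 mega-transfers per second, 8 bytes per transfer per channel.
transfers_mt_s=3200
bytes_per_transfer=8
channels=2

peak_mb_s=$((transfers_mt_s * bytes_per_transfer * channels))
measured_mb_s=28000                 # the 28 GB/s measured with 5 instances

echo "Theoretical peak: $peak_mb_s MB/s"
echo "Measured share of peak: $((measured_mb_s * 100 / peak_mb_s))%"
```

The integer division rounds down, so the reported share is 54%.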

The AMD Ryzen 7 4750U CPU of my Lenovo X13 notebook has 8 real cores and it takes 5 simultaneous instances of the script before the read throughput saturates. Also interesting: When I run only a single instance, the single core that is required for this is clocked at the maximum of 4.2 GHz. When 5 cores are working, the clock rate per core is down to 3 GHz to keep the CPU temperature in check.
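
Instead of starting the script by hand in several terminals, the parallel runs could also be launched from a single shell. The following is a minimal sketch, not the script I used: run_parallel is a hypothetical helper that starts a given number of background readers, lets each one copy the file to /dev/null a given number of times, and then waits for all of them to finish.

```shell
# Hypothetical helper: start N background readers, each copying the file
# to /dev/null a given number of times, then wait for all of them.
run_parallel() {
    instances=$1
    rounds=$2
    src=$3
    i=1
    while [ "$i" -le "$instances" ]; do
        (
            r=1
            while [ "$r" -le "$rounds" ]; do
                cp "$src" /dev/null
                r=$((r + 1))
            done
        ) &
        i=$((i + 1))
    done
    wait
}

# Example: 5 instances, 100 rounds each
# run_parallel 5 100 /mnt/ramdisk/3GB.bin
```

Each reader runs in its own subshell, so the kernel is free to schedule them on different cores.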

My takeaway from this: Even a single “cp” command in a loop gives me enough speed for my subsequent SSD speed tests. With the four PCI Express 3.0 lanes of my SSD, the maximum theoretical throughput of the bus is around 3.9 GB/s, well below the 10 GB/s a single instance can read from the RAM disk.
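
The bus figure can be derived from the PCI Express 3.0 link parameters. A short sketch, assuming the standard values of 8 GT/s per lane and 128b/130b line encoding (variable names are mine):

```shell
# PCIe 3.0: 8 GT/s per lane (8000 mega-transfers of 1 bit each),
# 128b/130b encoding, i.e. 128 payload bits per 130 bits on the wire.
mt_per_lane=8000
lanes=4

# Payload bits per second, divided by 8 to get bytes; result in MB/s.
pcie_mb_s=$((mt_per_lane * 128 * lanes / (130 * 8)))

echo "PCIe 3.0 x$lanes: about $pcie_mb_s MB/s"
```

This is the raw link rate only; protocol overhead on the bus reduces what a drive can actually deliver.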

So there we go: my test setup is ready. Let’s see how my new NVMe SSD performs in my X13 notebook. Stay tuned for part 3.