HDD Performance – Part 3 – Measurement Setup

In parts 1 and 2 of this series, I had a first look at the performance of a number of different hard drives I use to back up large amounts of data. I currently have around 8 TB of data that needs to be backed up regularly, so speed is of the essence: it decides whether a backup cycle takes an hour, half a day, or even more. Before I go on with further measurement results, here is a quick summary of how I collected my data:

Data Collection Commands

Linux offers a number of great command-line utilities to measure data throughput to and from block devices. For my tests, I decided to use iostat to record the read or write speed once per second to a file:

iostat -d -t -m 1 /dev/dm-1 >> filename.log

In the example above, /dev/dm-1 is the device name of the block device, i.e. the hard disk that I want to monitor. A tail -F on filename.log shows me the data in real time on the console.
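
If it is not obvious which block device sits behind a particular mount point, lsblk shows the mapping. As a small sketch, the logging can also be started in the background and followed live at the same time; the device and file names are just the placeholders from the example above:

# List block devices with their sizes and mount points
lsblk -o NAME,SIZE,MOUNTPOINT

# Start logging in the background and follow the log live
iostat -d -t -m 1 /dev/dm-1 >> filename.log &
tail -F filename.log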

Data Formatting and Reduction Commands

As I transferred terabytes of data during my tests, each test run took from several hours to more than a day, so the log file grows rather large. To generate a graph, I decided to discard most of the intermediate results and use only every 10th value. In addition, the first rows of the log file that contain the header have to be removed, and the time and date from one line have to be combined with the read/write speed results of another line. Here are the commands to do that:

# Remove the first two header lines of the log
#
sed '1,2d' filename.log > 0.log

# Date/time is contained in line 1 of every 5-line entry,
# the read/write speed in line 3.
# Remove the other lines of each entry.
#
sed -n '1~5p; 3~5p' 0.log > 1.log

# Combine date/time and read/write speed results 
# on separate lines into a single line
#
paste -d ' ' - - < 1.log > 2.csv

# Use only every 10th line, as otherwise LibreOffice Calc gets
# too slow.
#
sed -n '0~10p' 2.csv > 3.csv
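
For convenience, the four steps can also be chained into a single pipeline. This is simply the commands above strung together and should produce the same 3.csv:

sed '1,2d' filename.log | sed -n '1~5p; 3~5p' | paste -d ' ' - - | sed -n '0~10p' > 3.csv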

The resulting 3.csv file can then be imported into LibreOffice Calc to calculate average transfer speeds and to generate the graphs shown in parts 1 and 2 and in the following parts of this series. A bit of a semi-automated process that could definitely be improved, but it does its job.
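
For a quick sanity check, the average transfer speed can also be computed directly on the command line with awk. This is only a sketch: which column contains the MB/s value depends on the iostat version and time format, so the column number used below is an assumption that may need adjusting:

# Average of the write speed column (column 7 is an assumption, adjust as needed)
awk '{ sum += $7; n++ } END { if (n > 0) printf "average: %.1f MB/s\n", sum / n }' 3.csv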

Writing 50 GB Files Until the Disk Is Full

As manually copying 50 GB files to the output drive until it is full is impractical, I came up with the following script. It copies a single 50 GB file with randomized content to the disk under different filenames until the disk is full or the given number of iterations is reached:

#!/bin/bash

# Check if the correct number of arguments is provided
if [ $# -ne 3 ]; then
    echo "Usage: $0 <source_file> <target_directory> <number_of_iterations>"
    exit 1
fi

source_file="$1"
target_dir="$2"
iterations="$3"

# Check if source file exists
if [ ! -f "$source_file" ]; then
    echo "Error: Source file does not exist."
    exit 1
fi

# Check if target directory exists
if [ ! -d "$target_dir" ]; then
    echo "Error: Target directory does not exist."
    exit 1
fi

# Check if iterations is a positive integer
if ! [[ "$iterations" =~ ^[1-9][0-9]*$ ]]; then
    echo "Error: Number of iterations must be a positive integer."
    exit 1
fi

for ((i = 1; i <= iterations; i++)); do
    timestamp=$(date +%Y%m%d_%H%M%S)
    target_file="${target_dir}/huge_file_${timestamp}_${i}"
    
    cp "$source_file" "$target_file"
    
    echo "Copied file $i of $iterations"
    
    sleep 1  # Add a small delay to avoid overloading the system
done

echo "Copying complete. $iterations files have been copied."

Again, crude, but it does its job.
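
The 50 GB source file itself can be created in many ways; one option is dd with /dev/urandom as input. The file name, mount point, script name and iteration count below are just examples to show how the pieces fit together:

# Create a 50 GB file with random content (50,000 x 1 MB)
dd if=/dev/urandom of=random_50g.bin bs=1MB count=50000 status=progress

# Example call of the script above (saved here as fill_disk.sh):
# keep copying until the disk runs full or 200 copies have been written
./fill_disk.sh random_50g.bin /mnt/backup 200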

Reading All Files on the Disk

Reading all files on the disk is much easier and can be done with a single command:

time find . -type f -exec cat {} + > /dev/null 2>&1
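
One detail to keep in mind: Linux caches file contents in RAM, so reading files that were just written may not touch the disk at all. Dropping the page cache before the read test avoids this; the mount point below is just a placeholder:

# Flush dirty data and drop the page cache so the reads really hit the disk
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches

# Then read every file below the mount point and time it
cd /mnt/backup
time find . -type f -exec cat {} + > /dev/null 2>&1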

And that’s it: this is my measurement setup! Stay tuned for more results in the following posts.