
And on we go with another round of looking at hard disk drive performance. After writing 50 GB files to a number of hard drives in episode 2, I decided to have a look at how the drives would perform after randomly deleting about 2.5 TB worth of 50 GB files and then filling up the freed space again. Before running the tests, my expectation was that the outer parts of the disk would be filled first, as write speeds are fastest there, and that write speeds would then gradually drop over time. The graph at the top of this post, which plots the write data rate in MB/s over time, shows something else, however.
Instead of using the outer parts of the disk first, the 50 GB files seem to be written to different places on the disk over time. The content of each file is still written consecutively, as the data transfer rate in most parts of the graph only changes slowly, indicating that the write heads keep slowly moving inwards most of the time. However, using up all the space on the outer parts first and only moving to the inner parts of the disk later does not seem to happen. I would have predicted a different outcome, but the overall average speed is still the same.
I then repeated the exercise with my 16 TB Seagate IronWolf drive, which shows very similar behavior:

Looks like a similar distribution algorithm is in place. A question I asked myself but can’t answer is whether this distribution is decided by the ext4 filesystem or by the disk drive itself. After all, the disk drive could have a mapping table to map virtual to physical places on the disk. No way to tell from the outside. But in the end, it does not matter, as on average the speed is good: there is no average degradation compared to writing to a completely empty drive.
You might wonder why I run such a test. Well, in practice, I often create backups of large virtual machine snapshots, where individual files can easily grow to double-digit gigabyte sizes. During a backup, old huge files are deleted and new huge files are created, which resembles this test scenario.
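For anyone who wants to try something similar, here is a minimal Python sketch of such a delete-and-refill run. It is not the script used for the graphs in this post. The function names, the `refill_` file naming and the chunk size are all made up for illustration, and it only records one average rate per file rather than a continuous rate over time.

```python
import os
import random
import time


def delete_random_files(directory, bytes_to_free, rng=None):
    """Delete randomly chosen files until at least bytes_to_free is reclaimed."""
    rng = rng or random.Random()
    files = [os.path.join(directory, name) for name in os.listdir(directory)]
    files = [path for path in files if os.path.isfile(path)]
    rng.shuffle(files)
    freed = 0
    deleted = []
    for path in files:
        if freed >= bytes_to_free:
            break
        freed += os.path.getsize(path)
        os.remove(path)
        deleted.append(path)
    return deleted


def write_file_timed(path, size, chunk_size=64 * 1024 * 1024):
    """Write `size` bytes sequentially and return the average rate in MB/s."""
    chunk = os.urandom(min(chunk_size, size))
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < size:
            n = min(len(chunk), size - written)
            f.write(chunk[:n])
            written += n
        # Force the data out of the page cache so the timing reflects the disk.
        f.flush()
        os.fsync(f.fileno())
    elapsed = max(time.monotonic() - start, 1e-9)
    return size / 1e6 / elapsed


def refill(directory, count, size, chunk_size=64 * 1024 * 1024):
    """Write `count` new files of `size` bytes; return one MB/s value per file."""
    rates = []
    for i in range(count):
        # Hypothetical naming scheme, chosen only for this sketch.
        path = os.path.join(directory, f"refill_{i:04d}.bin")
        rates.append(write_file_timed(path, size, chunk_size))
    return rates
```

With the figures from this post, that would mean calling `delete_random_files(mount_point, 2_500 * 10**9)` and then `refill(mount_point, 50, 50 * 10**9)`, where `mount_point` is the directory the drive is mounted at. Be careful: the first call really deletes files, so only point it at a directory that contains nothing but test data.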
So much for today. In the next post, it’s finally time to move from analyzing the write behavior of my drives to analyzing their read behavior.