HDD Performance – Part 1 – Huge Files on a New 20TB Drive

My data heap keeps growing, and I do have a good multi-layer and multi-location backup strategy. Offline and off-site storage is the motto of the day, which requires hard disks with large capacities so data can be physically moved. So far, I have used several 8 TB hard disks to which I sync the data from various sources. I’ve come to a point, however, where 8 TB is no longer enough, and incidentally, I noticed a significant slowdown during my backup procedures. So I bought my first 20 TB drive which, so far, performs very nicely. But I really do wonder why my 8 TB drives seem to have slowed down so much while that shiny new 20 TB drive (still?) performs much better. So it was time to run some benchmark tests with different drives and real-world data, so I can see how new drives perform with my data and analyze the performance of existing drives. But why do I care? Because it makes a huge difference whether 10 TB of data is moved to or from a disk drive at an average of 50 MB/s or 200 MB/s. At 50 MB/s, moving that amount of data takes almost 56 hours, while at 200 MB/s it takes just under 14 hours. And we are not even talking 20 TB yet. You see where this goes…
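As a quick check of the transfer times mentioned above, here is a small sketch of the arithmetic. It assumes decimal units (1 TB = 1,000,000 MB), which is how drive vendors count; the function name is made up for this example.

```python
def transfer_hours(size_tb: float, rate_mb_s: float) -> float:
    """Hours needed to move size_tb terabytes at rate_mb_s MB/s (decimal units)."""
    size_mb = size_tb * 1_000_000  # assumption: 1 TB = 1,000,000 MB
    seconds = size_mb / rate_mb_s
    return seconds / 3600


for rate in (50, 200):
    print(f"10 TB at {rate} MB/s: {transfer_hours(10, rate):.1f} hours")
# prints:
# 10 TB at 50 MB/s: 55.6 hours
# 10 TB at 200 MB/s: 13.9 hours
```

Doubling the amount of data to 20 TB simply doubles both numbers, which is why the average rate matters so much.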

Small, Large, Huge Files

So let’s lay the groundwork for my performance measurements. I don’t really care about theoretical maximum speeds. I care about how long it takes to transfer my real-world data to and from a disk drive. In practice, I have a mix of files. There are huge virtual machine images, often exceeding 50 GB per snapshot file. Then, I have a huge number of video files that keep changing as they are produced, stored and later discarded, each between 1 and 10 GB in size. And then there’s the heap of photos, with sizes between 2 and 5 MB each. And finally, there is a great number of small document files, many of them only a few kilobytes or a few hundred kilobytes in size. As you are probably aware, file size has a significant impact on hard disk storage performance. Large files in the GB range can be read from and written to hard disks efficiently, because a lot of data is transferred in one stream and can hence be put in sequence on the drive. This means the drive heads have to move very little. Small files, on the other hand, can end up in many different places on the hard disk, which requires many drive head relocations, and that kills performance.
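To get a feeling for how such a mix is distributed on a given source directory, a short script can count files and bytes per size class. This is only a sketch: the class boundaries below are assumptions chosen to roughly match the categories described above, and the function names are invented for this example.

```python
import os
from collections import Counter

# Assumed size classes (upper bounds in bytes); adjust to taste.
SIZE_CLASSES = [
    ("small (< 1 MB)", 1_000_000),
    ("medium (1 MB to 1 GB)", 1_000_000_000),
    ("large (1 GB to 50 GB)", 50_000_000_000),
]
HUGE_LABEL = "huge (>= 50 GB)"


def size_class(size_bytes):
    """Return the label of the first class whose upper bound exceeds the size."""
    for label, upper_bound in SIZE_CLASSES:
        if size_bytes < upper_bound:
            return label
    return HUGE_LABEL


def summarize_tree(root):
    """Walk root and return (file count per class, total bytes per class)."""
    counts = Counter()
    total_bytes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            label = size_class(size)
            counts[label] += 1
            total_bytes[label] += size
    return counts, total_bytes
```

Calling `summarize_tree("/path/to/data")` (the path is a placeholder) returns two counters. A tree with millions of entries in the small class is the kind of data where head movement, rather than sequential throughput, is likely to dominate.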

Hardware Considerations and SMR – Is it An Issue?

On the hardware side, hard disks of different make and size perform differently. In particular, SMR (shingled magnetic recording), which some manufacturers use for some of their drives, is said to have a very detrimental effect when dumping large amounts of data on a disk. Unfortunately, there is little practical data to be found on the Internet that shows the effect. This is the ‘only’ more detailed analysis of the issue I could find. What I also don’t quite understand is that over time, my drives seem to get a lot slower and more erratic, even if there is still a good amount of free space on them. Or is it just my imagination? And will formatting them and rewriting all data help? Well, let’s see.

The Shiny New 20 TB Drive

All right, so much for the groundwork. To start this blog series, I would like to show two graphs I have created by writing data to a new 20 TB drive that was already around 40% full. The graph above has been created by writing files of 70 GB each until the drive was full, around 12 TB I would say. This takes a while but shows how the drive performs over time. As security is important, I am using LUKS encryption below an ext4 file system, so everything that is put on the disk is encrypted before it is sent over the USB cable. Yes, I’m using USB-attached storage, which had to be mentioned as well. Fortunately, encryption is not a bottleneck, as the drives are much slower than the rate at which the notebook can produce encrypted data. The x-axis of the graph is time, the y-axis is the speed in MB/s at which the 70 GB files are written to the drive (around 12 TB total). I interpret the graph above as follows:
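The post does not say how the measurement itself was scripted, so the following is only a minimal sketch of one way to collect such data: write fixed-size files one after another, force each file to disk, and record the throughput per file. The function names, file naming scheme and chunk size are assumptions made for this example.

```python
import errno
import os
import time


def write_test_file(path, size_bytes, chunk_bytes=16 * 1024 * 1024):
    """Write size_bytes of random data to path and return the rate in MB/s."""
    # One random chunk is reused for the whole file; this keeps CPU load low.
    chunk = os.urandom(chunk_bytes)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < size_bytes:
            n = min(chunk_bytes, size_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # wait until the data has left the page cache
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size_bytes / 1_000_000 / elapsed


def fill_directory(target_dir, file_size_bytes, max_files):
    """Write up to max_files files; return a list of (timestamp, MB/s) samples."""
    samples = []
    for i in range(max_files):
        path = os.path.join(target_dir, f"testfile_{i:04d}.bin")
        try:
            rate = write_test_file(path, file_size_bytes)
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise
            # Drive is full: remove the partial file and stop.
            if os.path.exists(path):
                os.remove(path)
            break
        samples.append((time.time(), rate))
    return samples
```

For a run like the one described here, `file_size_bytes` would be 70,000,000,000 and `target_dir` the mount point of the drive. Plotting the samples as speed over time gives a graph of the kind discussed in this post. Without the `fsync` call, the first files would appear much faster than the drive really is, because the data would only have reached the operating system's cache.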

At the beginning, I get a write speed between 200 and 250 MB/s, probably because the drive uses the outer parts of the spinning disks, where more data passes under the heads per revolution than on the inner parts. So over time, the rate at which data can be written to the drive diminishes. An interesting thing occurs when the drive is almost full. Suddenly, the data rate increases again from 120 MB/s up to 250 MB/s for an hour, before the drive is finally full. I’m not sure why that is. Perhaps the drive keeps some space free on the outside of the platters and now puts it to use?

Once the disk was full, I wanted to see how the drive behaves if I randomly delete 70 GB files to free up 2.5 TB on the 20 TB drive again. I then wrote 70 GB files to the drive again until the 2.5 TB were used up. The resulting second graph above also makes sense to me and I interpret it as follows:
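The random deletion step can be sketched as follows. This is an illustration under assumptions, not the script used for the test: the function name is invented, and it only selects the files; actually deleting them (for example with `os.remove`) is left to the caller.

```python
import random


def pick_files_to_delete(file_sizes, bytes_to_free, seed=None):
    """Randomly pick files until their combined size reaches bytes_to_free.

    file_sizes maps a path to its size in bytes.
    Returns (list of chosen paths, number of bytes they free).
    """
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    candidates = list(file_sizes)
    rng.shuffle(candidates)
    chosen = []
    freed = 0
    for path in candidates:
        if freed >= bytes_to_free:
            break
        chosen.append(path)
        freed += file_sizes[path]
    return chosen, freed


# Example: 170 files of 70 GB each, free at least 2.5 TB.
sizes = {f"testfile_{i:04d}.bin": 70_000_000_000 for i in range(170)}
chosen, freed = pick_files_to_delete(sizes, 2_500_000_000_000, seed=1)
print(len(chosen), freed / 1e12)  # 36 files, 2.52 TB
```

With 70 GB files, 35 deletions free only 2.45 TB, so 36 files are needed to reach 2.5 TB. Because the files are picked at random, the freed space is scattered across the drive rather than forming one contiguous region.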

As I deleted files randomly, space became available in different places on the drive. Each free area was filled consecutively until the drive heads had to move to a different place on the disk platters, which can be written at a faster or slower speed. I would also have understood if the drive had decided to use the outer parts first and then work towards the center, in which case we would have seen a slowly declining curve with downward steps at the points where the heads move further inward and jump over allocated areas. But it didn’t do that, or at least not all the time. Be that as it may, performance is probably as good as for the initial write.

From my point of view, both tests worked well on that drive, and I’m happy with the performance. With the groundwork now laid, I’ll have a look at what the same write pattern does on my older 8 TB drives, how large and small files influence the behavior, and a couple of other things. Let’s see what I can discover.

2 thoughts on “HDD Performance – Part 1 – Huge Files on a New 20TB Drive”

  1. Very interesting! Thanks for sharing the results. Looking forward to the 8TB data.

    You mentioned SMR, but do your drives feature SMR?
