Last year I post-processed videos from the Vintage Computing Festival 2020 on my rather old notebook and was quite underwhelmed by the time it took. Even VMs in the cloud did not significantly speed up the process. This year I had a workstation with an Nvidia graphics card and hardware H264 encoding for the purpose, and I also upgraded to a much faster notebook. Post-processing was indeed a lot faster than last year, so I ran a comprehensive ffmpeg speed comparison on the many devices at my disposal. The results are shown in the bar graph above. Here are some thoughts on what the different results mean in relation to each other:
Ffmpeg on Nvidia
On my workstation with the Nvidia Quadro M2000 GPU from ca. 2017/18, cutting a video with a duration of 47 minutes and 46 seconds out of an H264 input video stream and re-encoding it in the same format with good quality took 2 minutes 42 seconds. In other words, encoding ran 17.7 times faster than the playback speed of the output video.
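The speedup figures in this post are simply the runtime of the output video divided by the encoding time. A minimal sketch of that arithmetic in shell follows; the helper names `to_seconds` and `speedup` are made up for illustration and are not part of ffmpeg.

```shell
#!/bin/sh
# to_seconds (hypothetical helper): convert "MM:SS" or "HH:MM:SS" to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# speedup (hypothetical helper): video runtime divided by encoding time,
# rounded to one decimal place.
speedup() {
    awk -v d="$(to_seconds "$1")" -v e="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", d / e }'
}

# 47:46 of video encoded in 2:42 on the Quadro M2000
speedup 47:46 2:42   # prints 17.7
```

The same call with any of the other encoding times in this post, for example `speedup 47:46 6:19`, gives the corresponding bar in the graph.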
Ffmpeg on Old and New Notebooks
Compared to last year, I’ve also updated my notebook from a 2015 Lenovo X250 to a 2020 Lenovo X13 with an AMD Ryzen 7 4750U CPU. On the old notebook, video encoding had a speedup of 2.6x, and encoding took over 18 minutes. On the new notebook with the Ryzen 7 CPU, encoding took only 6 minutes 19 seconds, a speedup of 7.6x over the playback speed. The AMD CPU is still significantly slower than the H264 hardware encoder of the Nvidia graphics card in the workstation, but the speedup over last year’s notebook is equally significant. However, the X250 has only 2 cores and 4 threads while the new notebook has 8 cores and 16 threads. So from this point of view, single core performance did not improve much in 5 years. Also interesting: I recently bought another notebook with an 11th generation i5 processor (i5-1135G7) with 4 cores and 8 threads. On this machine, the ffmpeg task took 7 minutes 54 seconds, i.e. a speedup of 6x. That is a bit slower than the AMD Ryzen 7 with its 8 cores, but not that far away either.
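To put a rough number on the single core remark, the speedup can be divided by the core count of each notebook. This is only a back-of-the-envelope sketch: it assumes encoding scales linearly with cores, which is not strictly true, and the helper name `per_core` is made up for illustration.

```shell
#!/bin/sh
# per_core (hypothetical helper): speedup divided by the number of cores.
# Assumes linear scaling across cores, which is only an approximation.
per_core() {
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.2f\n", s / n }'
}

per_core 2.6 2   # Lenovo X250, 2 cores:       prints 1.30
per_core 7.6 8   # Ryzen 7 4750U, 8 cores:     prints 0.95
per_core 6.0 4   # i5-1135G7, 4 cores:         prints 1.50
```

Under that assumption the per-core figures are all in the same range, which fits the observation that most of the gain comes from the higher core count.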
Let’s stay on the CPU side of things for a bit longer: I also ran my ffmpeg task on a 10th generation i7 based notebook with an Nvidia MX250 graphics card. When running ffmpeg on the CPU, it runs a bit faster than on the notebook with the 11th generation i5 processor. No surprise here. I then tried to run my task on the MX250 GPU but couldn’t make it work. After a while I found out that not all Nvidia graphics cards seem to have H264 hardware encoding support and the entry level MX250 seems to be one of those lacking support.
Let’s return for a minute to my AMD Ryzen 7 4750U based notebook: This processor has a GPU with hardware support for H264 encoding, so I gave this a try as well. Fortunately, in Ubuntu 20.04, ffmpeg comes with the necessary libraries out of the box, so I only had to modify the ffmpeg command line options to move the encoding process to the GPU. While GPU encoding significantly reduces the load on the CPU, the hardware isn’t quite as fast as the 8 CPU cores, and encoding took 10 minutes 43 seconds, which is a speedup of 4.5x. In other words, unless you need the CPU cores for something else while the process is running, it doesn’t make a lot of sense to run the H264 encoding process on the AMD GPU.
Ffmpeg in the Cloud
After being a bit disappointed by the processing power in the cloud when rendering my videos last year, I gave it another try this year and compared the performance of 8 Intel Xeon Gold vCPUs and of 8 AMD EPYC 7003 vCPUs in the Hetzner data center in Finland to my devices at home. The 8 Intel vCPUs ran my ffmpeg task in 9 minutes 7 seconds, which is a speedup of 5.24x. In other words, that’s slower than on the AMD CPU in my notebook. The 8 AMD EPYC vCPUs in another VM ran the task in 6 minutes 15 seconds, which is a speedup of 7.6x. This is the same value as on the 8 core 16 thread AMD Ryzen 7 CPU in my notebook. One thing to note as far as the cloud Intel vs. AMD results are concerned: Both VMs had dedicated vCPUs and would cost around 85 euros per month. However, the Intel Xeons seem to be Skylake CPUs, i.e. they are a few years old already, while the AMD EPYCs seem to be rather new hardware from a 2021 point of view. So be careful what you compare!
Ffmpeg on a Raspberry Pi 4
So let’s go to the low end of the computing power range I have at my disposal at home and have a look at how a Raspberry Pi 4 with its 4 CPU cores fares with my ffmpeg task. The current version of Raspbian OS already includes an ffmpeg version with Broadcom GPU H264 hardware support, so I gave this a try first. I wasn’t sure what to expect but was still a bit disappointed with the 59 minutes 13 seconds it took to run my task, which is 0.8x the playback speed. Even worse: While the output file size looked ok, the encoding quality was horrible, with lots of artifacts, and pretty much unusable. So I ran ffmpeg on the 4 CPU cores, which resulted in an encoding time of 114 minutes and 16 seconds, which is 0.4x the playback speed of the resulting video. I checked the CPU clocks while running ffmpeg and throughout the process, all cores ran at 1.5 GHz and never throttled down, despite the passive cooling without a fan. From a quality point of view, the CPU encoded video was on par with the results of the other machines.
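The post does not say how the clocks were checked. One way to do it on a Raspberry Pi, sketched here as an assumption rather than as the method actually used, is the `vcgencmd` tool: `vcgencmd measure_clock arm` reports the current ARM clock in Hz and `vcgencmd get_throttled` reports a flag word that stays at `0x0` as long as no under-voltage or throttling event has occurred since boot. The helper names `clock_ghz` and `is_throttled` are made up for illustration.

```shell
#!/bin/sh
# clock_ghz (hypothetical helper): turn the output of "vcgencmd measure_clock arm",
# e.g. "frequency(48)=1500000000", into GHz with two decimals.
clock_ghz() {
    echo "$1" | awk -F= '{ printf "%.2f\n", $2 / 1000000000 }'
}

# is_throttled (hypothetical helper): succeeds if the output of
# "vcgencmd get_throttled" has any flag set, i.e. is not "throttled=0x0".
is_throttled() {
    [ "${1#throttled=}" != "0x0" ]
}

# Only query the hardware if vcgencmd is actually available.
if command -v vcgencmd >/dev/null 2>&1; then
    clock_ghz "$(vcgencmd measure_clock arm)"
    if is_throttled "$(vcgencmd get_throttled)"; then
        echo "throttling or under-voltage flags set"
    else
        echo "no throttling reported"
    fi
fi
```

Running this in a loop in a second terminal while ffmpeg is encoding would show whether the clock ever drops below 1.5 GHz.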
ARM in the Cloud
Let’s return to the cloud again for a moment, because ARM does not mean slow. Amazon, for example, offers VMs based on their Graviton 2 ARM processors. So I rented an 8 core ARM c6g.2xlarge instance for half an hour to see what the performance would be. The instance ran my ffmpeg task in 6 minutes and 38 seconds, so the speedup was 7.8x, i.e. in the same league as my AMD notebook CPU and the 8 core AMD in the Hetzner cloud. And just for the fun of it, I rented an 8-core AMD EPYC 7002 instance (c5a.2xlarge), which resulted in a compute time of 6 minutes 38 seconds and a speedup of 7.2x, i.e. a bit slower than the 8-core ARM but again in the same league as my notebook and the corresponding Hetzner VM. Based on Amazon’s pricing table, both VMs would cost north of 500 euros a month.
ARM in the Notebook
So, what about Apple’s M1 ARM processors that are in current MacBook Pros? I asked a friend of mine to run my ffmpeg task with the same command line as used above on his M1 based MacBook Pro (MacBookPro17,1) with 4 performance and 4 efficiency cores. The result: Encoding took 3 minutes 49 seconds. This translates into a speedup of 12.5x and beats all other CPUs I’ve tested, including the Ryzen 7, by a wide margin. I’m impressed!
Lots of numbers in this post, but I think they are well summarized in the graph above that shows the speedup numbers compared to the runtime of the resulting video.
What command did you run to get the best result for AMD on Hetzner?
On all CPUs I used the following command to keep the results comparable:
time ffmpeg -i in.mkv -ss 00:09:13 -to 00:56:59 -c:v libx264 -vf "fps=25" -crf 20 out.mp4
On the Nvidia card, -c:v was set to h264_nvenc.
For the AMD GPU:
time ffmpeg -vaapi_device /dev/dri/renderD128 -i in.mkv -ss 00:09:13 -to 00:56:59 -vf 'format=nv12,hwupload,fps=25' -c:v h264_vaapi out.mp4
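As a quick sanity check, the `-ss` and `-to` timestamps in the commands above can be subtracted to confirm the clip length used for the speedup numbers. This is a small sketch; the helper names `to_seconds` and `clip_length` are made up for illustration.

```shell
#!/bin/sh
# to_seconds (hypothetical helper): convert "HH:MM:SS" or "MM:SS" to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# clip_length (hypothetical helper): difference between the -to and -ss
# timestamps, printed as minutes:seconds.
clip_length() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { d = b - a; printf "%d:%02d\n", int(d / 60), d % 60 }'
}

clip_length 00:09:13 00:56:59   # prints 47:46
```

The result matches the 47 minutes and 46 seconds of output video mentioned in the post.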