Workstation Power at Home – Part 3 – FFmpeg on the GPU

Image: H.264 encoding on the GPU

In the previous post I ran FFmpeg and Handbrake on the 6-core CPU of my workstation and got a good but still modest 2.5x speedup of the video encoding task compared to running the same operation on my notebook. I would have expected at least a 5x speedup and I’m still puzzled why I didn’t get there. But I have moved on for the moment and looked into whether I could make FFmpeg and Handbrake use the H.264 hardware encoder on the Nvidia Quadro M2000 GPU instead of running these tasks on the CPU cores.

After a bit of searching I found out that Nvidia H.264 hardware encoding was only recently added to Handbrake and that the latest version has not yet made it into the standard Ubuntu 20.04 repository. So I installed the current version of Handbrake via the project’s PPA and tried again. Here’s the command line that re-encodes an mp4 file into a low-quality mp4 H.264 stream on the GPU:

HandBrakeCLI -i x-original.mp4 -o x--original-q35.mp4 --all-subtitles --all-audio -e nvenc_h264 -q 35 -B 160

Instead of using x264 as the target codec, Handbrake pushes the encoding task to the GPU if nvenc_h264 (Nvidia Encoding H264) is specified on the command line. And the difference compared to CPU encoding is indeed significant. Re-encoding a 1 GB H.264 file in lower quality to achieve a smaller file size takes 29 minutes on my X250 notebook, 12 minutes and 50 seconds when using the Z440 workstation’s six Xeon E5-1650 v4 cores and only 4 minutes and 50 seconds when offloading the H.264 encoding task to the GPU. That’s roughly a 2.7x speedup over CPU encoding on the workstation and a 6x speedup over running the operation on my notebook (and doing nothing else in the meantime). Finally we are getting somewhere! While all CPU cores are fully loaded when running the operation only on the CPU, CPU load is around 50% when using the GPU for encoding. This is because decoding the original H.264 file continues to be done on the CPU.
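For comparison, here is a minimal sketch of how the same kind of GPU re-encode might look with plain FFmpeg instead of HandBrakeCLI. It assumes an FFmpeg build that includes the h264_nvenc encoder. The function names has_nvenc and reencode_gpu are made up for this example, and the -cq value is not guaranteed to give the same quality as Handbrake’s -q 35. Subtitles are not carried over in this sketch.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the quality setting
# is an assumption rather than an exact equivalent of HandBrake's -q 35.

# Succeeds if the local ffmpeg build lists the NVENC H.264 encoder.
has_nvenc() {
    ffmpeg -hide_banner -encoders 2>/dev/null | grep -q 'h264_nvenc'
}

# reencode_gpu INPUT OUTPUT [QUALITY]
# Encodes the first video stream on the GPU and all audio streams as
# AAC at 160 kbit/s. Video decoding still happens on the CPU.
reencode_gpu() {
    src=$1
    dst=$2
    cq=${3:-35}

    if ! has_nvenc; then
        echo "h264_nvenc is not available in this ffmpeg build" >&2
        return 1
    fi

    ffmpeg -hide_banner -i "$src" \
        -map 0:v:0 -map '0:a?' \
        -c:v h264_nvenc -rc vbr -cq "$cq" -b:v 0 \
        -c:a aac -b:a 160k \
        "$dst"
}
```

After sourcing the file, a call such as reencode_gpu x-original.mp4 x-original-q35.mp4 35 would start the encode. The availability check up front avoids a confusing error message on systems where FFmpeg was built without NVENC support.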

The screenshot at the beginning of the post shows some statistics provided by the Nvidia tool. Video engine utilization jumps from 0% to 97% when the operation is started and PCIe bandwidth utilization goes from 0 to 6%.

Re-encoding an ISO file that was previously created from a DVD with Brasero into a high-quality H.264 mp4 stream delivers even higher gains. Running this operation on my notebook takes around 31 minutes for an 8 GB ISO file that contains around 3 hours of video. The same job on the HP Z440 workstation with the Nvidia GPU takes 3 minutes 55 seconds, an 8x gain!
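The speedup figures follow directly from the run times quoted in this post. As a quick sanity check, here is a small sketch that recomputes them. The helper names to_seconds and speedup are made up for this example.

```shell
#!/bin/sh
# to_seconds MINUTES SECONDS -> total number of seconds
to_seconds() {
    echo $(( $1 * 60 + $2 ))
}

# speedup BASELINE_SECONDS FASTER_SECONDS -> ratio with two decimals
speedup() {
    awk -v base="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", base / fast }'
}

# Run times of the 1 GB H.264 re-encode
notebook=$(to_seconds 29 0)    # X250 notebook, CPU only
cpu=$(to_seconds 12 50)        # Z440, six Xeon cores
gpu=$(to_seconds 4 50)         # Z440, H.264 encoding on the GPU

# Run times of the 8 GB DVD ISO job
iso_notebook=$(to_seconds 31 0)
iso_gpu=$(to_seconds 3 55)

echo "GPU vs. workstation CPU:   $(speedup "$cpu" "$gpu")x"
echo "GPU vs. notebook:          $(speedup "$notebook" "$gpu")x"
echo "ISO job, GPU vs. notebook: $(speedup "$iso_notebook" "$iso_gpu")x"
```

This prints 2.66x, 6.00x and 7.91x, which matches the rounded numbers given in the text.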

And one more advantage of the workstation over the notebook setup: While the notebook has to crank up its fan to get rid of the heat, the Z440’s CPU and GPU fans have no trouble at all and do not even bother to increase their speed and noise level. All very quiet despite all the work going on!

And for completeness’ sake, here’s the output of Nvidia’s command line tool during the operation:

martin@Z440:~$ nvidia-smi -q -d UTILIZATION

==============NVSMI LOG==============

Driver Version : 450.80.02
CUDA Version : 11.0

Attached GPUs : 1
GPU 00000000:02:00.0
    Utilization
        Gpu : 10 %
        Memory : 15 %
        Encoder : 94 %
        Decoder : 0 %
    GPU Utilization Samples
        Duration : 16.37 sec
        Number of Samples : 99
        Max : 10 %
        Min : 8 %
        Avg : 8 %
    Memory Utilization Samples
        Duration : 16.37 sec
        Number of Samples : 99
        Max : 15 %
        Min : 12 %
        Avg : 13 %
    ENC Utilization Samples
        Duration : 16.37 sec
        Number of Samples : 99
        Max : 94 %
        Min : 86 %
        Avg : 91 %
    DEC Utilization Samples
        Duration : 16.37 sec
        Number of Samples : 99
        Max : 0 %
        Min : 0 %
        Avg : 0 %
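
If only the encoder load is of interest, the relevant number can be pulled out of this report with a few lines of awk. The following is a minimal sketch that assumes the report layout shown above. The function name enc_avg is made up for this example.

```shell
#!/bin/sh
# enc_avg: reads the output of `nvidia-smi -q -d UTILIZATION` on stdin
# and prints the "Avg" value of the "ENC Utilization Samples" block.
enc_avg() {
    awk -F':' '
        { sub(/^[ \t]+/, "") }                        # ignore indentation
        /^ENC Utilization Samples/ { in_enc = 1; next }
        /Utilization Samples/      { in_enc = 0 }     # another samples block starts
        in_enc && /^Avg/ {
            gsub(/[^0-9]/, "", $2)                    # keep only the digits
            print $2
            exit
        }
    '
}
```

Used as nvidia-smi -q -d UTILIZATION | enc_avg, this would print 91 for the report above.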