Video encoder comparison

I have been a video encoder enthusiast since the early 2000s, but I have not followed the field closely for several years. I was curious about the current state of video encoders, and in particular how modern codecs compare to historic ones.

I picked a high quality, lowish complexity, 1000 frame 1080p source clip. I primarily used ffmpeg as the encoder. I encoded this clip about 2000 times using various codecs and settings as supported by ffmpeg. I sought to use fixed quality encoding settings, and avoid bit-rate based encodes.

Tested codecs

The codecs I encoded with are listed below, roughly ordered from modern to older:

RAV1E – an AV1 codec developed by Xiph.
SVT-AV1 – an AV1 codec developed by Intel and Netflix.
VP9 – a codec that was developed by Google, after acquiring On2.
x265 – an open source software implementation of the H265 codec.
videotoolbox HEVC – a hardware implementation of the H265 codec using the Apple M1-Pro CPU (in my case).
Intel QSV H265 – a hardware implementation of the H265 codec using Intel processors.
VP8 – a codec that was developed by On2, that was freely released by Google after acquiring On2.
x264 – an open source software implementation of the H264 codec.
videotoolbox H264 – a hardware implementation of the H264 codec using the Apple M1-Pro CPU.
Intel QSV H264 – a hardware implementation of the H264 codec using Intel processors.
Theora – an open source codec from Xiph based on the VP3 codec developed by On2.
libxvid – an open source software implementation of the MPEG-4 codec
mpeg4 – ffmpeg’s implementation of the MPEG-4 codec.
MPEG2 – ffmpeg’s implementation of the MPEG-2 codec.
Cinepak – Cinepak is a codec that was developed in 1991. I encoded videos with ffmpeg’s implementation of the Cinepak codec.
MPEG1 – ffmpeg’s implementation of the MPEG-1 codec.

Methodology

I did most encodes with ffmpeg on my M1 Pro MacBook Pro. For each ffmpeg encode, I did the following:

Ffmpeg encoded the video at a variety of presets and a variety of quality settings. I sought for ffmpeg to output at a fixed quality setting, and not be bitrate constrained.
The CPU time taken for each ffmpeg encode was tracked with the command line “time” command.
To determine the quality, I used ffmpeg to calculate the VMAF of each encode with this command:
ffmpeg -i \(filename).mkv -i source.mp4 -lavfi libvmaf=\"model_path=/opt/homebrew/Cellar/libvmaf/2.3.1/share/libvmaf/model/vmaf_v0.6.1.json \":log_path=\(filename)_vmaf.txt -f null -
I took the harmonic mean of the per-frame VMAF scores. I selected VMAF as it appears to be one of the more preferred algorithmic quality comparison metrics.
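
A minimal sketch of that pooling step, assuming the per-frame VMAF scores have already been extracted from the log into a list. The function name and the sample scores are illustrative, and this is the plain harmonic mean, which may differ slightly from how libvmaf pools scores internally:

```python
def harmonic_mean(scores):
    """Return the harmonic mean of a list of positive per-frame scores."""
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0 for s in scores):
        raise ValueError("harmonic mean is only defined here for positive scores")
    return len(scores) / sum(1.0 / s for s in scores)


# Hypothetical per-frame scores: a single bad frame pulls the harmonic
# mean below the arithmetic mean, so quality dips are penalised.
frame_scores = [95.0, 94.0, 60.0, 96.0]
print(round(harmonic_mean(frame_scores), 2))
print(round(sum(frame_scores) / len(frame_scores), 2))
```

The harmonic mean is never higher than the arithmetic mean, so an encode with occasional poor frames scores lower than its average frame quality would suggest.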

I also stored ffmpeg’s vstats for each encode, and had ffmpeg calculate the ssim for each encode, but I have not used this data in this analysis.

For some codecs, I had to use handbrake rather than ffmpeg. I used the same command to generate the VMAF, but I was unable to keep track of the encode time.

I produced several graphs comparing the size/quality of the encodes at different quality settings and presets, which are shown below.

To be able to effectively compare speed and size of codecs, I sought to produce a graph in which the quality is fixed. For this, I identified the encode for each codec/preset which produced an output with a VMAF closest to 90. For some codecs, I had to do more granular encodes to find an encode that had a VMAF of 90.
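
The selection step can be sketched as follows. This is only an illustration under assumed names: the `Encode` record, its fields, and the sample numbers are all made up, and the real results would come from the VMAF logs and the “time” output.

```python
from collections import namedtuple

# Hypothetical record of one encode; the field names are illustrative.
Encode = namedtuple("Encode", "codec preset quality vmaf size_bytes cpu_seconds")


def closest_to_target(encodes, target=90.0):
    """For each (codec, preset) pair, keep the encode whose VMAF is nearest target."""
    best = {}
    for enc in encodes:
        key = (enc.codec, enc.preset)
        current = best.get(key)
        if current is None or abs(enc.vmaf - target) < abs(current.vmaf - target):
            best[key] = enc
    return best


# Made-up numbers purely to exercise the function.
sample = [
    Encode("x264", "medium", 20, 93.5, 40_000_000, 120.0),
    Encode("x264", "medium", 24, 90.4, 28_000_000, 110.0),
    Encode("x264", "medium", 28, 86.0, 19_000_000, 100.0),
    Encode("x265", "fast", 26, 89.7, 17_000_000, 200.0),
]
for key, enc in closest_to_target(sample).items():
    print(key, enc.quality, enc.vmaf)
```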

Limitations

The purpose of this test was to identify broad trends from the encoders. Encoders perform differently on different types of video, so the findings are not universal. The margin for error is probably at least 5%, and it is folly to interpret the numbers beyond that. The reasons for this limitation include:

The performance of video encoders can vary significantly depending on the source material. One encoder may perform better than another on one video, but worse on a different video. Despite that, I figure the general comparisons between codecs revealed by this test should be largely valid.
I understand VMAF is a popular algorithmic video encoder comparison system, but all algorithmic systems are flawed, and cannot completely replicate the human visual system. I do not have the resources or inclination to do extensive human testing of comparative video quality.
I sought to accurately measure the time it took to encode video, using the command line “time” command to calculate the total CPU time taken for each encode. But I did not make every effort to ensure every encode was in an identical environment. It took several weeks to do all of the different encodes, and I did not avoid using the computer while it was encoding. Further, when a codec was particularly badly threaded, I generally ran multiple encodes at once, to reduce the total time. I have not investigated the limitations of the “time” command, but I expect it is imperfect. Among other potential issues:
- A poorly threaded and a well threaded encoder may have taken the same amount of total CPU time, despite the poorly threaded encode taking 5 times longer in the real world. I do seek to mention when it appears an encoder is not well threaded;
- I expect the “time” command does not take into account the work done in hardware encoding, potentially under-stating this work.
I encoded this on an M1 Pro Macbook Pro. Different CPU architectures will have codecs with different optimisations, and will perform differently. I am curious, but largely unable to test comparative capacities of hardware encoders.
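
To make the difference between CPU time and real-world time concrete, here is a rough sketch that records both for a child process. It is not the measurement used in this test (that was the “time” command); the function name is hypothetical, it only works on Unix-like systems, and, like “time”, it would not be expected to count work done inside a hardware encoder.

```python
import resource
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd and return (wall_seconds, cpu_seconds) for the child process.

    CPU time is user + system time summed over all threads, taken from
    getrusage(RUSAGE_CHILDREN). Unix only.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu


# Stand-in workload; in practice cmd would be an ffmpeg command line.
wall, cpu = time_command([sys.executable, "-c", "sum(range(10**6))"])
print(f"wall={wall:.2f}s cpu={cpu:.2f}s cpu/wall={cpu / wall:.2f}")
```

A cpu/wall ratio close to the number of cores suggests a well threaded encoder, while a ratio near 1 suggests the encoder is mostly using a single core.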

Quality – size graphs

Now to get into the fun stuff. I have prepared several graphs showing the comparative codec performance. The vertical axis shows the quality of an encode by VMAF, and the horizontal axis is a logarithmic scale of the file size of that encode in bytes. The axes of each graph are adjusted to highlight the differences between the encodes. Almost every graph includes the “x264 medium” encode as a standard reference. Finally, these graphs do not show how fast each encode was, but the speed of the encoders is covered later.

Graph: Representative selection of codecs (SVTAV1, RAV1E, VP9, x265, x264, HEVC-10, Theora, Xvid, MPEG 1 and Cinepak)

This first graph shows a selection of codecs and presets. Not all codecs and presets are included, to avoid cluttering up the graph.

This graph shows AV1 codecs, at various settings. While RAV1E blows earlier codecs out of the water, SVTAV1 is generally superior. What is not shown in this graph is that RAV1E is much worse threaded than SVTAV1, which means that it encodes considerably slower than SVTAV1.

The following are example ffmpeg commands I used to encode the video

ffmpeg -i source.mp4 -vcodec librav1e -speed 4 -qp 94 rav1e_Q94_S4.mkv

ffmpeg -i source.mp4 -vcodec libsvtav1 -preset 12 -qp 29 svtav1_QP_29_S12.mkv

Prior to conducting this test, x265 was my go-to codec when encoding video where quality and size are important. I was a bit surprised that x265 “ultrafast” and “superfast” are so much better quality than x264, and not that much worse than the slower presets. I used ffmpeg commands similar to the following to encode the x265 video:

ffmpeg -i source.mp4 -vcodec libx265 -preset fast -qp 10 x265_10_fast.mkv

All the x264 presets (other than “ultrafast”) seemingly produce fairly similar quality output. The “fast” preset seemingly produces essentially as good quality video as “placebo”, which is orders of magnitude slower. I used ffmpeg commands similar to the following to encode the x264 video:

ffmpeg -i source.mp4 -vcodec libx264 -preset fast -qp 10 x264_10_fast.mkv
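
Since the x264 and x265 commands differ only in preset and -qp value, a sweep over both can be generated rather than typed by hand. The following is a sketch only: the function name and the QP grid are assumptions, not the values actually tested, and it only builds the command lines without running them.

```python
import itertools

# The ten preset names accepted by x264, fastest to slowest.
X264_PRESETS = ["ultrafast", "superfast", "veryfast", "faster", "fast",
                "medium", "slow", "slower", "veryslow", "placebo"]


def build_commands(codec, label, presets, qps, source="source.mp4"):
    """Return one ffmpeg argument list per (preset, qp) combination."""
    commands = []
    for preset, qp in itertools.product(presets, qps):
        output = f"{label}_{qp}_{preset}.mkv"
        commands.append(["ffmpeg", "-i", source, "-vcodec", codec,
                         "-preset", preset, "-qp", str(qp), output])
    return commands


# Hypothetical grid: every x264 preset at QP 10, 15, ..., 45.
grid = build_commands("libx264", "x264", X264_PRESETS, range(10, 50, 5))
print(len(grid))
print(" ".join(grid[0]))
```

Each argument list could then be passed to a process runner and timed; keeping the arguments as a list avoids shell quoting problems.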

I have access to an Intel hardware encoder (QSV), and Apple’s “Video Toolbox” (or VTB), each in H264 and H265/HEVC flavours. While ffmpeg does support encoding with VTB, my testing found that handbrake seemingly has access to better quality hardware encoding options for H265/HEVC than ffmpeg exposes. Additionally, I did not install ffmpeg on my Intel machine, and relied on handbrake for QSV encodes as well. As a result, most of the hardware encodes were done with handbrake instead of ffmpeg.

Of interest, other than the H265/HEVC encodes done with handbrake on the Apple M1 Pro, all of the hardware encodes (H264 and H265/HEVC) were of lower quality than the x264 encode. The better encoding options exposed by Handbrake for VTB are two “presets”, “fast” and “quality”, both of which output significantly better quality than the ffmpeg VTB codec. Additionally, Handbrake also allows you to encode VTB video in “10 bit”. Despite the source video not being 10 bit, the graphs show that the “10 bit” encodes on each of those presets are materially better than the regular versions. I have no idea why this would result in a better quality encode, or why ffmpeg does not expose these higher quality options when encoding in VTB. More generally, the output from handbrake shows that the VTB HEVC encoder is capable of encoding video that is substantially better than x264, and not far off the quality of x265.

Please let me know if you know how to get ffmpeg to use a better quality VTB encode. The ffmpeg command I used for the encoding was:

ffmpeg -i source.mp4 -vcodec hevc_videotoolbox -q:v 65 vtb_hevc_65.mkv

We finally come to the grab-bag of codecs being:

Cinepak;
MPEG 1;
MPEG 2;
MPEG 4 / xvid;
Theora;
VP8; and
VP9

Cinepak is the clear loser, being about an order of magnitude larger than the next codec for the same quality.

ffmpeg -i source.mp4 -vcodec cinepak -g 100 -q:v 1 cinepak_1.mkv

I was surprised by MPEG1, MPEG 2, MPEG 4 and xvid all being about the same quality. This was unexpected, and different to my understanding of their comparative quality. I look at this a little closer in the next graph, below.

Theora puts on a decent performance, appearing to be better quality than MPEG-4 and xvid at most quality levels. I was a fan of the VP3 codec back in the day, so Theora’s performance warms my heart, although note my comments about MPEG-4 and xvid below.

ffmpeg -i source.mp4 -vcodec libtheora -q:v 1 theora_1.mkv

VP8 was strange to encode. I sought to have ffmpeg use pure VBR encoding, using the -b:v 0 flag, however ffmpeg seemed to ignore this, and tried to get the smallest file size, no matter which quality setting I asked for. I tried again, instead specifying an arbitrarily large bitrate, and it produced videos of different size and quality based on the requested quality. However it appears that ffmpeg is still not encoding VP8 properly. At the lower quality settings, it becomes lower quality and larger than Theora, which is based on the older VP3 codec. Additionally, VP8 is generally considered to be of equivalent quality to H264, but this graph shows it to be significantly worse than x264. Finally, I tried to use the “realtime” preset, but the results were erratic, with file size and quality jumping around without much rhyme or reason.

ffmpeg -i source.mp4 -c:v vp8 -crf 42 -quality good -b:v 10M vp8_good_42.mkv

VP9 performed well, significantly outperforming the x264 encode, and all other codecs in this graph. It is a solid codec.

ffmpeg -i source.mp4 -c:v libvpx-vp9 -crf 12 -quality good -b:v 0 vpx-vp9_good_12.mkv

As promised, this graph shows the performance of the MPEG 1, MPEG 2, xvid and ffmpeg’s native mpeg4 codec. I am perplexed because xvid and the mpeg4 codecs are generally considered to be significantly better than MPEG-1 and MPEG-2, however this test shows them to be very similar in quality. The mpeg4 codec also bounces around, with some encodes being larger, and worse quality than others, at otherwise the same settings. There are few xvid/mpeg4 specific settings. I tried some encodes with b-frames enabled, but it did not result in a significantly different encode quality/size.

ffmpeg -i source.mp4 -vcodec mpeg1video -qscale:v 2 mpeg1_2.mkv

ffmpeg -i source.mp4 -vcodec mpeg2video -qscale:v 2 mpeg2_2.mkv

ffmpeg -i source.mp4 -c:v libxvid -qscale:v 3 libxvid_3.mkv

ffmpeg -i source.mp4 -c:v mpeg4 -qscale:v 3 mpeg4_3.mkv

Speed / Size graphs

I prepared one final graph, which compares the encoding speed of the different codecs against the size of the file each produces at a fixed quality. I took the encode of each codec/preset that produced a video closest to a VMAF of 90, and compared the resulting size and encode time:

Comparison of codecs/presets with a VMAF of about 90

I was only able to compare the codecs I encoded with ffmpeg, for which I measured the encode time. Some of the hardware encoders are not included. The graph is also a bit unreliable as it is not possible to specify a VMAF rating for an encode. Although I identified the encode that had a VMAF closest to 90, the file size difference between an encode with a VMAF of 90.8 and one with 90.1 may be significant, and falsely make it appear that a slower preset produced a larger, slower encode. That said, where there was a discrepancy like that, where I could, I sought to do more granular encodes to smooth out the bumps.
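
One possible way to reduce that bias, which is not what was done for this graph, would be to estimate the size at exactly VMAF 90 by interpolating between the two encodes on either side of it. The sketch below assumes the quality/size curve is roughly straight on a log-size axis between those two points; the function name and numbers are made up for illustration.

```python
import math


def size_at_target(lower, upper, target=90.0):
    """Estimate file size at target VMAF from two bracketing encodes.

    lower and upper are (vmaf, size_bytes) pairs with
    lower vmaf <= target <= upper vmaf. Interpolates linearly in
    log(size), which assumes the curve is roughly straight on a
    log-size axis between the two points.
    """
    (v0, s0), (v1, s1) = lower, upper
    if not v0 <= target <= v1 or v0 == v1:
        raise ValueError("encodes must bracket the target VMAF")
    t = (target - v0) / (v1 - v0)
    return math.exp(math.log(s0) + t * (math.log(s1) - math.log(s0)))


# Made-up bracketing encodes at VMAF 89.2 and 90.8.
print(round(size_at_target((89.2, 20_000_000), (90.8, 26_000_000))))
```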

Each axis has a logarithmic scale. As you go up, encodes become much, much slower, and as you go right, encodes become much, much bigger. The encode time is the total CPU time summed across all cores. In other words, a perfectly threaded encode that takes “300” seconds will take 30 seconds of real time on the 10 core M1 Pro.

I am honestly fascinated by this graph. The vertical wall that x264 hits once it reaches the x264 fast preset is fairly apparent. x264 “placebo” is orders of magnitude slower, but does not produce appreciably smaller file size for the same quality. While x265 does not hit quite the same kind of wall as x264, with each preset being noticeably smaller than the previous one, it takes longer and longer. The AV1 encoders surprised me. SVTAV1’s fastest preset was as fast as the “ultrafast” x265 preset, while being significantly smaller. This trend continued with the SVTAV1 encodes being significantly smaller at each speed increment. RAV1E was not as impressive as SVTAV1, but still significantly outperformed x265 at a given encode time.

Conclusions

As just hinted, I am very impressed with how much better quality AV1 is, at a similar speed to x265. From a casual encoder perspective, I consider AV1 ready for use. The Rav1e codec is fine, but SVTAV1 is generally faster and better quality. It is also better threaded. I did the encodes in August 2023. Since then, both AV1 codecs have been updated, improving the quality and encode times.
On the other end, unless you have a particular need to encode H264, today it is redundant. x265, and even SVTAV1, are not much slower than x264, and produce significantly better output for the size.

Things I kinda knew, and this confirmed:

hardware encoders are significantly worse than a decent software encoder. A fast preset of a software encoder can produce video almost as fast as a hardware encoder, and look better (although it will almost certainly use more power).
Cinepak is a joke… I mean, of course it is, it is over 30 years old, and optimised for reduced playback complexity. And I doubt there is any interest in optimising the encoder. But it is still astounding that it is not only one of the slowest encoders, it is also orders of magnitude larger than other encodes.

Things I probably should have known, but didn’t

many codecs reach a point where slower presets take significantly more time without producing a materially better encode. E.g. x264 “fast” is almost as good as x264 “placebo” despite being 20x faster.
As a corollary to the above, while more modern codecs generally take longer to encode, the faster presets of newer encoders will produce better output, quicker than slower presets of older codecs. E.g. x265 “ultrafast” is quicker and produces better video than x264 “veryslow”. SVT-AV1 at preset “12” is also faster and far better than x264 “veryslow”.
The speed of an encode can vary significantly depending on the quality setting, not just the preset. For example, an x264 “medium” encode at a qp of 10 will take 4x as long as an x264 “medium” encode with a qp of 45.

Issues with ffmpeg

I have long been a fan of ffmpeg, and for various reasons I focused on encoding with ffmpeg. However, I encountered unexpected issues with ffmpeg.

VP8, xvid, and ffmpeg’s own mpeg4 codec produced much worse output than I expected. It is possible that I could have achieved better results by using specialised settings, but it was not clear how I could do this. In any event, I was surprised that ffmpeg’s default encoding mode for each of these codecs produces unexpectedly bad output.

Ffmpeg’s VTB HEVC support is also disappointing. While it is undoubtedly far less used than xvid and VP8, it is unclear why ffmpeg outputs so much worse video with VTB HEVC than handbrake does, given it is a hardware codec running on the same hardware.

I searched the internets for answers to these questions, but found no explanation.

Next steps?

If I am able to address the issues I had with ffmpeg, I would like to try to redo those encodes.

While of no practical use to people today, I am curious about the quality of codecs used in the distant past, and how they compare to modern codecs. I have no easy way of encoding video with these older codecs, and I will have to rely on older hardware or emulation.

I am a bit out of touch with the modern codec world, and if there are other modern codecs I have overlooked that can be relatively easily tested, I would be keen to see how they compare.