A Reddit comment on my last post suggested that, even when the source is in 8 bit colour, SVTAV1 should always be encoded in 10 bit colour, as it produces significantly better results. ffmpeg can be asked to encode at a different bit depth by specifying the pixel format. For example, to encode a video in 10 bit colour, add “-pix_fmt yuv420p10le”.
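As a concrete illustration, a full command line might look like the following; the filenames, preset, and CRF value are placeholders, not recommendations:

```shell
# Hypothetical invocation: encode a source to 10-bit AV1 with SVTAV1.
# input.mkv, output.mkv, -preset 6 and -crf 30 are placeholder values.
ffmpeg -i input.mkv -c:v libsvtav1 -preset 6 -crf 30 \
  -pix_fmt yuv420p10le output.mkv
```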
Encoding with SVTAV1 in 10 bit colour does indeed produce significantly better video quality for the size, but at the cost of encoding much more slowly.
As shorthand in this post, I will generally describe one preset as better quality than another if its encodes are both smaller and higher quality, or, in terms of the graphs, if its line is to the left of the other preset's.
This graph depicts the 8 bit encodes in shades of green, and the 10 bit encodes in shades of red. With all of the overlapping lines, this graph is fairly inscrutable, so I include the graph below with a third as many encodes.
As encoding SVTAV1 in 10-bit is so much slower, I did not encode with a preset slower than S3. However, the S3 10-bit encode was generally better quality than the S1 8-bit encode. At higher VMAF scores, the S6 10-bit encode was of similar quality to the S1 8-bit encode.
So, if the trade-off is speed, how much slower is SVTAV1 when encoding in 10-bit? There is significant variation by preset, as shown in the table below (due to the vagaries of timing, I rounded the speed differences to the nearest 20%).
| Preset | Speed difference (approx) |
| ------ | ------------------------- |
| S3     | 120% slower |
| S4     | 140% slower |
| S5     | 60% slower  |
| S6     | 60% slower  |
| S7     | 80% slower  |
| S8     | 100% slower |
| S9     | 120% slower |
| S10    | 40% slower  |
| S11    | 40% slower  |
| S12    | 40% slower  |
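Reading the table, “X% slower” means the 10-bit encode takes roughly (1 + X/100) times as long as the 8-bit encode at the same preset. A minimal sketch (the 60-second baseline is invented for illustration):

```python
# "X% slower" read as a time multiplier: 120% slower => 2.2x the time.
def ten_bit_time(eight_bit_seconds: float, pct_slower: float) -> float:
    """Estimated 10-bit encode time from the 8-bit time and the table's percentage."""
    return eight_bit_seconds * (1 + pct_slower / 100)

# A hypothetical 60-second 8-bit S3 encode would take about 132 s in 10-bit.
print(ten_bit_time(60, 120))
```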
The effect of this slowdown is that 10-bit S3 encodes slower than 8-bit S1, and 10-bit S9 encodes slower than 8-bit S6. But given that the 10-bit S3 encode is significantly better quality than the 8-bit S1 encode, this will often be an appropriate trade-off. 10-bit S6 encodes only a little slower than 8-bit S5 and about 300% faster than 8-bit S1, yet for encodes with VMAF above about 90 it produces similar results to 8-bit S1.
Final comments
12 bit video
I tried to encode the video in 12 bit (“-pix_fmt yuv420p12le”) to see if it made any difference. ffmpeg reported that this pixel format was not supported by SVTAV1 and fell back to encoding in 10 bit (producing a file identical in size and VMAF quality to the other 10 bit encodes).
Graphs
Producing these graphs in Excel is the bane of these posts. Keeping the colours consistent is frustratingly manual and time-consuming. I have looked at a few alternatives, but did not find anything better.
Harmonic mean for VMAF
As mentioned in an earlier post, I have been using the harmonic mean to determine VMAF scores of video. I was asked on Mastodon about this choice on the basis that regular mean is more common. This is a fair criticism. Harmonic mean is not really intended for this kind of use.
While writing this post, I compared the harmonic and regular mean for the 10-bit encodes, and they were within 0.2% of each other for every encode until the VMAF score dropped below about 80. It turns out that when the numbers being averaged are very close together, as they are when video is encoded at a fixed quality, the difference between the harmonic mean and the regular mean is negligible.
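This effect is easy to check with Python's statistics module; the score lists below are invented purely to illustrate it:

```python
from statistics import mean, harmonic_mean

# Hypothetical per-frame VMAF scores from a fixed-quality encode:
# the values cluster tightly, as they do when the encoder targets constant quality.
tight = [94.1, 93.8, 94.5, 94.0, 93.9]

# A hypothetical encode with highly variable quality.
variable = [95.0, 60.0, 92.0, 55.0, 90.0]

# For tightly clustered scores the two means are nearly identical;
# for variable scores the harmonic mean is noticeably lower.
print(mean(tight), harmonic_mean(tight))
print(mean(variable), harmonic_mean(variable))
```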
It could be argued that the harmonic mean of VMAF gives a useful result, as video with highly variable quality will score significantly lower. I will not pretend this is why I chose the harmonic mean.
I did keep using the harmonic mean for this analysis as I was comparing it to encode data I already had, and I did not want to extract a different mean from the VMAF XML files. In the future, unless I return to this well of data, I will use the regular mean.
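For what it's worth, pulling a different mean out of the VMAF logs is not much work in Python. The XML fragment below is a hypothetical sample; the exact attributes vary by libvmaf version:

```python
import xml.etree.ElementTree as ET
from statistics import mean, harmonic_mean

# Hypothetical fragment of a libvmaf XML log; real logs carry more attributes.
sample = """<VMAF>
  <frames>
    <frame frameNum="0" vmaf="94.1"/>
    <frame frameNum="1" vmaf="93.8"/>
    <frame frameNum="2" vmaf="94.5"/>
  </frames>
</VMAF>"""

# Collect the per-frame scores, then take whichever mean is wanted.
scores = [float(f.get("vmaf")) for f in ET.fromstring(sample).iter("frame")]
print(round(mean(scores), 2), round(harmonic_mean(scores), 2))
```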