Allow VAI filters to use RGB32 rather than RGB48 - enabling much lower CPU usage

Sorry - this is going to be technical!.. :wink:
Currently the VAI filters require RGB48 inputs (so FFMPEG auto-converts to that format), but the conversion hasn’t been accelerated using ASM within FFMPEG, so it falls back to the standard C implementation, which is much slower. I’ve attempted to code an ASM converter for RGB48 myself, but the architecture of FFMPEG’s swscale module (based on the VideoLan implementation) isn’t really suitable without a rewrite of the ASM libraries to support 16 bit output.
If the input of the filters supported RGB32 then we would get accelerated RGB conversion out-of-the-box using the existing FFMPEG code.

TLDR - The colour conversion process in ffmpeg is slow due to the use of RGB48 in VAI. If RGB32 is used then we’d get a large decrease in CPU usage.

Interesting! Tagging @suraj to weigh in on this one.


This is an interesting request. Currently we just set the filter input to be RGB48 allowing 16bit per channel. Using RGB32 will restrict it to 8bit per channel. In most cases 8bit per channel should be fine but for high quality inputs this will cause a problem that needs to be handled.
Internally we do convert the RGB48 to RGB 32F for all the processing so the CPU usage will still be high due to the conversion.
Due to the second conversion last time I checked the performance difference between RGB48 and RGB32 was negligible.
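To put a number on the 8bit versus 16bit concern, here is a minimal sketch. It assumes the float conversion is a plain normalisation to the 0..1 range, which is my assumption and not something confirmed about the filter internals; the function names are hypothetical.

```python
# Hypothetical illustration: precision lost when a 16-bit channel value
# is reduced to 8 bits before being normalised to a float in 0..1.
# The normalisation scheme is an assumption, not the filter's actual code.

def float_from_16bit(value: int) -> float:
    """Normalise a 16-bit channel value (0..65535) to 0..1."""
    return value / 65535.0

def float_from_8bit(value: int) -> float:
    """Normalise an 8-bit channel value (0..255) to 0..1."""
    return value / 255.0

def reduce_16_to_8(value: int) -> int:
    """Rescale 0..65535 to 0..255 with rounding to the nearest level."""
    return (value * 255 + 32767) // 65535

def worst_case_error() -> float:
    """Largest difference between the 16-bit and the 8-bit float paths."""
    return max(
        abs(float_from_16bit(v) - float_from_8bit(reduce_16_to_8(v)))
        for v in range(65536)
    )

worst_error = worst_case_error()
print(f"worst-case error on a 0..1 scale: {worst_error:.6f}")
```

The worst case stays within half of one 8-bit step, which is invisible for 8bit sources but is real information lost for genuinely 16bit inputs.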

Since you are so knowledgeable you can look through the filter code here, let me know if you find more things that can be optimized.


Thank you for explaining the technical constraints and the rationale behind using RGB48. I understand the concerns about reducing the quality to 8bit per channel with RGB32 and the negligible performance difference you’ve observed.

However, I wonder whether there are scenarios where accepting RGB32 could still provide some advantages, particularly for those not requiring 16bit channels (especially as you seem to be converting to RGB 32F anyway). Maybe a filter parameter could be used if there is a concern about a ‘colour downgrade’ from the existing configuration (defaulting to RGB48)?

I’ve just run a quick benchmark, taking VAI out of the equation completely - using ffmpeg to convert a sample video from YUV420 to both RGB32 and RGB48. I’ve taken some steps to try to remove any other factors (it’s not outputting to a file to prevent any I/O factors, nor is it using any compression, for example) - as you can see, RGB32 is about 25% faster than RGB48 - due to the use of ASM code in FFMPEG…

ffmpeg-gcc12-20230717.exe -i d:\input.mp4 -t 00:10:00 -pix_fmt rgb48 -f rawvideo -vcodec rawvideo -an NUL
frame=15000 fps=861 q=-0.0 Lsize=54801562kB time=00:09:59.96 bitrate=748273.9kbits/s speed=34.4x
ffmpeg-gcc12-20230717.exe -i d:\input.mp4 -t 00:10:00 -pix_fmt rgb32 -f rawvideo -vcodec rawvideo -an NUL
frame=15000 fps=1069 q=-0.0 Lsize=36534375kB time=00:09:59.96 bitrate=498849.3kbits/s speed=42.8x
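The figures from the two runs above can be checked with a few lines of Python. This is only arithmetic on the reported fps and Lsize values; the variable and function names are mine. The size ratio is a useful sanity check, since rgb48 stores 6 bytes per pixel and rgb32 stores 4.

```python
# Figures copied from the two ffmpeg runs above.
RGB48_FPS, RGB32_FPS = 861, 1069
RGB48_SIZE_KB, RGB32_SIZE_KB = 54801562, 36534375

def percent_faster(slow_fps: float, fast_fps: float) -> float:
    """Throughput gain of the fast run over the slow run, in percent."""
    return (fast_fps / slow_fps - 1.0) * 100.0

def percent_time_saved(slow_fps: float, fast_fps: float) -> float:
    """Reduction in conversion time per frame, in percent."""
    return (1.0 - slow_fps / fast_fps) * 100.0

throughput_gain = percent_faster(RGB48_FPS, RGB32_FPS)
time_saved = percent_time_saved(RGB48_FPS, RGB32_FPS)
size_ratio = RGB48_SIZE_KB / RGB32_SIZE_KB  # expected 6 / 4 = 1.5

print(f"RGB32 throughput gain: {throughput_gain:.1f}%")
print(f"Conversion time saved per frame: {time_saved:.1f}%")
print(f"RGB48 / RGB32 output size ratio: {size_ratio:.3f}")
```

Note that the throughput gain and the time saved are different numbers: roughly a quarter more frames per second corresponds to roughly a fifth less conversion time per frame.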

So if the VAI filter took RGB32 as an input, then we’d get either faster processing time as the CPU cycles can be used for VAI, or lower power usage if the CUDA sub-system is the limiting factor as the CPU needs to do less work.

Something to mention here is that I’m running on an Intel CPU with SSSE3 (this is what provides the ASM performance gains) - and my ffmpeg build config isn’t disabling ASM.

Just wanted to make clear that I’m not an expert on any of this - just a geek with some time on my hands.

My benchmark suggests that accepting RGB32 as an input might offer a tangible benefit. While I recognise the limitations you’ve outlined, I think it could be worth exploring this further, especially if it leads to efficiency gains. If I’ve missed something or if there’s potential for improvement or collaboration, please let me know. Looking forward to hearing your thoughts.


Sure, we can change this for 8bit or lower inputs. Even if there is no performance gain, it will get rid of the annoying “no accelerated …” warnings.
This will most likely be part of a TVAI alpha first. So I would recommend you join the beta group @david.smith-6070
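The rule described in this reply (RGB32 for 8bit or lower inputs, RGB48 otherwise) can be sketched as below. This is purely illustrative: the function name and its argument are hypothetical and do not come from the actual filter code.

```python
# Hypothetical sketch of the proposed selection rule. Nothing here is
# taken from the real filter source; the names are illustrative only.

def pick_filter_input_format(bits_per_channel: int) -> str:
    """Choose the RGB input format from the source bit depth."""
    if bits_per_channel <= 0:
        raise ValueError("bits_per_channel must be positive")
    # 8-bit or lower sources lose nothing in RGB32 and can use the
    # accelerated conversion path; deeper sources keep RGB48.
    return "rgb32" if bits_per_channel <= 8 else "rgb48"

for depth in (8, 10, 16):
    print(depth, "->", pick_filter_input_format(depth))
```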


Thanks Suraj - request made to join the beta group.

When using CLI, I drop a format=pix_fmts='yuv444p' before the TVAI filter. It doesn’t seem to negatively impact speed for me, and it does avoid the annoying [warning] message from the auto-inserted swscale filter.

Furthermore, the accuracy of color is slightly improved. The following testcase command creates an artificial frame of YUV=32,128,128 in YUV420p and the output is YUV=30,128,128 in YUV420p.

$ ffmpeg-topaz -hide_banner -f 'lavfi' -i nullsrc=size='ntsc':rate='ntsc',format=pix_fmts='yuv420p',trim=start_frame=0:end_frame=1,geq=lum_expr=32:cb_expr=128:cr_expr=128 -vf showinfo,tvai_up=model=iris-1,format=pix_fmts='yuv420p',signalstats,metadata=mode='print' -f 'null' -

[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.YMIN=30
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.YLOW=30
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.YAVG=30.6235
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.YHIGH=31
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.YMAX=31
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.UMIN=127
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.ULOW=128
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.UAVG=127.987
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.UHIGH=128
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.UMAX=129
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.VMIN=127
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.VLOW=128
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.VAVG=127.992
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.VHIGH=128
[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.VMAX=129

With a preceding format=pix_fmts='yuv444p' before TVAI, the [warning] is gone and the final YUV420p output is closer to the original of YUV=32,128,128.

Two code values of Y out of the limited-range span (235-16=219) is just short of a 1% improvement in color accuracy.
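For anyone wanting to check that figure, a small sketch of the arithmetic follows. It assumes 8-bit limited-range luma (16 to 235), as in the sentence above; the function name is mine.

```python
# 8-bit limited-range luma bounds, as used in the estimate above.
Y_BLACK, Y_WHITE = 16, 235

def luma_error_percent(expected: float, measured: float) -> float:
    """Luma error as a percentage of the limited-range span (219 steps)."""
    return abs(expected - measured) / (Y_WHITE - Y_BLACK) * 100.0

# Source frame was Y=32; the run without the yuv444p step reported YMIN=30.
error_pct = luma_error_percent(32, 30)
print(f"{error_pct:.2f}% of the limited luma range")
```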

$ ffmpeg-topaz -hide_banner -f 'lavfi' -i nullsrc=size='ntsc':rate='ntsc',format=pix_fmts='yuv420p',trim=start_frame=0:end_frame=1,geq=lum_expr=32:cb_expr=128:cr_expr=128 -vf showinfo,format=pix_fmts='yuv444p',tvai_up=model=iris-1,format=pix_fmts='yuv420p',signalstats,metadata=mode='print' -f 'null' -

[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.YMIN=31
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.YLOW=32
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.YAVG=32.0031
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.YHIGH=32
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.YMAX=33
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.UMIN=127
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.ULOW=128
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.UAVG=127.988
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.UHIGH=128
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.UMAX=129
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.VMIN=127
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.VLOW=128
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.VAVG=127.991
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.VHIGH=128
[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.VMAX=129
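To compare the two runs without reading the logs by eye, the signalstats lines can be parsed into a dictionary. This is a minimal sketch that relies only on the line format visible in the output above; the helper name is hypothetical, and only the YAVG lines are pasted in to keep it short.

```python
import re

# Matches the "lavfi.signalstats.KEY=VALUE" part of each log line.
SIGNALSTATS = re.compile(r"lavfi\.signalstats\.(\w+)=(-?\d+(?:\.\d+)?)")

def parse_signalstats(log_text: str) -> dict:
    """Return a {name: value} mapping for every signalstats entry found."""
    return {m.group(1): float(m.group(2)) for m in SIGNALSTATS.finditer(log_text)}

# One line copied from each of the two runs in this post.
direct = parse_signalstats(
    "[Parsed_metadata_4 @ 0x125e0cd50] lavfi.signalstats.YAVG=30.6235"
)
via_444 = parse_signalstats(
    "[Parsed_metadata_5 @ 0x127606f40] lavfi.signalstats.YAVG=32.0031"
)

EXPECTED_Y = 32  # luma value written by the geq source filter
direct_error = abs(direct["YAVG"] - EXPECTED_Y)
via_444_error = abs(via_444["YAVG"] - EXPECTED_Y)
print(f"average Y error without yuv444p step: {direct_error:.4f}")
print(f"average Y error with yuv444p step:    {via_444_error:.4f}")
```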

I don’t yet see a downside to scaling the chroma planes to full resolution from YUV420p>YUV444p before TVAI, at least when dealing with 8 bit YUV420p. It clears the warning and the data from the artificial test case suggests the color is more accurate.
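If the workaround is scripted, the filter chain can be assembled with a small helper so the yuv444p step is easy to toggle for A/B tests. This is a sketch only: the function is hypothetical and simply reuses the filter strings from the commands in this post.

```python
# Hypothetical helper that builds the -vf argument used in this post.
# The filter names and options are copied from the commands above.

def build_filter_chain(model: str = "iris-1", pre_convert_444: bool = True) -> str:
    """Return a -vf string, optionally converting to yuv444p before TVAI."""
    stages = []
    if pre_convert_444:
        # Upsample chroma to full resolution before the TVAI filter.
        stages.append("format=pix_fmts='yuv444p'")
    stages.append(f"tvai_up=model={model}")
    # Convert back to 4:2:0 for the final output.
    stages.append("format=pix_fmts='yuv420p'")
    return ",".join(stages)

print(build_filter_chain())
print(build_filter_chain(pre_convert_444=False))
```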

Hi Suraj - I know it’s only been a month and I can see lots of great activity in the beta products (which I won’t name here!), but is there any news on this request? Will it make it into a beta release?

We will get to it soon.
