It seems TVAI is CPU bottlenecked on Linux when it shouldn't be.
I’ve tested using both FP16 and TensorRT models on 3.4.4.0.b with a GeForce RTX 3090 card and a 16-core (32-thread) Ryzen 5950X CPU. Same overall result for both model types: ~30% GPU utilization.
The following command was used for testing:
ffmpeg -f lavfi -i testsrc=duration=60:size=640x480:rate=30 -pix_fmt yuv420p \
-filter_complex "tvai_up=model=prob-3:scale=2:preblur=-0.6:noise=0:details=1:halo=0.03:blur=1:compression=0:estimate=20:blend=0.8:device=0:vram=1:instances=1" \
-f null -
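In case it helps with reproduction, GPU and CPU utilization can be watched side by side while the command above runs, for example like this (the monitoring commands are just a suggestion, not part of the repo; pidstat comes from the sysstat package):
# GPU utilization, sampled once per second
nvidia-smi dmon -s u -d 1
# CPU usage of the ffmpeg process, sampled once per second
pidstat -u -p "$(pgrep -n ffmpeg)" 1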
This indicates the pipeline is CPU bottlenecked, which makes no sense. The lavfi filter alone can produce >2600 fps using only 6.4 cores, so at the 18 fps this pipeline achieves, generating the test frames costs roughly 0.044 (1/23) of a core. That leaves the TVAI engine accounting for essentially all of the CPU consumption observed.
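For clarity, the back-of-the-envelope math behind the 0.044 figure (numbers taken straight from the measurements above):
# cores used by lavfi at 2600 fps, scaled down to the observed 18 fps
awk 'BEGIN { printf "%.3f\n", 6.4 * 18 / 2600 }'   # -> 0.044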
Topaz:
- Do you have an explanation for what in the VAI pipeline requires so much CPU?
- Any suggestion on how to improve the GPU utilization?
- Anything I can do to help you nail down and remove the bottleneck?
To reproduce:
git clone https://github.com/jojje/vai-docker.git
make build benchmark
PS. You need to have NVIDIA’s Container Toolkit installed so that the host GPU gets exposed to the container.
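If you want to sanity-check that setup before running the benchmark, the usual test is to run nvidia-smi inside a throwaway CUDA container, e.g. (the image tag here is just an example):
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi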