RTX 5080 Benchmarks of Topaz Video AI 6.0.1

If you need to use TVAI this year, I wouldn’t recommend buying an RTX 50x0 card for it. The reason is simple: supporting that card will be a massive undertaking for Topaz Labs, and I predict they won’t offer TensorRT versions of their models for it for another half a year to a year.

There are four reasons why:

  1. All Topaz products rely on super-ancient TensorRT versions. Those versions simply do not support the Blackwell architecture. Revamping all their software to a modern TensorRT version is a massive undertaking.
  2. TensorRT support for Blackwell requires CUDA 12.8+. IIRC Topaz is still on CUDA 11.8, so again, a massive upgrade headache for them.
  3. NVIDIA still has driver issues to iron out, so at this very moment the card is finicky. PyTorch isn’t even running on CUDA 12.8 yet, and that’s what almost everyone uses to build and manage their models (including Topaz, I assume).
  4. Without TensorRT support, the additional ~30% compute capacity of the 50-series cards is wasted. The massive memory bandwidth can make up for some of it, but not all.
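The version constraints in points 1 and 2 boil down to a simple compatibility check. A minimal sketch (the `MIN_CUDA` table and function name are illustrative, not Topaz's code; Blackwell's 12.8 requirement is per point 2 above, and Ada's 11.8 is the commonly cited minimum for sm_89):

```python
# Minimum CUDA toolkit version needed to emit code for each GPU architecture.
# Ada's value is the commonly cited minimum; Blackwell's is per point 2 above.
MIN_CUDA = {
    "Ada Lovelace": (11, 8),  # RTX 40-series
    "Blackwell": (12, 8),     # RTX 50-series
}

def toolkit_supports(arch: str, cuda_version: tuple) -> bool:
    """True if a toolchain pinned to cuda_version can target arch."""
    return cuda_version >= MIN_CUDA[arch]

# A stack pinned to CUDA 11.8 (where Topaz reportedly still sits) can
# target an RTX 4090 but not any Blackwell card:
print(toolkit_supports("Ada Lovelace", (11, 8)))  # True
print(toolkit_supports("Blackwell", (11, 8)))     # False
```

Until the whole stack (CUDA toolkit, TensorRT, and their model builds) clears that second check, a 50-series card runs on fallback paths only.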

Here’s a benchmark result for an underclocked RTX 4090 running in a Ryzen 9 7950X system with so-so memory speed.

Topaz Video AI  v4.2.2
System Information
OS: Windows v10.22
CPU: AMD Ryzen 9 7950X 16-Core Processor              127.74 GB
GPU: NVIDIA GeForce RTX 4090  23.576 GB
Processing Settings
device: 0 vram: 1 instances: 0
Input Resolution: 1920x1080
Benchmark Results
Artemis		1X: 	40.73 fps 	2X: 	16.53 fps 	4X: 	04.71 fps 	
Iris		1X: 	40.71 fps 	2X: 	17.32 fps 	4X: 	05.03 fps 	
Proteus		1X: 	42.85 fps 	2X: 	17.49 fps 	4X: 	05.17 fps 	
Gaia		1X: 	15.88 fps 	2X: 	11.10 fps 	4X: 	04.55 fps 	
Nyx		    1X: 	18.56 fps 	2X: 	15.37 fps 	
Nyx Fast	1X: 	34.41 fps 	
4X Slowmo	Apollo: 37.88 fps 	APFast: 76.84 fps 	Chronos: 33.43 fps 	CHFast: 34.43 fps 	
16X Slowmo	Aion: 	32.07 fps 	
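One way to sanity-check the enhancement rows above: a 4X pass produces 16x the output pixels of a 1X pass, yet throughput drops by well under 16x, which suggests a sizable fixed per-frame cost (consistent with the data-shuffling bottleneck noted in the PS). A quick sketch over the numbers from the table:

```python
# 1X/2X/4X fps for the enhancement models, copied from the table above.
fps = {
    "Artemis": (40.73, 16.53, 4.71),
    "Iris":    (40.71, 17.32, 5.03),
    "Proteus": (42.85, 17.49, 5.17),
    "Gaia":    (15.88, 11.10, 4.55),
}

for model, (x1, x2, x4) in fps.items():
    # A purely pixel-bound model would show a ratio of ~16 here.
    print(f"{model}: 1X/4X fps ratio = {x1 / x4:.1f}")
```

For Artemis that ratio is about 8.6, roughly half of the 16x pixel growth.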

PS: That this benchmark is from TVAI 4 doesn’t matter. The same code runs these benchmarks across versions (the ONNX Runtime / CUDA / TensorRT stack doesn’t change between TVAI releases), and the bottleneck is data shuffling between DRAM, the CPU, across the PCIe link to the GPU, and within the GPU’s cache hierarchies, not the TVAI software version. ONNX Runtime also does more data shuffling than TensorRT, which is partly why it’s slower.