If you need to use TVAI this year, then I wouldn’t recommend buying an RTX 50x0 card for that program. The reason is simple: it will be a massive undertaking for Topaz Labs to support that card, and I’d predict they won’t offer TensorRT versions of the models for it for half a year to a year.
There are four reasons why:
- All Topaz products rely on super-ancient TensorRT versions, and those versions simply do not support the Blackwell architecture. Revamping all their software to a modern TensorRT version is a massive undertaking.
- TensorRT support for Blackwell requires CUDA 12.8+. IIRC Topaz is still on CUDA 11.8, so again, a massive upgrade headache for them.
- NVIDIA still has driver issues to iron out, so at this very moment the card is finicky. PyTorch doesn’t even run on CUDA 12.8 yet, and that’s what almost everyone uses to build and manage their models (including Topaz, I assume).
- Without TensorRT support, the additional ~30% compute capacity of the 50-series cards is wasted. The massive memory bandwidth can make up for some of it, but not all.
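If you want to sanity-check the CUDA-version point yourself, here’s a minimal sketch. The 12.8 threshold is taken from the claim above, and the helper name is mine, not anything from Topaz or NVIDIA:

```python
def supports_blackwell_tensorrt(cuda_version: str) -> bool:
    """Rough check: TensorRT builds for Blackwell need CUDA 12.8+ (per the claim above)."""
    # Compare (major, minor) numerically so "12.10" correctly beats "12.8".
    major, minor = (int(p) for p in cuda_version.split(".")[:2])
    return (major, minor) >= (12, 8)

# Topaz's reportedly bundled toolkit vs. the Blackwell minimum:
print(supports_blackwell_tensorrt("11.8"))  # False
print(supports_blackwell_tensorrt("12.8"))  # True
```

The point of comparing tuples instead of the raw string is that a plain string comparison would rank "12.8" above "12.10" and give the wrong answer for future toolkit releases.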
Here’s a benchmark result for an underclocked 4090 running on a Ryzen 7950X with so-so memory speed.
Topaz Video AI v4.2.2
System Information
OS: Windows v10.22
CPU: AMD Ryzen 9 7950X 16-Core Processor 127.74 GB
GPU: NVIDIA GeForce RTX 4090 23.576 GB
Processing Settings
device: 0 vram: 1 instances: 0
Input Resolution: 1920x1080
Benchmark Results
Artemis 1X: 40.73 fps 2X: 16.53 fps 4X: 04.71 fps
Iris 1X: 40.71 fps 2X: 17.32 fps 4X: 05.03 fps
Proteus 1X: 42.85 fps 2X: 17.49 fps 4X: 05.17 fps
Gaia 1X: 15.88 fps 2X: 11.10 fps 4X: 04.55 fps
Nyx 1X: 18.56 fps 2X: 15.37 fps
Nyx Fast 1X: 34.41 fps
4X Slowmo Apollo: 37.88 fps APFast: 76.84 fps Chronos: 33.43 fps CHFast: 34.43 fps
16X Slowmo Aion: 32.07 fps
PS: It doesn’t matter that this is for TVAI 4. The same code runs these benchmarks across versions (the ONNX Runtime / CUDA / TensorRT stack, which doesn’t change between TVAI releases), and the bottleneck is data shuffling between DRAM, the CPU, and over the PCIe link to the GPU, and within the cache hierarchies on the GPU, not the TVAI software version. ONNX Runtime also does more data shuffling than TensorRT, which is partly why it’s slower.