Topaz Video AI Alpha 6.1.1.1.a.trt (RTX 5000 Optimization)

Hello everyone,

Our second pre-release for today is an alpha with support for optimized models running on RTX 5000 series GPUs from NVIDIA.

For this first alpha we’d like to point out that RTX 5000 series performance is significantly improved over the current main channel release, but we are still seeing intermittent performance loss with older GPUs.

We appreciate any testing (RTX 5000 or any other GPU), but just know that performance may be reduced compared to main 6.1.

Thanks as always!

6.1.1.1.a.trt - Windows

Nice, I got a decent uplift in Artemis, Iris and Proteus 1X performance going from the 4090 to the 5090, although I see run-to-run differences of around 10% in the 1X results (not really in 2X/4X). Maybe something to improve in the benchmark.
2X and 4X seem to be limited by RAM/CPU speed.
Nyx Fast, Chronos/Chronos Fast and Aion also see a good uplift.
Most other models seem to be about the same.
Gains seem to be in line with the general performance difference of 20-30% from 4090 to 5090.

Topaz Video AI Alpha  v6.1.1.1.a.trt
System Information
OS: Windows v10.22
CPU: AMD Ryzen 9 7950X 16-Core Processor              31.71 GB
GPU: NVIDIA GeForce RTX 5090  31.349 GB
Processing Settings
device: 0 vram: 1 instances: 0
Input Resolution: 1920x1080
Benchmark Results
Artemis		1X: 	50.65 fps 	2X: 	21.41 fps 	4X: 	05.78 fps 	
Iris		1X: 	51.28 fps 	2X: 	21.92 fps 	4X: 	06.05 fps 	
Proteus		1X: 	52.88 fps 	2X: 	22.94 fps 	4X: 	06.12 fps 	
Gaia		1X: 	18.89 fps 	2X: 	13.77 fps 	4X: 	05.70 fps 	
Nyx		1X: 	18.29 fps 	2X: 	12.39 fps 	
Nyx Fast		1X: 	43.99 fps 	
Rhea		4X: 	05.78 fps 	
RXL		4X: 	05.76 fps 	
Hyperion HDR		1X: 	45.23 fps 	
4X Slowmo		Apollo: 	48.47 fps 	APFast: 	76.59 fps 	Chronos: 	46.67 fps 	CHFast: 	50.36 fps 	
16X Slowmo		Aion: 	43.08 fps 	

4090 on the same system (with the older TVAI 5.3, but that doesn’t really matter in my experience):

Topaz Video AI Alpha  v5.3.0.1.a.hyp.rxl
System Information
OS: Windows v10.22
CPU: AMD Ryzen 9 7950X 16-Core Processor              31.118 GB
GPU: NVIDIA GeForce RTX 4090  23.576 GB
GPU: AMD Radeon(TM) Graphics  0.47445 GB
Processing Settings
device: 0 vram: 1 instances: 0
Input Resolution: 1920x1080
Benchmark Results
Artemis		1X: 	44.05 fps 	2X: 	18.96 fps 	4X: 	05.19 fps 	
Iris		1X: 	42.82 fps 	2X: 	22.31 fps 	4X: 	05.99 fps 	
Proteus		1X: 	41.98 fps 	2X: 	22.13 fps 	4X: 	06.17 fps 	
Gaia		1X: 	15.66 fps 	2X: 	10.96 fps 	4X: 	05.60 fps 	
Nyx		1X: 	17.64 fps 	2X: 	15.97 fps 	
Nyx Fast		1X: 	31.61 fps 	
Rhea		4X: 	05.50 fps 	
4X Slowmo		Apollo: 	44.50 fps 	APFast: 	75.33 fps 	Chronos: 	33.24 fps 	CHFast: 	39.70 fps 	
16X Slowmo		Aion: 	37.31 fps
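For anyone who wants to compare two pasted benchmark reports without eyeballing them, here is a minimal Python sketch. It assumes the report layout shown above (a model name followed by `label: value fps` pairs); the function names `parse_benchmark` and `uplift_percent` are made up for this example and are not part of TVAI. Only a few lines from each report are included to keep it short.

```python
import re

# Matches pairs such as "1X: 50.65 fps" or "Chronos: 46.67 fps".
PAIR = re.compile(r"(\w+):\s*([\d.]+)\s*fps")

def parse_benchmark(text):
    """Return {(model, label): fps} from pasted benchmark lines."""
    results = {}
    for line in text.splitlines():
        pairs = list(PAIR.finditer(line))
        if not pairs:
            continue  # skip headers and blank lines
        # Everything before the first "label:" is the model name.
        model = line[:pairs[0].start()].strip()
        for match in pairs:
            results[(model, match.group(1))] = float(match.group(2))
    return results

def uplift_percent(new, old):
    """Percent change of new over old for every entry present in both."""
    return {
        key: round((new[key] / old[key] - 1.0) * 100.0, 1)
        for key in new
        if key in old
    }

# A subset of the two reports posted above.
RTX_5090 = """
Artemis   1X: 50.65 fps  2X: 21.41 fps  4X: 05.78 fps
Iris      1X: 51.28 fps  2X: 21.92 fps  4X: 06.05 fps
Proteus   1X: 52.88 fps  2X: 22.94 fps  4X: 06.12 fps
Nyx       1X: 18.29 fps  2X: 12.39 fps
Nyx Fast  1X: 43.99 fps
4X Slowmo  Apollo: 48.47 fps  APFast: 76.59 fps  Chronos: 46.67 fps  CHFast: 50.36 fps
"""

RTX_4090 = """
Artemis   1X: 44.05 fps  2X: 18.96 fps  4X: 05.19 fps
Iris      1X: 42.82 fps  2X: 22.31 fps  4X: 05.99 fps
Proteus   1X: 41.98 fps  2X: 22.13 fps  4X: 06.17 fps
Nyx       1X: 17.64 fps  2X: 15.97 fps
Nyx Fast  1X: 31.61 fps
4X Slowmo  Apollo: 44.50 fps  APFast: 75.33 fps  Chronos: 33.24 fps  CHFast: 39.70 fps
"""

gains = uplift_percent(parse_benchmark(RTX_5090), parse_benchmark(RTX_4090))
for (model, label), pct in sorted(gains.items()):
    print(f"{model:10s} {label:8s} {pct:+.1f}%")
```

Given the roughly 10% run-to-run variation mentioned above, small differences in the output should probably be treated as noise rather than a real gain or loss.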

Here is an AI generated comparison chart of the results:

For fun, I told the AI to get some ‘inspiration’ from Topaz website for the color scheme:

How the heck did you (the team) manage this feat, Tony?
I see you’re still using tensorrt: 10.8.0.99, but apparently the Blackwell perf has improved :)

Color me impressed.

My only theory is that you’re leveraging the massive bandwidth advantage of the 50 cards better with this release, such as using larger tile sizes or packing more tiles, since the cross-link I/O to the card is a performance killer for “chatty” workloads.

Don’t have a 50 card unfortunately, so can’t do empirical studies…

Does this by any chance also help the 4090 vs prior versions?

Don’t think so. Before these adaptations, the 5000 cards would perform worse than their 4000 counterparts.

On my 3080 Ti, all models are about the same speed except Nyx, APFast, and Aion. They are slower by a noticeable amount in the benchmark.
Some models look to be a tiny bit faster, but I have not had a chance to try them out on any real videos yet.

The biggest design change from Ada Lovelace to Blackwell is that int32 precision is now as fast as fp32 precision.

And the GPU is able to handle all of it in parallel with the AI cores.

Before Blackwell, int precisions were 50% slower.

That means if int precisions were made to perform nicely on pre-Blackwell GPUs (Turing, Ampere and Ada Lovelace) and that is changed now, the older GPUs will be much slower.

Would it be difficult to just keep the old models and offer the new, Blackwell-optimized models alongside the existing models?

Such an approach would give Blackwell owners good performance (now) without making everyone wait for the eventually-unified (I hope) models.

Trying this version for fun to see how much faster it is on my 5090. It’s quite a bit faster at the start for a while, then the speed drops, and it goes back up at the end.

The VFR bug is still there but seems to pop up when there are decimals in the framerate. I’ve done several 25 fps → 50 fps conversions without a hitch (constant fps reported). 29.97 conversions still report variable fps.
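One possible way to see why fractional rates behave differently, offered only as an illustration and not as a confirmed cause of the bug: 29.97 fps is really 30000/1001, and its frame duration is not a whole number of ticks in a coarse timestamp timebase. The sketch below assumes a 1 ms timebase (the Matroska default); the container and timebase actually used in these conversions are not stated in this thread, and the helper names are made up for this example.

```python
from fractions import Fraction

def frame_duration_ticks(fps, timebase):
    """Exact duration of one frame, measured in timebase ticks."""
    return (1 / fps) / timebase

def is_exact(fps, timebase):
    """True if every frame lasts a whole number of ticks."""
    return frame_duration_ticks(fps, timebase).denominator == 1

# Assumption: timestamps are stored in milliseconds.
MILLISECOND = Fraction(1, 1000)

rates = {
    "25": Fraction(25),
    "50": Fraction(50),
    "29.97": Fraction(30000, 1001),
    "59.94": Fraction(60000, 1001),
}

for name, fps in rates.items():
    ticks = frame_duration_ticks(fps, MILLISECOND)
    print(f"{name:>6} fps: {ticks} ms per frame, exact={is_exact(fps, MILLISECOND)}")
```

With integer rates such as 25 and 50 every frame gets the same timestamp step. With 30000/1001 the steps have to be rounded and alternate between values, which some tools may report as variable frame rate even when the stream is effectively constant.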

I have been using it for a while now and it works great on my RTX 5090, thank you! The performance is no longer worse than it used to be on the 4090, much better in fact.

I only just found out about this alpha. I had been disappointed that rendering was slower on the 5090 than on the 4090 with my existing TVAI install.

So, after I installed this alpha, the one test I’ve just tried seems MUCH better :)

720x576 SD RHEA x2 upscale
Movie runtime: 20 mins, 20 secs
4090 render speed (STANDARD TVAI VERSION): 24.3 fps
5090 render speed (STANDARD TVAI VERSION): 22 fps
5090 render speed (ALPHA 6.1.1.1 TVAI VERSION): 32 fps
No slowdown during render and maxed out GPU to 100%
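The figures above are speeds in fps rather than elapsed times. A rough sketch of how they translate into wall-clock time for the 20 min 20 s clip follows. The source frame rate is not given in the post, so the 25 fps used here is purely an assumption, and the helper names are made up for this example.

```python
def render_seconds(runtime_s, source_fps, render_fps):
    """Wall-clock seconds needed to process a clip at a given render speed."""
    total_frames = runtime_s * source_fps
    return total_frames / render_fps

def as_minutes(seconds):
    """Format a duration in seconds as minutes and seconds."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs:02d}s"

RUNTIME_S = 20 * 60 + 20   # 20 min 20 s, from the post
ASSUMED_SOURCE_FPS = 25    # assumption: the post does not state the frame rate

# Render speeds reported in the post, in fps.
speeds = {
    "4090 standard": 24.3,
    "5090 standard": 22.0,
    "5090 alpha": 32.0,
}

for label, fps in speeds.items():
    elapsed = render_seconds(RUNTIME_S, ASSUMED_SOURCE_FPS, fps)
    print(f"{label:14s} {as_minutes(elapsed)}")
```

The absolute times depend on the assumed source frame rate, but the ratio between any two of them does not: it is simply the inverse ratio of the render speeds.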

I ran another test, this time with a 1920x720 HD file (H.264/MP4), but it was slower than the normal TVAI version and even slower than the 4090.

1920x720 HD File - PROTEUS
Movie runtime: 30 mins, 19 secs
4090 render speed (STANDARD TVAI VERSION): 16.2 fps
5090 render speed (STANDARD TVAI VERSION): 15.6 fps
5090 render speed (ALPHA 6.1.1.1 TVAI VERSION): 15.3 fps
GPU usage fluctuates throughout the rendering process

I built a new PC using the same 5090, going from an Intel 13900KS (used for the tests above) to an AMD Ryzen 9 9950X3D, and re-ran the same tests.

The first 720x576 SD RHEA x2 upscale test showed a 2 fps drop to 30fps.

The second 1920x720 HD File - PROTEUS test showed a decent increase to 23fps.