I will not tire of saying it: this programme is not properly optimised, and it does not exploit the performance of the RTX 4090 at all.
I wish developers would finally focus on making performance normal for high-end graphics cards.
It can’t be the fault of anything external to the programme, as other AI programmes I use run like the wind.
This program (called enhancr, in case you’re interested) ships TensorRT (TRT) models, NVIDIA’s inference runtime that dramatically speeds up processing, and with them a music video takes about 1 minute to process. With TVAI and Apollo (or any other model), the same job takes about 15 minutes, or 30–45 minutes when combined with 4K upscaling at the same time. In the latest update the developer also added DirectML versions; DirectML is Microsoft’s counterpart to TensorRT and likewise speeds up performance drastically, much more than TVAI manages.
As I understand it, and as I read some time ago, TVAI has TRT models, so I don’t see the logic in their being so absurdly slow. They are simply poorly optimised, and perhaps increasing their VRAM usage would accelerate them. Are the models FP16? That would drastically increase performance on graphics cards with Tensor cores; if the models are FP32, it’s no wonder they are so slow.
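For context on why FP16 vs FP32 matters so much: FP16 halves both the storage and the memory bandwidth per weight, and Tensor cores run FP16 maths natively, which is where the speedup comes from. A minimal standard-library Python sketch of the size and precision trade-off (the 40M-parameter model is a made-up illustrative figure, not TVAI’s actual size):

```python
import struct

# 'e' = IEEE 754 half precision (FP16), 'f' = single precision (FP32)
fp16_bytes = struct.calcsize('e')  # 2 bytes per value
fp32_bytes = struct.calcsize('f')  # 4 bytes per value
print(fp16_bytes, fp32_bytes)      # 2 4

# Weight footprint of a hypothetical 40-million-parameter model:
params = 40_000_000
print(f"FP32: {params * fp32_bytes / 2**20:.0f} MiB")  # FP32: 153 MiB
print(f"FP16: {params * fp16_bytes / 2**20:.0f} MiB")  # FP16: 76 MiB

# FP16 trades precision for speed: a round-trip shows the rounding loss
x = 3.141592653589793
fp16_roundtrip = struct.unpack('e', struct.pack('e', x))[0]
print(fp16_roundtrip)  # 3.140625, close but not exact
```

The small rounding error is usually invisible in video-model output, which is why FP16 inference is considered a near-free 2x on Tensor-core GPUs.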
Your forum profile has been set to hidden, but I’m guessing you must be a new user of VEAI / VAI.
Otherwise you would remember how slow VEAI was in the old days and be grateful for how “fast” VAI is at the moment.
Since v1.7.0 VEAI has supported Tensor cores, and they started converting models from FP32 to FP16 on supported GPUs; many users saw their speed improve by a factor of two.
My point is that performance on a 4090 has always been woeful. The software is better suited to other graphics cards, as I’ve always said. That makes no sense: the 4090 is currently the most powerful graphics card for AI inference, and there is no reason for it to have lousy performance in TVAI.
I understand the frustration of having the most powerful GPU but not being able to use it to its full potential.
However, the same thing happened when the RTX 3090 was released; many users bought it for VEAI, only to find it offered little improvement over their previous card.
When I began with Topaz Video Enhance AI we only had Gaia HQ and Gaia CG. Processing a full movie took around 18 hours. Now, with the latest version, I can do the same work in between 4 and 8 hours, depending on the mode I choose.
Thank you for the correction; I had a typo in my last post: 24 fps is for 1x Artemis, 12 fps for 2x Artemis.
The real-life result might be slower than the benchmark, but it is already a big improvement compared to older versions of VEAI.
I guess CPU speed contributes to the benchmark’s calculated fps, since your newer CPU + 3000-series GPU gets a better benchmark score than my older CPU + 4000-series GPU. Either that, or Topaz cannot make use of the 4090’s full capability.
I’m trying to figure out how significant the CPU bottleneck is, or whether the low fps I get is due to poor optimisation for the 4090. I can’t just drop hundreds of dollars on a new CPU, RAM, and motherboard to test. Aside from buying a new CPU (which may or may not give a meaningful fps gain), I have tried a variety of ways to offload work from the CPU. Using the GPU encoder reduced CPU usage to roughly 50–60%, yet gave no noticeable improvement in speed (maybe 0.1 or 0.2 fps). I have also tried restricting ffmpeg’s affinity to a smaller number of cores; again, no noticeable reduction in fps until I limited ffmpeg to a very small number of cores (1–4). This testing suggests that even if I switched to the newest top-of-the-line CPU, the gain may not be big (like going from 6 fps to 15 fps for 1080p 2x, as the benchmarks suggest).
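For anyone wanting to repeat the affinity experiment: on Windows you can launch a process pinned to specific logical cores with `start /affinity <hexmask> ffmpeg.exe …`, where the mask has one bit per core. A small helper to compute that mask (the function name is my own, purely illustrative):

```python
def affinity_mask(cores):
    """Build the hex affinity mask used by Windows' `start /affinity`.

    Each logical core is one bit: core 0 -> bit 0, core 1 -> bit 1, ...
    """
    mask = 0
    for core in cores:
        mask |= 1 << core
    return format(mask, 'X')

# Pin to the first 4 logical cores (cores 0-3):
print(affinity_mask(range(4)))      # F
# Pin to the first 8 logical cores (cores 0-7):
print(affinity_mask(range(8)))      # FF
# Only even-numbered cores 0, 2, 4, 6 (e.g. skipping SMT siblings):
print(affinity_mask([0, 2, 4, 6]))  # 55
```

So `start /affinity F ffmpeg.exe …` would reproduce the 4-core case described above.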
The difference in quality is barely noticeable on CGI/IRL content. Sometimes CAIN produces artefacts in scenes where Apollo does not, and vice versa, but the artefacts are of the same kind: distorted, blurred movement, especially at scene changes.
With the Chronos model I’m quite satisfied: with a 1080p source I reach 50 fps, not bad at all. Comparing with RIFE (similar to Chronos, as both are optical-flow models), the speed I get with the latest 4.6 model in its TRT version, with TTA/Ensemble enabled (which increases quality at the cost of speed), is 160 fps and climbing. And the quality is exactly the same as the latest Chronos: it produces exactly the same small artefacts in the same places as RIFE 4.6 Ensemble, so I guess the devs brought this technology into TVAI.
Do the comparison yourself and draw your own conclusions. Mine are that the two models (CAIN and Apollo) are similar, because they produce the same types of artefacts, although Apollo gives an extra smoothness that CAIN lacks. Maybe I haven’t configured enhancr correctly, since I didn’t enable frame de-duplication as I did in TVAI, where it was on by default. I forgot to mention that both CAIN and Apollo handle static elements such as film credits without any problem: no distortion, no shaking letters. And RIFE and Chronos are exactly the same, at least in how the results look.
If possible, run both tests yourself and see how much better CAIN and RIFE use the GPU than the TVAI models do. Although to use CAIN and RIFE in their TRT versions you would have to pay at least €7 for the paid version of the program.
The 4090 has a tiny hardware switch that toggles between a “gaming” and a “silent” BIOS. Please check that yours is set to “gaming” for the best performance.
That would depend on the model. Mine does not have that switch; I can control the mode in software. Gaming/silent mode makes no difference in terms of speed.
Okay, please check your power-saving options in Windows next, especially the advanced settings. If “Maximum processor state” is set below 99% you lose half of your fps!