I upgraded from an RTX 2070S to an RTX 4090. However, when I run a benchmark (below), I'm not getting the speed I expected based on comparisons with benchmarks people have posted for similar hardware. Even worse, real-life performance is worse. Upscaling 1080p 2x to 4K using Proteus (Auto)/Artemis runs at ~6 fps. Even weirder, a 1x pass on 1080p using Proteus also runs at around 6-7 fps, nowhere near my benchmark result.

I tried everything I can think of: reinstalling Topaz, a clean install of the latest Game Ready and Studio drivers, moving the source file, temp directory, and output file to an M.2 SSD (Samsung 950 PRO), and playing with the max memory usage setting in Topaz preferences. When running (ProRes LT output), my CPU usage is close to 90% while GPU usage is only around 20%. Even on a mechanical hard drive, disk usage is 10% at most (writing at around 16 MB/s during active periods). Switching to the H.265 (NVIDIA) encoder increased my GPU usage to 60-70% (with CPU usage lower, around 60-80%), but I'm not getting any more fps.

Previously, with my RTX 2070S, I was getting 6-8 fps using the Artemis model doing a 2x 1080p upscale. Changing to a 4090 did not result in any real-world performance gain. The only thing I cannot test is the CPU. Is my 5900X really bottlenecking the 4090 that much? Even if it does bottleneck the 4090, I would at least expect some gain in performance switching from a 2070S.
Has anyone with similar hardware encountered a similar issue?
Yeah, that CPU is certainly faster than a 5900X. My issue is that I'm not getting anywhere close to my benchmark of 21 fps for 1x Proteus (my real-world speed is 6 fps).
Are you using Proteus “Auto” or “Relative to Auto”?
If you are using “Auto” or “Relative to Auto”, the program has to analyze every frame and adjust the parameters automatically. Imagine having to click the “Estimate” button on every frame; it seems the CPU plays an important role in estimating parameters.
If you want faster processing time, you should use Proteus “Manual”.
Also, the developers mentioned that the benchmark does not include I/O reads/writes, so real-life results should be slightly slower than the benchmark.
For example, my 1920x1080 benchmark scores for Proteus 1x are "V3.3.0: 17.86 fps" and "V3.3.1: 18.12 fps",
and my real-life result for Proteus (Manual) is 17.5 fps.
If you don’t see any improvement after upgrading your GPU, that means other components in your computer may be the bottleneck for your system.
Yes, if I use Manual, my speed doubles to about 12-14 fps doing a 1x pass with Proteus. However, 14 fps is still quite a bit slower than my benchmark result (21 fps). Also, my real-life 2x Artemis speed is slower than the benchmark (6 vs. 9 fps). Does the benchmark take the CPU into account as well? I'm trying to figure out which component is causing the slowdown. I tried an SSD vs. a mechanical drive (described in my initial post). My SSD is plugged into the M.2 slot on my motherboard. While it's not the newest and fastest SSD, I don't see Topaz saturating its write speed. I have not tested a crazy-fast SSD setup like two or more SSDs in RAID 0 plugged into one of the PCIe slots.
I also tried setting the affinity of Topaz (ffmpeg in this case) to half or fewer of my CPU cores. With affinity set to half of the cores, CPU usage goes down, GPU usage stays the same, my CPU boosts to a higher clock, and fps stays the same or, in some scenarios, increases slightly (by 0.1-0.3 fps). Setting affinity to a smaller and smaller number of cores eventually slows Topaz down. So it seems Topaz (ffmpeg) benefits more from faster cores than from core count, up to a point. I'm not sure whether a 5950X instead of a 5900X, with 4 more cores (8 more threads), would give me any improvement.
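For anyone who wants to repeat the affinity experiment without digging through Task Manager on every run, here is a minimal sketch using Python's psutil. The worker process name (`ffmpeg.exe`) and the choice of pinning to the first half of the logical cores are assumptions for illustration; the actual Topaz worker process name may differ on your install.

```python
# Minimal sketch (assumptions noted above): pin the encoder worker process
# to the first half of the logical cores, mimicking the manual affinity test.
import psutil

TARGET_NAME = "ffmpeg.exe"  # assumed worker process name; adjust for your install
half_cores = list(range(psutil.cpu_count(logical=True) // 2))

for proc in psutil.process_iter(["name"]):
    if proc.info["name"] and proc.info["name"].lower() == TARGET_NAME:
        proc.cpu_affinity(half_cores)  # restrict this process to the listed cores
        print(f"PID {proc.pid}: affinity set to cores {half_cores}")
```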
I'm just trying to figure out which component is limiting speed so I don't spend money upgrading the “wrong” one.
I will not tire of saying it: this program is not properly optimised, and it does not exploit the performance of the RTX 4090 at all.
I wish developers would finally focus on making performance normal for high-end graphics cards.
It can’t be the fault of anything external to the program, as other AI programs I use run like the wind.
One program I use (called enhancr, in case you’re interested) ships TRT models (TensorRT, an NVIDIA framework that dramatically speeds up inference), and with it a music video takes about 1 minute to process, whereas TVAI with Apollo or any other model takes about 15 minutes, or 30-45 minutes if combined with 4K upscaling at the same time. In the latest update the developer also implemented DirectML versions (similar to TensorRT but developed by Microsoft), which likewise speeds up performance drastically, much more than TVAI.
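To make the backend point concrete, here is a rough, generic sketch of how an inference backend is typically chosen with ONNX Runtime. This is purely illustrative and not how enhancr or TVAI are actually implemented; the model path is a placeholder.

```python
# Generic illustration only: picking a TensorRT, CUDA, or DirectML backend
# with ONNX Runtime. Not the actual enhancr/TVAI implementation.
import onnxruntime as ort

preferred = [
    "TensorrtExecutionProvider",  # NVIDIA TensorRT
    "CUDAExecutionProvider",      # plain CUDA
    "DmlExecutionProvider",       # DirectML (Windows)
    "CPUExecutionProvider",       # fallback
]

# Only request providers that this onnxruntime build actually supports.
available = ort.get_available_providers()
providers = [p for p in preferred if p in available]

session = ort.InferenceSession("upscaler.onnx", providers=providers)  # placeholder model path
print("Active providers:", session.get_providers())
```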
As I understand it, and as I read some time ago, TVAI has TRT models, so I don't see why they would be so exaggeratedly slow. They are simply poorly optimised, and perhaps they could be accelerated by increasing their VRAM usage. Are the models FP16? That would also drastically increase performance on graphics cards with Tensor cores; if the models are FP32, then it's no wonder they are so slow.
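On the precision question, here is a rough, self-contained sketch (not TVAI's actual code) that times an arbitrary convolutional model in FP32 and then in FP16 with PyTorch on a CUDA GPU. The layer sizes and frame shape are placeholders, but on a Tensor-core card the FP16 pass is usually noticeably faster.

```python
# Rough illustration (not TVAI's code): comparing FP32 vs FP16 inference time
# for an arbitrary convolutional model on a CUDA GPU.
import time
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(64, 3, 3, padding=1),
).cuda().eval()

frame = torch.randn(1, 3, 1080, 1920, device="cuda")  # one 1080p frame, placeholder data

def bench(m, x, runs=50):
    # warm up, then time `runs` forward passes
    with torch.no_grad():
        for _ in range(5):
            m(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(runs):
            m(x)
        torch.cuda.synchronize()
    return (time.time() - start) / runs

fp32_time = bench(model, frame)
fp16_time = bench(model.half(), frame.half())  # FP16 weights and input, typically run on Tensor cores
print(f"FP32: {fp32_time*1000:.1f} ms/frame, FP16: {fp16_time*1000:.1f} ms/frame")
```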
Your forum profile has been set to hidden, but I’m guessing you must be a new user of VEAI/VAI.
Otherwise you would remember how slow VEAI was in the old days and be grateful for how “fast” VAI is at the moment.
Since V1.7.0, VEAI has supported Tensor cores, and they started converting models from FP32 to FP16 for supported GPUs; many users saw their speeds improve by a factor of two.
My point is that performance on a 4090 has always been woeful. The program is better suited to other graphics cards, something I’ve always said. That makes no sense: the 4090 is currently the most powerful graphics card for AI inference, and there is no reason for it to have lousy performance in TVAI.
I understand the frustration of having the most powerful GPU but not being able to use it to its full potential.
However, the same thing happened when the RTX 3090 was released; many users bought it for VEAI only to find that it offered little improvement over their previous card.