I upgraded from an RTX 2070S to an RTX 4090. However, when I run a benchmark (below), I'm not getting the speed I expected based on comparisons with benchmarks people have posted for similar hardware. Even worse, real-life performance is worse. Upscaling 1080p 2x to 4K using Proteus (Auto)/Artemis runs at ~6 fps. Even weirder, a 1x pass on 1080p using Proteus also runs at around 6-7 fps, nowhere near my benchmark result.

I tried everything I can think of: reinstalling Topaz, a clean install of the latest Game Ready and Studio drivers, moving the source file, temp directory, and output file to an M.2 SSD (Samsung 950 PRO), and playing with the max memory usage setting in Topaz preferences. When running (ProRes LT output), my CPU usage is close to 90% while GPU usage is only around 20%. Even on a mechanical hard drive, disk usage is 10% at most (writing at around 16 MB/s during active periods). Switching to the H.265 (NVIDIA) encoder increased my GPU usage to 60-70% (with CPU usage lower, around 60-80%), but I'm not getting any more fps.

Previously, with my RTX 2070S, I was getting 6-8 fps using the Artemis model doing a 2x 1080p upscale. Changing to a 4090 did not result in any real-world performance gain. The only thing I cannot test is the CPU. Is my 5900X really bottlenecking the 4090 that much? Even if it does bottleneck the 4090, I would at least expect some gain in performance switching from a 2070S.
Has anyone with similar hardware encountered a similar issue?
Yeah, that CPU is certainly faster than a 5900X. My issue is that I'm not getting anywhere close to my benchmark of 21 fps for 1x Proteus (my real-world speed is 6 fps).
Are you using Proteus “Auto” or “Relative to Auto”?
If you are using “Auto” or “Relative to Auto”, the program has to analyze every frame and adjust the parameters automatically. Imagine having to click the “Estimate” button on every frame; it seems the CPU plays an important role in estimating parameters.
If you want faster processing time, you should use Proteus “Manual”.
Also, the developers mentioned that the benchmark does not include I/O reads/writes, so real-life results should be slightly slower than the benchmark.
For example, my 1920x1080 benchmark scores for Proteus 1x are "V3.3.0: 17.86 fps" and "V3.3.1: 18.12 fps",
and my real-life result for Proteus (Manual) is 17.5 fps.
If you don’t see any improvement after upgrading your GPU, that means other components in your computer may be the bottleneck for your system.
Yes, if I use Manual, my speed doubles to about 12-14 fps doing a 1x pass with Proteus. However, 14 fps is still quite a bit slower than my benchmark result (21 fps). Also, my real-life 2x Artemis speed is slower than the benchmark (6 vs. 9 fps). Does the benchmark take the CPU into account as well? I'm trying to figure out which component is causing the slowdown. I tried an SSD vs. a mechanical drive (described in my initial post). My SSD is plugged into the M.2 slot on my motherboard. While it's not the newest and fastest SSD, I don't see Topaz saturating its write speed. I have not tested a crazy-fast SSD setup like two or more SSDs in RAID 0 plugged into one of the PCIe slots.
I also tried setting the affinity of Topaz (ffmpeg in this case) to half or fewer of my CPU cores. With affinity set to half of the cores, CPU usage goes down, GPU usage stays the same, my CPU boosts to a higher clock, and fps stays the same or, in some scenarios, increases slightly (by 0.1-0.3 fps). Setting affinity to a smaller and smaller number of cores eventually slows Topaz down. So it seems Topaz (ffmpeg) benefits more from faster cores than from core count, up to a point. I'm not sure whether a 5950X instead of a 5900X, with 4 more cores (8 more threads), would give me any improvement.
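For anyone who wants to repeat the affinity experiment without digging through Task Manager on every run, here is a minimal sketch using Python's psutil. The worker process name (`ffmpeg.exe`) and the choice of pinning to the first half of the logical cores are assumptions for illustration; the actual Topaz worker process name may differ on your install.

```python
# Minimal sketch (assumptions noted above): pin the encoder worker process
# to the first half of the logical cores, mimicking the manual affinity test.
import psutil

TARGET_NAME = "ffmpeg.exe"  # assumed worker process name; adjust for your install
half_cores = list(range(psutil.cpu_count(logical=True) // 2))

for proc in psutil.process_iter(["name"]):
    if proc.info["name"] and proc.info["name"].lower() == TARGET_NAME:
        proc.cpu_affinity(half_cores)  # restrict this process to the listed cores
        print(f"PID {proc.pid}: affinity set to cores {half_cores}")
```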
I'm just trying to figure out which component is limiting speed so I don't spend money upgrading the “wrong” one.
I will not tire of saying it: this program is not properly optimised, and it does not exploit the performance of the RTX 4090 at all.
I wish developers would finally focus on making performance normal for high-end graphics cards.
It can’t be the fault of anything external to the program, as other AI programs I use run like the wind.
One program I use (called enhancr, in case you’re interested) ships TRT models (TensorRT, an NVIDIA framework that dramatically speeds up inference), and with it a music video takes about 1 minute to process, whereas TVAI with Apollo or any other model takes about 15 minutes, or 30-45 minutes if combined with 4K upscaling at the same time. In the latest update the developer also implemented DirectML versions (similar to TensorRT but developed by Microsoft), which likewise speeds up performance drastically, much more than TVAI.
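To make the backend point concrete, here is a rough, generic sketch of how an inference backend is typically chosen with ONNX Runtime. This is purely illustrative and not how enhancr or TVAI are actually implemented; the model path is a placeholder.

```python
# Generic illustration only: picking a TensorRT, CUDA, or DirectML backend
# with ONNX Runtime. Not the actual enhancr/TVAI implementation.
import onnxruntime as ort

preferred = [
    "TensorrtExecutionProvider",  # NVIDIA TensorRT
    "CUDAExecutionProvider",      # plain CUDA
    "DmlExecutionProvider",       # DirectML (Windows)
    "CPUExecutionProvider",       # fallback
]

# Only request providers that this onnxruntime build actually supports.
available = ort.get_available_providers()
providers = [p for p in preferred if p in available]

session = ort.InferenceSession("upscaler.onnx", providers=providers)  # placeholder model path
print("Active providers:", session.get_providers())
```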
As I understand it, and as I read some time ago, TVAI has TRT models, so I don't see why they would be so exaggeratedly slow. They are simply poorly optimised, and perhaps they could be accelerated by increasing their VRAM usage. Are the models FP16? That would also drastically increase performance on graphics cards with Tensor cores; if the models are FP32, then it's no wonder they are so slow.
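On the precision question, here is a rough, self-contained sketch (not TVAI's actual code) that times an arbitrary convolutional model in FP32 and then in FP16 with PyTorch on a CUDA GPU. The layer sizes and frame shape are placeholders, but on a Tensor-core card the FP16 pass is usually noticeably faster.

```python
# Rough illustration (not TVAI's code): comparing FP32 vs FP16 inference time
# for an arbitrary convolutional model on a CUDA GPU.
import time
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(64, 3, 3, padding=1),
).cuda().eval()

frame = torch.randn(1, 3, 1080, 1920, device="cuda")  # one 1080p frame, placeholder data

def bench(m, x, runs=50):
    # warm up, then time `runs` forward passes
    with torch.no_grad():
        for _ in range(5):
            m(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(runs):
            m(x)
        torch.cuda.synchronize()
    return (time.time() - start) / runs

fp32_time = bench(model, frame)
fp16_time = bench(model.half(), frame.half())  # FP16 weights and input, typically run on Tensor cores
print(f"FP32: {fp32_time*1000:.1f} ms/frame, FP16: {fp16_time*1000:.1f} ms/frame")
```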
Your forum profile has been set to hidden, but I’m guessing you must be a new user of VEAI/VAI.
Otherwise you would remember how slow VEAI was in the old days and be grateful for how “fast” VAI is at the moment.
Since V1.7.0, VEAI has supported Tensor cores, and they started converting models from FP32 to FP16 for supported GPUs; many users saw their speeds improve by a factor of two.
My point is that performance on a 4090 has always been woeful. The program is better suited to other graphics cards, something I’ve always said. That makes no sense: the 4090 is currently the most powerful graphics card for AI inference, and there is no reason for it to have lousy performance in TVAI.
I understand the frustration of having the most powerful GPU but not being able to use it to its full potential.
However, the same thing happened when the RTX 3090 was released; many users bought it for VEAI only to find that it offered little improvement over their previous card.