Underperformance (benchmark vs. in practice rendering)

tom-5324 · September 20, 2023, 11:34am

Hi everyone,

I’m working on a system with an Intel 12700K, 128 GB RAM and a 4070 Ti, running Windows 10.
I’ve got a feeling that with some settings the system is badly underperforming. The benchmark shows Proteus 1x 1080p at 15.5 fps. But as soon as I render 1080p footage with Proteus on auto settings (high-quality source material, no scaling), the system renders at 5 fps at most.

How do your benchmarks compare to real results?

My hypothesis is that the Intel CPU’s efficiency cores are also being used in the process and Topaz has to wait on them. No idea if this is plausible, but I will test it soon by comparing a run with all cores enabled against a run with only the performance cores enabled in the BIOS.
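A less invasive way to try the same test is to pin the process to the performance cores with a CPU affinity mask instead of changing the BIOS. The sketch below only computes such a mask. It assumes the 12700K's 8 P-cores (with Hyper-Threading) show up as logical CPUs 0-15 and the 4 E-cores as 16-19; that ordering is an assumption and should be checked on the actual machine first.

```python
def affinity_mask(logical_cpus):
    """Return a bitmask with one bit set for each logical CPU index."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


# Assumption: P-core threads are logical CPUs 0-15, E-cores are 16-19.
P_CORE_CPUS = range(16)
E_CORE_CPUS = range(16, 20)

p_core_mask = affinity_mask(P_CORE_CPUS)
e_core_mask = affinity_mask(E_CORE_CPUS)

# Hexadecimal form, as used for affinity masks on Windows.
print(f"P-cores only: {p_core_mask:X}")
print(f"E-cores only: {e_core_mask:X}")
```

The hex value is the format the Windows `start /affinity` command expects, and the same selection can be made by hand under "Set affinity" in Task Manager.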

In the meantime the benchmark vs. in practice results may be an interesting discussion.

lhkjacky · September 20, 2023, 3:04pm

You should choose Proteus Manual if you want to get results similar to the benchmark.

Proteus Auto means that TVAI has to analyse every frame and keep adjusting the parameters. This estimation step increases the processing time significantly, which is why it is slower than the benchmark.

Also, the benchmark only measures the AI model's processing speed. It does not include encoding/decoding of the video or disk I/O (reads and writes).
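To put a number on that gap, here is a rough sketch using the figures from the first post (15.5 fps benchmark, 5 fps observed). It assumes the simplest possible model, where the extra work (parameter estimation, decode, encode, I/O) runs strictly one after the other with the model for each frame. The real pipeline may overlap stages, so treat the result as an estimate and not a measurement; the function names are made up for illustration.

```python
def serial_fps(stage_fps):
    """Throughput if every stage runs one after another for each frame."""
    return 1.0 / sum(1.0 / fps for fps in stage_fps)


def pipelined_fps(stage_fps):
    """Throughput if all stages overlap perfectly: the slowest stage wins."""
    return min(stage_fps)


def extra_seconds_per_frame(benchmark_fps, observed_fps):
    """Per-frame time not accounted for by the benchmarked model speed."""
    return 1.0 / observed_fps - 1.0 / benchmark_fps


# Figures from the first post in this thread.
extra = extra_seconds_per_frame(benchmark_fps=15.5, observed_fps=5.0)
print(f"unaccounted time per frame: {extra * 1000:.1f} ms")

# Sanity check: model stage plus the extra stage in series gives the observed rate.
print(f"serial model: {serial_fps([15.5, 1.0 / extra]):.2f} fps")
```

Under the serial assumption, roughly two thirds of each frame's time goes to work the benchmark never measures.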

Maximus · September 26, 2023, 7:23pm

I noticed similar behaviour as well.

Once you start upscaling 1080 or 2160 footage, benchmark and real-world performance differ quite a lot, even on a 13000K with a 4090.
I did some extensive testing, including hardware-accelerated H.265 decoding (hwaccel) on input and NVENC on output to offload the CPU, but that doesn’t help. Offloading decoding and encoding reduces the CPU load but doesn’t change the overall performance when upscaling high-res inputs. The interesting part is that even if you use ProRes software encoding and disable the efficiency cores, the overall performance doesn’t drop.

Also, reducing the power limit of my 4090 to 30% and underclocking the GPU doesn’t cause any real-world performance drop whatsoever.

I checked the I/O as well, but everything is running on NVMe at 1/100 of its throughput capability. The PCIe 4.0 x16 link is at 30% utilization, so there is some headroom left even there.

What I did notice is the impact of DDR5 throughput: upgrading from DDR5-5600 to DDR5-6800 gave roughly a 20% performance boost. I am left with only one conclusion: real-world high-res input upscaling with the Topaz AI software is limited by memory throughput. To fully utilize the hardware’s capabilities, this is the first bottleneck Topaz needs to address for high-res upscaling.
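As a quick plausibility check on the 20% figure, the theoretical peak bandwidth of the two memory speeds can be compared. The sketch assumes a dual-channel setup with 64 bits (8 bytes) of data per channel per transfer. These are peak numbers, so sustained bandwidth in practice will be lower and also depends on timings.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, channels=2, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_s = mega_transfers_per_s * 1e6 * bytes_per_transfer * channels
    return bytes_per_s / 1e9


# Assumption: dual-channel, 64-bit data width per channel.
old = peak_bandwidth_gbs(5600)
new = peak_bandwidth_gbs(6800)
gain = new / old - 1.0

print(f"DDR5-5600: {old:.1f} GB/s")
print(f"DDR5-6800: {new:.1f} GB/s")
print(f"relative gain: {gain:.1%}")
```

The peak bandwidth rises by about 21%, which is close to the observed boost and fits a workload that scales almost linearly with memory throughput.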

Maybe they can shed some light on this bottleneck.

Cheers