Video AI 5.4.X - User Benchmarking Results

That's exactly what I just did (and discovered…). I paused the one that was running solo, the next 2 started since I had changed the setting to 2, and the performance boost was immediate.

Wild. The settings on Mac are bizarre: Low Power and reduced memory for better performance… although I am running max memory for multiple instances…

Excellent. Glad you got such a good benefit on your M2 Pro mini too. I just posted about this in General so Topaz and others are aware.

Thanks.

Andy

Wow, what a build: AMD EPYC + Nvidia L40S. I'd have expected the results to be way better than a regular CPU and GPU. How???

Full HD 1080p (1920x1080)

Topaz Video AI  v5.5.0
System Information
OS: Windows v10.22
CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz  15.94 GB
GPU: Radeon RX 580 Series  7.9452 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis		1X: 	02.52 fps 	2X: 	01.75 fps 	4X: 	00.63 fps 	
Iris		1X: 	02.63 fps 	2X: 	01.51 fps 	4X: 	00.52 fps 	
Proteus		1X: 	02.52 fps 	2X: 	01.69 fps 	4X: 	00.65 fps 	
Gaia		1X: 	01.06 fps 	2X: 	00.64 fps 	4X: 	ERR fps 	
Nyx		1X: 	00.80 fps 	2X: 	00.81 fps 	
Nyx Fast		1X: 	02.25 fps 	
Rhea		4X: 	00.08 fps 	
4X Slowmo		Apollo: 	03.03 fps 	APFast: 	10.20 fps 	Chronos: 	01.37 fps 	CHFast: 	02.32 fps 	
16X Slowmo		Aion: 	06.24 fps 	

HD 720p (1280x720)

Topaz Video AI  v5.5.0
System Information
OS: Windows v10.22
CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz  15.94 GB
GPU: Radeon RX 580 Series  7.9452 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 1280x720
Benchmark Results
Artemis		1X: 	05.68 fps 	2X: 	03.70 fps 	4X: 	01.32 fps 	
Iris		1X: 	05.49 fps 	2X: 	03.31 fps 	4X: 	01.17 fps 	
Proteus		1X: 	05.40 fps 	2X: 	03.80 fps 	4X: 	01.56 fps 	
Gaia		1X: 	02.32 fps 	2X: 	01.42 fps 	4X: 	00.96 fps 	
Nyx		1X: 	01.84 fps 	2X: 	01.67 fps 	
Nyx Fast		1X: 	04.30 fps 	
Rhea		4X: 	00.71 fps 	
4X Slowmo		Apollo: 	05.99 fps 	APFast: 	20.16 fps 	Chronos: 	02.98 fps 	CHFast: 	04.93 fps 	
16X Slowmo		Aion: 	15.17 fps 	

480p (640x480)

Topaz Video AI  v5.5.0
System Information
OS: Windows v10.22
CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz  15.94 GB
GPU: Radeon RX 580 Series  7.9452 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 640x480
Benchmark Results
Artemis		1X: 	14.00 fps 	2X: 	08.90 fps 	4X: 	03.04 fps 	
Iris		1X: 	14.70 fps 	2X: 	08.84 fps 	4X: 	02.91 fps 	
Proteus		1X: 	14.44 fps 	2X: 	09.34 fps 	4X: 	04.50 fps 	
Gaia		1X: 	05.93 fps 	2X: 	03.84 fps 	4X: 	02.75 fps 	
Nyx		1X: 	04.98 fps 	2X: 	04.24 fps 	
Nyx Fast		1X: 	10.80 fps 	
Rhea		4X: 	01.84 fps 	
4X Slowmo		Apollo: 	15.91 fps 	APFast: 	43.39 fps 	Chronos: 	07.34 fps 	CHFast: 	12.22 fps 	
16X Slowmo		Aion: 	32.25 fps 	

360p (480x360)

Topaz Video AI  v5.5.0
System Information
OS: Windows v10.22
CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz  15.94 GB
GPU: Radeon RX 580 Series  7.9452 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 480x360
Benchmark Results
Artemis		1X: 	24.80 fps 	2X: 	14.96 fps 	4X: 	04.59 fps 	
Iris		1X: 	25.62 fps 	2X: 	16.36 fps 	4X: 	04.92 fps 	
Proteus		1X: 	24.91 fps 	2X: 	17.56 fps 	4X: 	05.67 fps 	
Gaia		1X: 	09.90 fps 	2X: 	06.31 fps 	4X: 	04.03 fps 	
Nyx		1X: 	10.15 fps 	2X: 	08.77 fps 	
Nyx Fast		1X: 	21.26 fps 	
Rhea		4X: 	03.21 fps 	
4X Slowmo		Apollo: 	25.05 fps 	APFast: 	70.42 fps 	Chronos: 	13.26 fps 	CHFast: 	19.77 fps 	
16X Slowmo		Aion: 	48.56 fps 	

These additional keywords could be useful: the results above may also serve as a reference for other cards with fairly similar performance, such as these from the same generation (Polaris):

Radeon RX 570 Series
Radeon RX 480 Series
Radeon RX 470 Series
Radeon Pro WX 5100 Series
Radeon Pro WX 7100 Series
Radeon Pro WX 7100 (Mobile) Series
Radeon Pro WX 7130 (Mobile) Series
Radeon Pro 575 Series
Radeon Pro 575X Series
Radeon Pro 580 Series
Radeon Pro 580X Series

TVAI doesn't really use more than 8 cores, and the EPYC CPU is limited to 3.7 GHz vs. something like 4.5 GHz on a 7950 when only 8 cores are loaded.
The results still seem a bit low, since the L40S appears to be a beefed-up 4090.
Maybe TVAI's multi-threading gets messed up by trying to use too many cores, and restricting TVAI to something like 8 cores (e.g. with Process Lasso) might actually lift performance.
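If you'd rather not install Process Lasso for a quick test, Windows' built-in `start /affinity <hex mask>` can launch a program pinned to specific logical CPUs. Here's a small sketch that builds that hex mask for the first N cores. It assumes SMT sibling threads are numbered next to each other (logical CPUs 0 and 1 = core 0, etc.), which is typical but worth checking on your own system; the `affinity_mask` helper is just an illustrative name, not part of TVAI or Windows.

```python
def affinity_mask(physical_cores: int, threads_per_core: int = 2) -> str:
    """Hex mask covering the first `physical_cores` cores, for cmd's `start /affinity`.

    Assumption: SMT siblings are numbered adjacently (logical CPUs 0 and 1 share
    core 0, 2 and 3 share core 1, ...). Verify your topology before relying on it.
    """
    logical_cpus = physical_cores * threads_per_core
    mask = (1 << logical_cpus) - 1  # one set bit per logical CPU, starting at CPU 0
    return format(mask, "X")        # uppercase hex, no 0x prefix


print(affinity_mask(8))                      # 8 cores with SMT on  -> FFFF
print(affinity_mask(8, threads_per_core=1))  # 8 cores with SMT off -> FF
```

You'd then run something like `start /affinity FFFF "" <path to the TVAI executable>` from cmd (the empty quotes are the window title). Child processes normally inherit the affinity, but Process Lasso is still the more convenient option if you want the restriction applied every time.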

Topaz Video AI  v5.5.0
System Information
OS: Windows v11.23
CPU: AMD Ryzen 7 5800X 8-Core Processor               31.921 GB
GPU: NVIDIA GeForce RTX 3070  7.8301 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis		1X: 	14.44 fps 	2X: 	09.10 fps 	4X: 	02.51 fps 	
Iris		1X: 	14.36 fps 	2X: 	08.35 fps 	4X: 	02.64 fps 	
Proteus		1X: 	14.11 fps 	2X: 	10.09 fps 	4X: 	03.40 fps 	
Gaia		1X: 	04.94 fps 	2X: 	03.42 fps 	4X: 	02.28 fps 	
Nyx		1X: 	05.98 fps 	2X: 	04.92 fps 	
Nyx Fast		1X: 	11.70 fps 	
Rhea		4X: 	01.92 fps 	
4X Slowmo		Apollo: 	20.68 fps 	APFast: 	48.00 fps 	Chronos: 	11.14 fps 	CHFast: 	18.09 fps 	
16X Slowmo		Aion: 	22.99 fps 	

Topaz Video AI  v5.5.0
System Information
OS: Windows v11.23
CPU: AMD Ryzen 7 7800X3D 8-Core Processor             31.604 GB
GPU: NVIDIA GeForce RTX 3080  9.8174 GB
Processing Settings
device: -2 vram: 1 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis		1X: 	19.99 fps 	2X: 	13.73 fps 	4X: 	04.72 fps 	
Iris		1X: 	19.05 fps 	2X: 	12.47 fps 	4X: 	03.81 fps 	
Proteus		1X: 	19.48 fps 	2X: 	13.68 fps 	4X: 	05.08 fps 	
Gaia		1X: 	06.57 fps 	2X: 	04.48 fps 	4X: 	03.14 fps 	
Nyx		1X: 	07.86 fps 	2X: 	06.80 fps 	
Nyx Fast		1X: 	16.53 fps 	
Rhea		4X: 	02.72 fps 	
4X Slowmo		Apollo: 	24.26 fps 	APFast: 	62.23 fps 	Chronos: 	14.72 fps 	CHFast: 	24.20 fps 	
16X Slowmo		Aion: 	35.76 fps 	

Quite happy with these results, running an off-brand HP RTX 3080. I haven't had TVAI for more than a week or so, and I've learned a lot about the whys and hows of voltages and clock speeds, which are far more sensitive here than in synthetic tests (many parts of the GPU, such as tensor cores and CUDA cores, don't get utilized the same way, or barely at all simultaneously, compared with AI rendering). Before this I ran a fully stable undervolt and overclock at around 938 mV @ 2040 MHz with memory at +1200 MHz, barely touching 280 W and scoring 10-15% higher than other overclocked 3080s. It was stable through several different benchmarks and multiple hours-long standard stability tests. But that all changed when rendering with TVAI, due to the nature of AI rendering: it was far from stable.

Comparing my results with most of the RTX 4090s in here: for the interpolation models Apollo and Chronos, my 3080 seems to render at the same or slightly faster speeds. For the enhancement/upscaling models, the 4090 posts in here seem to average about 33-35 fps for Artemis, Iris and Gaia, which puts my 3080 about 40-45% behind (stated the other way round, the 4090 is roughly 65-84% faster). Looking at the raw specifications of the parts of the GPU optimized for AI rendering, a stock 4090 should render about 80-88% faster than a stock 3080. I'd most likely fall behind on huge upscales such as Rhea, since I have less than half the VRAM.
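"X% slower" and "Y% faster" aren't the same number, which makes these cross-card comparisons easy to mix up. Here's a quick sketch that computes both from the 1X fps figures: the 3080 numbers are from my 7800X3D run above, and 33-35 fps is the rough 4090 average I quoted, not a single measured result.

```python
def compare(fast_fps: float, slow_fps: float) -> tuple[float, float]:
    """Return (% the slower card trails, % the faster card leads)."""
    behind = (1 - slow_fps / fast_fps) * 100  # gap relative to the faster card
    ahead = (fast_fps / slow_fps - 1) * 100   # same gap relative to the slower card
    return behind, ahead


# 3080 1X results from the run above; 33 and 35 fps bracket the rough 4090 average
for model, fps_3080 in [("Artemis", 19.99), ("Iris", 19.05)]:
    for fps_4090 in (33.0, 35.0):
        behind, ahead = compare(fps_4090, fps_3080)
        print(f"{model} @ {fps_4090:.0f} fps: 3080 {behind:.0f}% behind, 4090 {ahead:.0f}% ahead")
```

The "behind" column lands at roughly 39-46%, while the same gaps expressed as the 4090's lead come out around 65-84%. A spec-sheet figure like "80-88% faster" should be compared against the second column.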

MSI Afterburner is the app I've used. The benchmarks above were run with a voltage limit of 875 mV, a max boost clock of 1860 MHz, memory clock at +900 MHz and a 98% power limit. And of course, both CPU and memory are optimized as well: CPU OC at -27 with LLC mode 4, and RAM at 2x16 GB @ 6000 MHz CL30 with tightened timings and a latency of 63 ns. I haven't had to change any of those settings from my previous setup, as CPU load rarely passes 75% of its power limit and memory bandwidth barely runs at 50% of its top speed.

DISCLAIMER
My clock speeds might not be optimal for other 3080 cards, but they could be a decent starting point for anyone with a 3080 looking to min-max its performance. Do this at your own risk.

With that said, GPU crashes nowadays aren't as bad as they used to be, since the cards already have a lot of restraints built in. You'd really have to try hard to cause actual damage, even with severe crashes that force a full reboot.

After hours of testing different input formats, model combinations and output formats, with the crash count passing 10, it has now been running stable without a single crash for about 3 days, averaging 15 h of rendering per day.

Finding a voltage that works with a corresponding clock is much more sensitive than in a typical benchmark. With slightly too high a voltage, clock speeds drop drastically due to power-limit throttling. With slightly too high a clock or slightly too low a voltage limit, it crashes.

My current voltage limit puts the GPU at an average of 94-97% power, or about 300-310 W (the RTX 3080 power limit is 320 W). Total GPU load hovers around 98-100%, averaging slightly above 99%. I have, however, set the power limit to 98%, because some of the models, e.g. Apollo, spike massively in power, both on their own and combined with an enhancement model. These spikes were the biggest cause of GPU micro-crashes (a 1-2 s full freeze, screen flicker, the process gets an error and the GPU reloads; not a full-on crash that causes a reboot, but a crash nonetheless, forcing the process to restart from the beginning). I had mostly been running without any power limit, but set it to 98% yesterday while closely monitoring power draw running Apollo. Apollo swings massively in power, producing spikes between 250 W and 330 W. Setting a slight power limit should help mitigate that to a degree: I've been rendering with Apollo for most of the last 24 h with no crashes. From what I've found, it's the model most likely to cause issues, due to its power spikes.
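For anyone translating Afterburner's power-limit percentage into watts, here's the arithmetic behind the numbers above as a small sketch. It assumes the card's 100% point is the 320 W stock RTX 3080 limit; an off-brand or factory-OC card may use a different default, so check yours in GPU-Z or `nvidia-smi` first.

```python
TDP_W = 320  # assumption: 100% in Afterburner = the 320 W stock RTX 3080 limit


def watts(percent: float, limit_w: float = TDP_W) -> float:
    """Convert a power-limit percentage into watts."""
    return limit_w * percent / 100


cap = watts(98)  # the 98% power limit set in Afterburner
print(f"98% cap: {cap:.1f} W")                               # 313.6 W
print(f"typical load: {watts(94):.0f}-{watts(97):.0f} W")    # 301-310 W
# Apollo spikes reach ~330 W; this is how far the worst one overshoots the cap
print(f"worst spike over cap: {330 - cap:.1f} W")            # 16.4 W
```

So the 98% limit leaves normal 94-97% operation untouched while clipping roughly the top 16 W of Apollo's spikes, which fits with it reducing crashes without costing much speed.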

Thanks to a recent thermal pad change, a few extra thermal pads on some small RAM modules, and repasting the GPU chip, the core temp doesn't pass 80C, hotspot 92C and memory junction 88C. Everything is air cooled with a conservative fan curve. The fans won't run above 60% (a limit I've set, as I render overnight) even if the core passes 83C, so they average 50-55%, each fan at a slightly different %. That curve is the result of hours upon hours of testing (I worked as an HVAC fitter years ago and am now in engineering), so it's well optimized: it keeps those temps within +/- 1C during a 24h+ constant full-load batch render.

If anyone has good insights or guides regarding GPU-accelerated AI rendering, I'd be more than happy to read up and learn more. Everything I've done so far has been trial-and-error testing, plus asking ChatGPT a few specific questions about why the loads differ. I didn't ask it about voltages and clock speeds, as those can differ from GPU to GPU, even from the same manufacturer, so asking ChatGPT about that would be pointless.

Let me know what you think about my approach, and if you have any tips!

Topaz Video AI  v5.4.0
System Information
OS: Windows v11.23
CPU: 13th Gen Intel(R) Core(TM) i7-13700K  63.762 GB
GPU: NVIDIA GeForce RTX 4070  11.73 GB
GPU: Intel(R) UHD Graphics 770  0.125 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis		1X: 	19.45 fps 	2X: 	12.98 fps 	4X: 	04.37 fps 	
Iris		1X: 	20.77 fps 	2X: 	12.20 fps 	4X: 	03.49 fps 	
Proteus		1X: 	19.07 fps 	2X: 	13.02 fps 	4X: 	04.77 fps 	
Gaia		1X: 	06.61 fps 	2X: 	04.55 fps 	4X: 	03.13 fps 	
Nyx		1X: 	07.89 fps 	2X: 	06.66 fps 	
Nyx Fast		1X: 	14.64 fps 	
Rhea		4X: 	02.58 fps 	
4X Slowmo		Apollo: 	26.73 fps 	APFast: 	63.07 fps 	Chronos: 	14.07 fps 	CHFast: 	21.95 fps 		

Topaz Video AI  v5.5.0
System Information
OS: Windows v10.22
CPU: AMD Ryzen 9 3900X 12-Core Processor              31.946 GB
GPU: NVIDIA GeForce RTX 4070 SUPER  11.718 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis		1X: 	22.08 fps 	2X: 	10.65 fps 	4X: 	02.72 fps 	
Iris		1X: 	23.83 fps 	2X: 	11.59 fps 	4X: 	02.85 fps 	
Proteus		1X: 	21.32 fps 	2X: 	11.69 fps 	4X: 	03.10 fps 	
Gaia		1X: 	07.22 fps 	2X: 	04.96 fps 	4X: 	02.75 fps 	
Nyx		1X: 	08.85 fps 	2X: 	07.73 fps 	
Nyx Fast		1X: 	16.71 fps 	
Rhea		4X: 	02.55 fps 	
4X Slowmo		Apollo: 	26.78 fps 	APFast: 	44.88 fps 	Chronos: 	17.29 fps 	CHFast: 	23.48 fps 	
16X Slowmo		Aion: 	32.68 fps 	

Topaz Video AI  v5.5.0
System Information
OS: Windows v11.24
CPU: AMD Ryzen 9 7950X 16-Core Processor              62.136 GB
GPU: NVIDIA GeForce RTX 3080 Ti  11.803 GB
GPU: AMD Radeon(TM) Graphics  0.47438 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis		1X: 	24.05 fps 	2X: 	17.06 fps 	4X: 	04.64 fps 	
Iris		1X: 	23.39 fps 	2X: 	14.89 fps 	4X: 	04.52 fps 	
Proteus		1X: 	23.26 fps 	2X: 	16.89 fps 	4X: 	05.90 fps 	
Gaia		1X: 	08.36 fps 	2X: 	05.78 fps 	4X: 	03.91 fps 	
Nyx		1X: 	10.29 fps 	2X: 	08.75 fps 	
Nyx Fast		1X: 	19.91 fps 	
Rhea		4X: 	03.48 fps 	
4X Slowmo		Apollo: 	35.81 fps 	APFast: 	94.64 fps 	Chronos: 	18.76 fps 	CHFast: 	30.78 fps 	
16X Slowmo		Aion: 	45.26 fps 	

Topaz Video AI  v5.5.0
System Information
OS: Windows v11.24
CPU: AMD Ryzen 9 7950X 16-Core Processor              62.136 GB
GPU: NVIDIA GeForce RTX 3080 Ti  11.803 GB
GPU: AMD Radeon(TM) Graphics  0.47438 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 3840x2160
Benchmark Results
Artemis		1X: 	05.10 fps 	2X: 	03.68 fps 	4X: 	01.06 fps 	
Iris		1X: 	04.89 fps 	2X: 	03.16 fps 	4X: 	00.95 fps 	
Proteus		1X: 	04.87 fps 	2X: 	03.56 fps 	4X: 	01.26 fps 	
Gaia		1X: 	01.76 fps 	2X: 	01.21 fps 	4X: 	00.82 fps 	
Nyx		1X: 	01.72 fps 	2X: 	02.28 fps 	
Nyx Fast		1X: 	03.25 fps 	
Rhea		4X: 	00.73 fps 	
4X Slowmo		Apollo: 	09.42 fps 	APFast: 	28.85 fps 	Chronos: 	03.99 fps 	CHFast: 	07.84 fps 	
16X Slowmo		Aion: 	10.31 fps 	

Topaz Video AI  v5.5.0
System Information
OS: Windows v10.22
CPU: AMD Ryzen Threadripper 3970X 32-Core Processor   127.87 GB
GPU: NVIDIA GeForce RTX 3090  23.756 GB
Processing Settings
device: -2 vram: 1 instances: 0
Input Resolution: 1920x1080
Benchmark Results
Artemis		1X: 	24.45 fps 	2X: 	12.16 fps 	4X: 	03.48 fps 	
Iris		1X: 	22.29 fps 	2X: 	14.80 fps 	4X: 	03.65 fps 	
Proteus		1X: 	23.63 fps 	2X: 	14.20 fps 	4X: 	03.76 fps 	
Gaia		1X: 	08.04 fps 	2X: 	05.65 fps 	4X: 	03.25 fps 	
Nyx		1X: 	10.20 fps 	2X: 	08.06 fps 	
Nyx Fast		1X: 	19.24 fps 	
Rhea		4X: 	03.04 fps 	
4X Slowmo		Apollo: 	33.20 fps 	APFast: 	60.59 fps 	Chronos: 	18.94 fps 	CHFast: 	26.73 fps 	
16X Slowmo		Aion: 	24.62 fps 	

Topaz Video AI  v5.5.0
System Information
OS: Mac v14.0701
CPU: Intel(R) Xeon(R) W-3245 CPU @ 3.20GHz  128 GB
GPU: AMD Radeon RX 6900 XT  15.984 GB
GPU: AMD Radeon Pro W5700X  15.984 GB
GPU: AMD Radeon Pro W5700X  15.984 GB
Processing Settings
device: 3 vram: 1 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis		1X: 	09.73 fps 	2X: 	04.51 fps 	4X: 	01.37 fps 	
Iris		1X: 	10.41 fps 	2X: 	06.80 fps 	4X: 	01.26 fps 	
Proteus		1X: 	14.61 fps 	2X: 	09.03 fps 	4X: 	01.67 fps 	
Gaia		1X: 	04.53 fps 	2X: 	02.87 fps 	4X: 	02.19 fps 	
Nyx		1X: 	06.52 fps 	2X: 	04.99 fps 	
Nyx Fast		1X: 	16.02 fps 	
Rhea		4X: 	02.17 fps 	
4X Slowmo		Apollo: 	10.90 fps 	APFast: 	31.39 fps 	Chronos: 	01.37 fps 	CHFast: 	03.02 fps 	
16X Slowmo		Aion: 	ERR fps 	

I just upgraded from an i9-10850K + DDR4-2400 CL15 to a Core Ultra 7 265K + DDR5-6800 CL34, keeping a 3080 Ti FE and all other components (except the motherboard, of course). The Topaz benchmark results were largely unchanged for 1080p input, some a frame faster, some somehow slightly slower. However, in the practical benchmark of actually processing files I am seeing a ~50% improvement. For example, with Iris 1080p manual parameters I was previously getting 11 fps in the benchmark but ~5 fps processing real files, whereas after the upgrade the benchmark went to 12 fps but real processing jumped to 8.5 fps. I have never understood the point of this useless benchmarking tool, which seems mainly to test the GPU in isolation and doesn't remotely represent real file processing, the thing I imagine all of us actually care about.
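The Iris example shows the gap nicely if you put the numbers side by side. This small sketch uses only the four fps figures from the post above: it computes the improvement for the benchmark and for real exports, and what fraction of the benchmark speed a real export actually reaches.

```python
def speedup(before_fps: float, after_fps: float) -> float:
    """Percent improvement going from before_fps to after_fps."""
    return (after_fps / before_fps - 1) * 100


# Iris 1080p figures from the post (i9-10850K -> Core Ultra 7 265K, same 3080 Ti FE)
bench = speedup(11, 12)
real = speedup(5, 8.5)
print(f"benchmark: +{bench:.0f}%, real export: +{real:.0f}%")
# Fraction of the benchmark fps actually reached when processing real files
print(f"before: {5 / 11:.0%} of benchmark, after: {8.5 / 12:.0%} of benchmark")
```

This prints a +9% benchmark gain against +70% for real exports in this particular Iris case, with real processing going from about 45% to about 71% of the benchmark figure. That pattern suggests the new CPU/RAM is mostly removing a bottleneck outside the GPU inference the benchmark measures.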

I noticed that the benchmark isn't accurate on Windows, but on Mac the fps shown in the benchmark matches real-life processing speed. So the benchmark results are accurate for Mac, but not for Windows.

Topaz Video AI  v5.5.0
System Information
OS: Windows v11.24
CPU: Intel(R) Xeon(R) w5-2465X  127.25 GB
GPU: NVIDIA GeForce RTX 4070  11.73 GB
GPU: NVIDIA RTX A4000  15.79 GB
Processing Settings
device: 0 vram: 0.9 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis		1X: 	18.64 fps 	2X: 	12.32 fps 	4X: 	03.72 fps 	
Iris		1X: 	19.12 fps 	2X: 	11.65 fps 	4X: 	03.26 fps 	
Proteus		1X: 	17.85 fps 	2X: 	12.27 fps 	4X: 	04.25 fps 	
Gaia		1X: 	06.15 fps 	2X: 	04.25 fps 	4X: 	02.91 fps 	
Nyx		1X: 	07.28 fps 	2X: 	06.22 fps 	
Nyx Fast		1X: 	13.49 fps 	
Rhea		4X: 	00NaN fps 	
4X Slowmo		Apollo: 	24.06 fps 	APFast: 	54.01 fps 	Chronos: 	13.22 fps 	CHFast: 	19.91 fps 	
16X Slowmo		Aion: 	27.33 fps 	

Maybe you can still post your benchmark, because I think it would be the first with a 265K CPU. It's also useful for comparing with other benchmarks: even if they don't always indicate real-world performance, the benchmarks should at least be skewed in the same way. Thanks!

I think benchmarks mean nothing.

Real processing is the key.
I made some tests with these two computers.
Topaz benchmark:
Apple M4 Pro: 10.21 fps, Proteus 1080p
12900K + 4080 SUPER: 29 fps, Proteus 1080p (2.8 times faster)

When I process a real video from 576p to 1080p, I get:
Real processing:
Apple M4 Pro: 0.44x in ffmpeg
12900K + 4080 SUPER: 0.66x in ffmpeg (1.5 times as fast, i.e. 50% faster)

So I think the only real benchmark is to process at least 5 minutes of video with the same parameters.
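Here are the two ratios from the post above computed side by side, plus a conversion of ffmpeg's `speed=` multiple into output fps. ffmpeg's speed is media time processed per second of wall-clock time, so fps = speed x source frame rate when the frame rate doesn't change. The 25 fps source rate is an assumption (576 lines suggests PAL), since the post doesn't state it.

```python
def realtime_to_fps(speed: float, source_fps: float) -> float:
    """ffmpeg's speed= is media seconds processed per wall-clock second."""
    return speed * source_fps


# Figures from the post above
bench_ratio = 29 / 10.21  # 12900K + 4080 SUPER vs M4 Pro, in-app Proteus benchmark
real_ratio = 0.66 / 0.44  # same pair, real 576p -> 1080p export
print(f"benchmark: {bench_ratio:.1f}x, real export: {real_ratio:.1f}x")

# Assumption: 25 fps PAL source and no frame interpolation, so output fps = input fps
for name, speed in [("M4 Pro", 0.44), ("12900K + 4080 SUPER", 0.66)]:
    print(f"{name}: about {realtime_to_fps(speed, 25):.1f} output fps")
```

Under that assumption, the real exports run at about 11 and 16.5 fps, and the PC's lead shrinks from 2.8x in the benchmark to 1.5x in practice.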

100%. A lot comes down to the input file and output parameters.

For an actual comparison between one machine and another, you'd use the exact same file and in-app settings, since the different models put different kinds of load on the GPU, CPU and memory.

To test properly, files that take 1h+ to render would be ideal, along with comparing different combinations of models.
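If you want to time real exports consistently across machines, a small harness like this could help. It's a sketch that times a complete run of any command and reports frames per wall-clock second, taking the median over several identical runs. The command is a placeholder: I'm not assuming any particular TVAI CLI flags, so substitute whatever export command you actually use, with the same file and settings on every machine. `frames` is the output frame count, which you supply yourself.

```python
import statistics
import subprocess
import sys
import time


def timed_run(cmd: list[str], frames: int) -> float:
    """Run one render command to completion and return its average fps."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the render fails
    elapsed = time.perf_counter() - start
    return frames / elapsed


def benchmark(cmd: list[str], frames: int, runs: int = 3) -> float:
    """Median fps over several identical runs, to smooth out one-off hiccups."""
    return statistics.median(timed_run(cmd, frames) for _ in range(runs))


if __name__ == "__main__":
    # Placeholder command that just sleeps; replace it with your real export command
    placeholder = [sys.executable, "-c", "import time; time.sleep(0.1)"]
    print(f"{benchmark(placeholder, frames=3):.1f} fps")
```

Wall-clock timing deliberately includes startup, model loading and encoding, which the in-app benchmark leaves out. On a long file those fixed costs get amortized, which is another reason longer test renders give more representative numbers.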

Example 1:
If the output file needs to be encoded while it's being processed, part of the GPU's resources has to go to its video engine, leaving fewer resources for other GPU loads.

Example 2:
My benchmarks for the interpolation models are faster than most, but I only have an RTX 3080. That makes me believe memory latency and/or read/write speeds and/or the type of CPU affect those models a lot more than the GPU does.

Also, the in-app benchmark only runs for a second to a few seconds per model, so it could pass at stupid clock speeds that would normally cause a crash.