What does VEAI performance look like between AMD/NVIDIA?

OK, the weakest link is actually the PCIe bus.

Two RTX 5000s are only a little bit faster.

Here we have 96 SMs (2x RTX 5000, 48 each) vs. 82 SMs (1x RTX 3090).

Both RTX 5000s run at PCIe 3.0 x16 (~16 GB/s); the 3090 runs at PCIe 4.0 x16 = ~32 GB/s.

As Nvidia presented yesterday, the transfer rate between CPU and GPU is a problem, which is why they designed Grace (600 GB/s CPU to GPU, though we will never get our hands on that).

We can hardly influence the transfer rate between CPU and GPU.

If we increase the performance of the other components, it gains less than increasing the PCIe speed would.

The other possibility would be to increase the tile size, which in turn needs a lot of VRAM (something you can also see in ray tracing), and Topaz Labs would have to train the models for that tile size before you could run them.

A larger tile size shifts the bottleneck back towards the GPU.

For example, a 2048px tile requires between 10 and 14 GB of VRAM.
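
A back-of-the-envelope sketch of the PCIe point, for anyone who wants to play with the numbers (my own assumptions: uncompressed FP32 RGB tiles and nominal one-direction bus rates; I don't know VEAI's actual transfer format):

```python
# Rough estimate of how long it takes just to move one tile across the PCIe
# bus, one direction. Tile format (3 channels x 4 bytes) is an assumption.
GIB = 1024**3

def tile_transfer_ms(tile_px: int, bus_gib_s: float, bytes_per_px: int = 3 * 4) -> float:
    """One-direction transfer time in milliseconds for a square tile."""
    tile_bytes = tile_px * tile_px * bytes_per_px
    return tile_bytes / (bus_gib_s * GIB) * 1000

for bus, gib_s in [("PCIe 3.0 x16 (~16 GB/s)", 16), ("PCIe 4.0 x16 (~32 GB/s)", 32)]:
    print(f"{bus}: {tile_transfer_ms(2048, gib_s):.1f} ms per 2048px tile")
```

A few milliseconds each way per tile adds up quickly when every frame has to be tiled, uploaded, processed and downloaded again.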

I think at the moment we can only wait and see.

And my theory for the tile size is:

Take the GPU core count (CUDA cores, not SMs) and divide it by two; that gives you the tile resolution, with each core taking care of one pixel.

RTX 3090 = 10496 cores / 2 = 5248px x 5248px
RTX 5000 = 3072 cores / 2 = 1536px x 1536px

As I said, it's just a theory of mine.
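
To make the theory concrete, a tiny sketch (purely illustrative; nothing from Topaz confirms this mapping):

```python
# Hypothesized tile edge length: CUDA core count divided by two,
# one core per pixel. Speculation, as stated above.
def theoretical_tile_px(cuda_cores: int) -> int:
    return cuda_cores // 2

for name, cores in [("RTX 3090", 10496), ("Quadro RTX 5000", 3072)]:
    edge = theoretical_tile_px(cores)
    print(f"{name}: {edge}px x {edge}px")
```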


After building a new PC, I could test some NVIDIA GPU models under the exact same circumstances, all within one weekend.

Base system: i7-10700 CPU (@ 65 W), 32 GB RAM @ 3000 MHz, Samsung M.2 SSD, no overclocking, everything at factory default settings. “Black Desert Online-Wheel” clip upscaled to 1440p (200%), no power or memory limitations.

Edit:

I have posted a revised edition of my test results a little further down the thread.


Quick update on this thread: the current version is v2.3.0. We are continually adding further optimizations, so if you're on an older version you may see a speed improvement just by updating to the latest version.
You can update in-app or directly through your account at Topaz Labs Account - Topaz Labs

i7 10700
32 GB DDR4 3,000 MHz
m.2 SSD
NVIDIA Studio Driver 462.31
no O.C. tweaks whatsoever, just out-of-the-box performance
with basic Installer 288x288 models:

Gigabyte GTX 1050-Ti (4 GB GDDR5)
VEAI v1.6.1 alq-1.0.1 (3.25 s/frame)
VEAI v1.6.1 gcg-1.0.1 (4.97 s/frame)

VEAI v2.1.1 ALQ v12 (1.74 s/frame)
VEAI v2.1.1 GCG v5 (6.19 s/frame)

ASUS RTX 2060 DUAL OC (6 GB)
VEAI v1.6.1 alq-1.0.1 (0.92 s/frame)
VEAI v1.6.1 gcg-1.0.1 (1.46 s/frame)

VEAI v2.1.1 ALQ v12 (0.26 s/frame)
VEAI v2.1.1 GCG v5 (0.68 s/frame)

KFA2 RTX 2070-Super 1-Click OC
VEAI v1.6.1 alq-1.0.1 (0.71 s/frame)
VEAI v1.6.1 gcg-1.0.1 (0.98 s/frame)

VEAI v2.1.1 ALQ v12 (0.24 s/frame)
VEAI v2.1.1 GCG v5 (0.56 s/frame)

ASUS RTX 2080-Ti ROG STRIX
VEAI v1.6.1 alq-1.0.1 (0.45 s/frame)
VEAI v1.6.1 gcg-1.0.1 (0.62 s/frame)

VEAI v2.1.1 ALQ v12 (0.21 s/frame)
VEAI v2.1.1 GCG v5 (0.38 s/frame)

Gigabyte RTX 3070 Gaming OC (270W)
VEAI v1.6.1 doesn't work with 'Ampere' GPUs,
unfortunately…

VEAI v2.1.1 ALQ v12 (0.21 s/frame)
VEAI v2.1.1 GCG v5 (0.41 s/frame)

…and now with Online Model Download enabled:

Gigabyte GTX 1050-Ti (4 GB GDDR5)
VEAI v1.6.1 (same results as offline)

VEAI v2.1.1 ALQ v12 (1.10 s/frame, down from 1.74)
VEAI v2.1.1 GCG v5 (4.38 s/frame, down from 6.19)

ASUS RTX 2060 DUAL OC (6 GB)
VEAI v1.6.1 (same results as offline)

VEAI v2.1.1 ALQ v12 (0.22 s/frame, down from 0.26)
VEAI v2.1.1 GCG v5 (0.47 s/frame, down from 0.68)

KFA2 RTX 2070-Super 1-Click OC
VEAI v1.6.1 (same results as offline)

VEAI v2.1.1 ALQ v12 (0.21 s/frame, down from 0.24)
VEAI v2.1.1 GCG v5 (0.39 s/frame, down from 0.56)

ASUS RTX 2080-Ti ROG STRIX
VEAI v1.6.1 (same results as offline)

VEAI v2.1.1 ALQ v12 (0.18 s/frame, down from 0.21)
VEAI v2.1.1 GCG v5 (0.28 s/frame, down from 0.38)

Gigabyte RTX 3070 Gaming OC (270W)
VEAI v1.6.1 doesn't work with 'Ampere' GPUs

VEAI v2.1.1 ALQ v12 (0.19 s/frame, down from 0.21)
VEAI v2.1.1 GCG v5 (0.31 s/frame, down from 0.41)
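
To put the before/after numbers in perspective, here is a quick sketch of mine that converts a few of the s/frame pairs above into speedups:

```python
# Speedup of v2.1.1 with Online Model Download vs. the basic installer
# models, using s/frame figures from the lists above.
results = {
    "GTX 1050-Ti, ALQ": (1.74, 1.10),
    "RTX 2060, GCG": (0.68, 0.47),
    "RTX 2080-Ti, GCG": (0.38, 0.28),
}
for card, (offline, online) in results.items():
    print(f"{card}: {offline / online:.2f}x faster "
          f"({1 - online / offline:.0%} less time per frame)")
```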


Is anyone using an RX 6700 XT? If so, could you give some sec/frame numbers for either 720p or 1080p input at 200% using Proteus? That would give me a comparison point for my GTX 1660 Super.

Thanks

I would also be interested in performance numbers for a 6800/XT, because of the Radeon Pro W6800. It would be funny if it came out at the performance level of an A5000/A6000, but I think the Tensor Cores give NVIDIA the edge here.

2x RTX 5000, Threadripper 3000 series (24 cores), 128 GB ECC. Driver: 462.59

Win 10

All standard settings.

Output: TIFF 8-bit.

2.3.0

Gaia CG v5: 0.23 sec/image. GPU load 30%, spikes to 78%. (Reduce Load active: 0.22-0.23 sec.)

Gaia HQ v5: 0.24 sec/image. GPU load 30%, spikes to 78%. (Reduce Load active: 0.22-0.23 sec.)

Artemis HQ v11: 0.16 sec/image. GPU load 20%, spikes to 40%. (Reduce Load active: equal)

Artemis MQ v12: 0.16 sec/image. GPU load 20%, spikes to 40%. (Reduce Load active: equal)

Artemis LQ v12: 0.16 sec/image. GPU load 20%, spikes to 40%. (Reduce Load active: equal)

Artemis AA v9: 0.16 sec/image. GPU load 20%, spikes to 40%. (Reduce Load active: equal)

Theia Detail v3: 0.23-0.25 sec/image. GPU load 20%, spikes to 50%. (Reduce Load active: 0.22-0.24 sec.)

Theia Fidelity v4: 0.24-0.25 sec/image. GPU load 20%, spikes to 50%. (Reduce Load active: 0.22-0.24 sec.)

Proteus 6-Parameter v1: 0.17 sec/image. GPU load 20%, spikes to 40%. (Reduce Load active: 0.18-0.19 sec.)


1080p -> 2160p

Artemis models: 0.35 sec.

Gaia models: 0.54-0.56 sec.

Theia models: 0.58-0.62 sec.

Power draw, GPUs only: 234 W each (468 W both).
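
For a feel of what these s/frame figures mean in practice, a small sketch (clip length and frame rate are arbitrary example values, not from this post):

```python
# Total processing time for a clip, given a per-frame time.
def total_minutes(sec_per_frame: float, clip_seconds: int, fps: int) -> float:
    return sec_per_frame * clip_seconds * fps / 60

# 10-minute 30 fps clip, 1080p -> 2160p, using the model times listed above.
for model, spf in [("Artemis", 0.35), ("Gaia", 0.55), ("Theia", 0.60)]:
    print(f"{model}: ~{total_minutes(spf, 600, 30):.0f} min")
```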

The speed also depends heavily on the footage you're upscaling. 480p => 1080p, 1080p => 4K, and so on will each give you different speeds as well.

The sec/frame value has little variance, so I suggest a more intuitive comparison index: the total processing time for the same sample file.

If you right-click the output file and select Properties in Windows, you will see the created time and the modified time.

The difference between the two can serve as the comparison index.
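
A minimal sketch of that measurement, assuming a Windows filesystem where st_ctime reports the creation time (the file path is a placeholder, not from this post):

```python
# Estimate total VEAI processing time from the output file's creation and
# last-modified timestamps, as described above (Windows).
import os
from datetime import datetime

def processing_time(path: str):
    st = os.stat(path)
    created = datetime.fromtimestamp(st.st_ctime)   # creation time on Windows
    modified = datetime.fromtimestamp(st.st_mtime)  # last write time
    return modified - created

print(processing_time(r"C:\VEAI\output\sample.mp4"))  # hypothetical path
```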

Primary PC:
CPU: AMD Ryzen 9 5900X, IF set to 1600, PBO on, cooler is Noctua NH-D15
RAM: DDR4, 32 GB (2x16 GB), 3200 MHz, CL14, Dual Rank, Dual Channel
GPU: EVGA 3090 FTW3 Ultra
Motherboard: ASRock X470 Taichi Ultimate (would X570 with PCIe Gen 4 help my 3090?)
Driver version: 471.41
OS: Windows 10 Pro, Version 21H1, Build 19043.1165
VEAI Version: 2.4
Temps/utilization (typical sustained use, not from this test; Artemis model, 480p to 1080p): CPU ~59C/~60% utilization, GPU ~51C/~33% utilization

Results, taking the average from frames 100 to 200, with the established procedure for 200% upscale:
Gaia-HQ v5: 0.18 sec/frame
Gaia-CG v5: 0.18 sec/frame
Theia-Fidelity v4: 0.17 sec/frame
Artemis-HQ v11: 0.12 sec/frame
Artemis-AA v11: 0.12 sec/frame
Artemis-LQ v11: 0.12 sec/frame

Secondary Workstation:
CPU: Intel i7-5820K, overclocked to 4 GHz (can achieve 4.4 GHz with Dual Channel memory config), cooler is Cooler Master Hyper 212 EVO
RAM: DDR4, 32 GB (4x8GB), 3000 MHz, CL15, Dual Rank, Quad Channel
GPU: EVGA 1080 Ti FTW3 Hybrid
Driver version: 471.41
OS: Windows 10 Pro, Version 21H1, Build 19043.1165
VEAI Version: 2.4
Temps/utilization (typical sustained use, not from this test; Artemis model, 480p to 1080p): CPU ~58C/~50% utilization, GPU ~65C/~70% utilization

Results, taking the average from frames 100 to 200, with the established procedure for 200% upscale:
Gaia-HQ v5: 0.73 sec/frame
Gaia-CG v5: 0.73 sec/frame
Theia-Fidelity v4: 0.39 sec/frame
Artemis-HQ v11: 0.24 sec/frame
Artemis-AA v11: 0.24 sec/frame
Artemis-LQ v11: 0.24 sec/frame

Swapping the 1080 Ti from my primary to my secondary machine saw a slight improvement for single-instance work. But the overall lower utilization meant I could run more instances. There is some overhead cost to parallelizing the work across individual instances, but it's faster overall.

I had also upgraded from a 2700X to the 5900X in the primary machine, which did increase the frames/sec… but I do not recall what the baseline was.

The 3090 was typically running at 1800 MHz for this test; locking the voltage to 1093 mV (using MSI Afterburner) would result in a clock speed of 1930-1950 MHz. There was no noticeable change in sec/frame performance.

AMD 6800 XT and 6900 XT are even faster than the 3080 and 3090 when running multiple simultaneous VEAI jobs.

Any numbers? I'm still thinking of getting a W6800, but nobody on this planet wants to test one with current Topaz software.

The only one who did test the W6800 with Topaz software was Igor's Lab, but with Gigapixel running on CUDA and OpenCL, not DirectML.

Here is a benchmark from DxO PhotoLab's DeepPRIME where a 6900 XT is ahead of a 3090, and a 6700 XT is close to a 3080 and as fast as a 2080 Super.

https://docs.google.com/spreadsheets/d/1Yx-3n_8D3OreyVwQLA-RVqiQCAgNdXDkxScpXTur1Gs/edit?usp=sharing


The 6900 XT is a better option, I can say.

VEAI has far more FP16 models than FP32 ones.

And the 6900 XT delivers 46.08 TFLOPS of FP16, whereas the 3090 delivers 35.58 TFLOPS.

That is about 1.30 times faster (46.08 / 35.58 ≈ 1.30).

I used an 'RTX 3080 Eagle (non-O.C.)' before the 'XFX RX 6900 XT Merc'.

My total VEAI working time has been shortened considerably, and I can run one more VEAI job in parallel since changing GPUs.

This is only good for VEAI, and I still miss the CUDA and Tensor cores.

But now I only use VEAI, so I'm very satisfied with the 6900 XT.

In addition, my RX 6900 XT consumes less power than the RTX 3080 did at 99% GPU load.

More work done, but less power consumed, in VEAI.


:rofl::rofl::rofl::rofl::rofl:

A tough time is coming with CDNA 2:

47.9 TFLOPS - FP32 & FP64
383 TFLOPS - FP16

https://videocardz.com/newz/amd-instinct-mi250x-with-mcm-gpu-to-feature-110-compute-units-128gb-hbm2e-memory-and-500w-tdp


I did some quick benchmarking to provide some numbers for my RX 5700 XT.

Full Specs:
AMD Ryzen 9 5950X
Sapphire RX 5700 XT Nitro+
4x8GB 3600 MHz CL16 DDR4 RAM
VEAI Version 2.4
Windows 10

Everything at stock settings except XMP enabled.

Results:
Gaia HQ v5: 0.30 s/frame
Gaia CG v5: 0.30 s/frame
Theia-Fidelity v4: 0.24 s/frame
Artemis HQ v11: 0.16 s/frame
Artemis AA v9: 0.16 s/frame
Artemis LQ v12: 0.16 s/frame

The variance during upscaling, once stable, was only ±0.01 s/frame.


I wonder if, ten years from now, with hardware and software advances, real-time upscaling will be a thing. We will play an H.267 file, or whatever the standard is then, in our favorite video player, and it will upscale 480p to 4K in real time as we play it :heart_eyes::star_struck::smiling_face_with_three_hearts:
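
For reference, the per-frame budget that real-time playback would demand (simple arithmetic, just my framing of the idea above):

```python
# Seconds-per-frame budget for real-time upscaling at common frame rates.
for fps in (24, 30, 60):
    print(f"{fps} fps -> {1 / fps:.3f} s/frame budget")
```

Against the 0.11-0.35 s/frame numbers reported in this thread, that is still roughly a 3-10x gap.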

Hmmmm, the M1 Pro has made it to 6th place in the DeepPRIME benchmark.
And the RX 6700 XT is in 5th.
…
Be right back, I'll just set my PC on fire and then come back. :crazy_face:
…
Or is it fake?

I am sitting here with my silent Mac mini upscaling at 0.11 seconds per frame and my 3070 laptop upscaling at 0.14. One computer is silent; the other sounds like a rocket ship blasting off to Mars, lol. I know where my future in upscaling lies: not AMD or Nvidia, but Apple.


Some more times have been added, and they are all over the place.

And here in the forum, people are comparing 500p to 1080p for the M1s while the GPU guys are comparing 1080p to 4K… :thinking:
