Ampere vs Navi 2

Hi,

I’m planning to get a new graphics card for my home workstation because my current one has neither tensor cores nor FP16-capable hardware, so I need some numbers.

The Radeon Pro W6800 comes out today. The card is roughly equivalent to an RX 6800, and performance-wise it should land between the RX 6800 and the RX 6800 XT. Unfortunately, AMD “only” offers FP16 hardware and no tensor cores.

The alternatives would be the RTX A4000 and RTX A5000; both models are very fast since they have tensor cores.

I can estimate the performance of the Nvidia models, but not the Radeon Pro.

Does anyone have performance numbers for the RX 6800 or 6800 XT from Denoise AI, Gigapixel AI, or Video Enhance AI, created with the current engine, so that I can make comparisons?

AMD’s own numbers show the W6800 about 55% ahead (on average) of a W5700 (RX 5700 XT based, but slower) in VEAI 2.0.0 (4K upscaling).

But I don’t know how a W5700 or an RX 6800 compares to a 2080 Super or a 3070.

Since VEAI uses tensor cores, I believe Nvidia GPUs are better than AMD, especially for machine-learning workloads like this. AMD GPUs don’t have tensor cores and also have poor FP16 performance. So don’t waste money on AMD for VEAI.

FYI,
Radeon RX 6900 XT: FP16 performance: 46.08 TFLOPS (2:1)
Radeon RX 6800 XT: FP16 performance: 41.47 TFLOPS (2:1)
Radeon Pro W6800: FP16 performance: 38.51 TFLOPS (2:1)
GeForce RTX 3090: FP16 performance: 35.58 TFLOPS (1:1)
GeForce RTX 3080 Ti: FP16 performance: 34.10 TFLOPS (1:1)
Radeon RX 6800: FP16 performance: 32.33 TFLOPS (2:1)
GeForce RTX 3080: FP16 performance: 29.77 TFLOPS (1:1)
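For context, the (2:1) and (1:1) figures are the FP16:FP32 throughput ratios. Here is a minimal sketch of where these peak numbers come from, using shader counts and boost clocks from the public spec sheets (paper TFLOPS only, not measured VEAI speed):

```python
# Rough sketch: peak FP16 TFLOPS = shaders * 2 FLOP/clock * boost clock (GHz) * FP16:FP32 ratio.
# Shader counts and boost clocks are taken from the public spec sheets; these are
# paper peak numbers and say nothing about real VEAI throughput.
cards = {
    # name:              (shaders, boost_ghz, fp16_ratio)
    "Radeon RX 6800 XT": (4608, 2.25, 2),    # RDNA2: double-rate FP16
    "Radeon RX 6800":    (3840, 2.105, 2),
    "GeForce RTX 3080":  (8704, 1.71, 1),    # Ampere CUDA cores: FP16 at the FP32 rate
}

for name, (shaders, boost_ghz, fp16_ratio) in cards.items():
    fp32_tflops = shaders * 2 * boost_ghz / 1000
    fp16_tflops = fp32_tflops * fp16_ratio
    print(f"{name}: FP32 {fp32_tflops:.2f} TFLOPS, FP16 {fp16_tflops:.2f} TFLOPS")
```

In short, RDNA2 runs FP16 at double the FP32 rate, while Ampere’s regular CUDA cores run FP16 at the FP32 rate; any extra FP16 throughput on Nvidia has to come from the tensor cores, and only when the software actually uses them.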

1 Like

In November 2020 a seller ran a test for me with a W5700, and in Denoise AI it almost matched an RTX 6000 in performance, even though it is much slower on paper.

But I found a slide.

https://pics.computerbase.de/9/9/0/2/5-a648c590b431bc58/17-1080.12d94b94.png

It seems the Radeon Pro VII is about as fast in VEAI as the W6800 (11% faster), and both are a little bit faster than the RTX 5000.

Now I would need to know whether the Navi-based cards handle GPU multitasking well, so that you can use multiple programs at the same time.

I would have taken the Radeon Pro VII back in November, but it stutters during GPU multitasking because it prioritizes compute over graphics.

In Capture One, the Radeon Pro VII is 50% faster than an RTX 5000 and just as fast as an RTX 3090, which probably comes from the memory bandwidth or from PCIe 4.0.

1 Like

Those are just paper numbers anyway; they don’t tell you anything about real-world performance.

Same as the tensor cores :yum:
VEAI has claimed tensor core support since v1.7.0, but it seems to be only on paper: some users have pointed out that they see no performance gain from the tensor cores. VEAI may either not be using them or not be fully optimized for them yet.

For example, some users have shown that the processing speed of a Radeon Vega 64 (25.33 FP16 TFLOPS, no tensor cores) is similar to an RTX 2080 Super (22.30 FP16 TFLOPS plus tensor cores).

Don’t get me wrong :nerd_face: I am hoping VEAI can further improve its performance and better optimize for the RTX 3000 series and tensor cores to increase processing speed. But at the moment it doesn’t seem fully optimized yet.

It would be great if Topaz could provide some official GPU benchmarks, so users could choose the best-optimized GPU for VEAI.

At least a $300k car should not be driven by a 12-year-old kid.
A $1000+ graphics card should be able to run with a mature driver.
Nvidia has cuDNN, which is very well known, while AMD has ROCm, which is… “what is that?”. Tensor cores are meant to boost the performance of machine-learning workloads. The reason you don’t see much improvement is the models. If you want to see real improvement, use the Gaia models: Artemis models lean more on the CPU than the GPU, while Gaia models seriously take advantage of the GPU. It’s not the Topaz team’s doing; it comes down to the AI models. But the new customization models might need more GPU than ever. And AMD GPUs are not suitable for ML, at least for now.

Yes, I agree. Training an AI model is better done on an Nvidia GPU than on an AMD GPU, but we are end users; we don’t use the GPU to train new AI models.

If I remember correctly, VEAI switched from TensorFlow with CUDA & cuDNN to ONNX Runtime with DirectML in v1.7.0 to add support for AMD GPUs.
The DirectML-based ONNX Runtime supports both Nvidia and AMD GPUs.
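For what it’s worth, here is a minimal sketch of what a DirectML-backed ONNX Runtime session looks like. The model path is a placeholder, and this only illustrates that the same code path serves both Nvidia and AMD GPUs, not how VEAI actually loads its models:

```python
# Minimal sketch, assuming the onnxruntime-directml package is installed.
# "model.onnx" is a placeholder; VEAI's real model files and loading code are not public.
import onnxruntime as ort

# A DirectML build reports "DmlExecutionProvider" alongside the CPU fallback.
print(ort.get_available_providers())

# Prefer DirectML (works on both AMD and Nvidia GPUs); fall back to CPU if it is unavailable.
session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which providers were actually selected
```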

For example, here is a graph from “Landmark Tracker AI”.
Landmark Tracker used to use TensorFlow with CUDA & cuDNN, and latency was 8 ms per frame on a 1080 Ti. After they switched to ONNX Runtime with DirectML to add AMD GPU support, latency on the 1080 Ti increased to 10 ms per frame, while a slower AMD RX 5700 got a shorter latency of 9 ms per frame.

VEAI may work differently, but saying an Nvidia GPU is better than an AMD GPU may not always be true when both are running through ONNX Runtime with DirectML.

As I pointed out before, it would be great if Topaz could officially provide some GPU benchmarks, so we don’t have to guess at performance.

2 Likes

Oh, so that’s why many people with older GPUs have noticed a slowdown: the added latency.

And if you block the download of the new models, you’re back at the old speed; the models are still available in TensorFlow format.

1 Like

Your chart doesn’t really say much, since I don’t think the 1080 Ti had tensor cores. I can say that switching from a 1080 (not Ti) to a 3060 Ti reduced my average processing time from 1.4 seconds/frame to 0.4 seconds/frame. Of course, that’s also an upgrade from gen 2 CUDA to gen 3 CUDA, so that has some effect as well (not to mention faster VRAM and a wider memory bus).

YMMV. :grinning:

The Pascal-generation gaming cards don’t have a single tensor core and also can’t do fast FP16 math.
(Only the GP100 had fast FP16; tensor cores didn’t arrive until Volta.)

The speedup from your 3060 Ti comes from the FP16 hardware and the tensor cores.

1 Like