NVIDIA RTX 3090/A6000 vs AMD RX 6900 XT VEAI Benchmark

I don’t follow what you mean :slight_smile:

  • Why would I upscale a second time? And if you mean a quality pass followed by a sizing pass, or vice versa, the same applies for the single card, which is ~50% slower in both cases, or 25% if the quality pass is run with Artemis while the upscale uses the mGPU model.

  • A surge protector doesn’t shut the computer down; it regulates the flow. So a 1200W PSU is more than suitable for a 5950X + 2x 6800 XT.

  • Where are you getting a 20% time saving from? And it’s not 100% money at all, it’s +20% money for 90% time saving. And again, a surge protector avoids any risk of those surges you’re concerned about.

  • There is a larger power draw, yes, but only for half the time, so total power usage is essentially a wash (a quick check is below). Yes, anyone serious will want to buy decent components to protect against power-surge damage. I think that goes without saying for any expensive build with heavy-duty components, so in most cases this isn’t even an additional expense for mGPU.
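
A quick check of that claim, with assumed wattage and runtime figures purely for illustration (and idealized 2x scaling):

```python
# Energy = power x time. Two GPUs draw roughly twice the power but finish in
# roughly half the time, so the total kWh is about the same.
single_gpu_watts = 300          # assumed draw of one card under VEAI load
single_gpu_hours = 10.0         # assumed runtime of the job on one card

dual_gpu_watts = 2 * single_gpu_watts
dual_gpu_hours = single_gpu_hours / 2      # idealized 2x scaling

print(single_gpu_watts * single_gpu_hours / 1000, "kWh on one GPU")   # 3.0 kWh
print(dual_gpu_watts * dual_gpu_hours / 1000, "kWh on two GPUs")      # 3.0 kWh
```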

I fully appreciate your vastly superior expertise here but I’m dubious regarding those numbers.

  • 1x 6800 XT performs about 5% worse than a 6900 XT, but at 2/3 the price. This is suitable for times when the Artemis model is called for. On those occasions the second GPU does nothing and therefore doesn’t draw [relevant] power either.
  • 2x 6800 XT, on AI models that support it, get 95% x 2 = 190% of the performance (and thus a 47.5% time saving) for only 133% of the price; the arithmetic is spelled out just below.
    Electricity is not a 100% increase because they are running for only ~55% of the time.
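
For anyone who wants to check my numbers, here is the arithmetic, with the 95%-per-card figure and the 2/3 price ratio taken as the assumptions from the list above:

```python
# Price/performance arithmetic for 2x 6800 XT vs 1x 6900 XT (illustrative).
perf_per_card = 0.95                 # assumed: one 6800 XT ~ 95% of a 6900 XT
dual_perf = 2 * perf_per_card        # 1.90 -> 190% of a single 6900 XT
runtime = 1 / dual_perf              # ~0.526 -> runs for ~53% of the time
time_saving = 1 - runtime            # ~0.474 -> the ~47.5% saving quoted above
relative_price = 2 * (2 / 3)         # ~1.33 -> 133% of the 6900 XT price

print(f"performance {dual_perf:.0%}, time saving {time_saving:.1%}, price {relative_price:.0%}")
```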

Like I said, you know this much better than me, I’m just a numbers guy. So where am I going wrong with the numbers?

I have read in the Infinity Cache paper that AMD has noticed a performance increase from this cache when running AI models.

https://www.reddit.com/r/Amd/comments/j609v7/amd_infinity_cache_patent_and_white_paper_details/


“We show that due to the converged nature of GPUs and future DL scaling requirements, the GPU’s memory bandwidth will become the primary performance bottleneck for GPU-based DL training and inference, while being under-utilized for most HPC applications.”

https://dl.acm.org/doi/10.1145/3484505

See Figure 2.

“However, at small batch sizes, SM under-utilization accounts for 41% of total execution time and serves as the primary performance bottleneck rather than DRAM bandwidth. This is because MLPerf small-batch inference does not expose enough parallelism to fill an entire GPU that was designed for the datacenter. Moreover, due to small-batch inference’s relatively small memory footprints, the majority of each workload’s data can be buffered on-chip, and thus DRAM bandwidth is not the primary bottleneck.”


I always find these papers interesting because you can see what new tools are coming to market; even if we never get that tool, you can see where the journey is going.

I assume that we have small batches.


So I suggest that VEAI should automatically split a video into multiple parts and then process them at the same time. Depending on the video memory option, VEAI should be able to determine when to split (and when not to) and how many parts to split into. That way we could maximize performance without needing a big batch to do it.
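
A rough sketch of that idea, just to illustrate it; the 6 GB-per-worker figure and process_chunk are hypothetical placeholders, not anything VEAI actually exposes:

```python
# Split a clip into chunks sized by available VRAM and process them concurrently.
from concurrent.futures import ThreadPoolExecutor

def choose_chunk_count(total_frames, vram_gb, vram_per_worker_gb=6):
    """Run as many workers as fit in VRAM, but never more than there are frames."""
    workers = max(1, int(vram_gb // vram_per_worker_gb))
    return min(workers, total_frames)

def process_chunk(start, end):
    # placeholder for upscaling frames [start, end)
    return f"processed frames {start}-{end - 1}"

def split_and_process(total_frames, vram_gb):
    n = choose_chunk_count(total_frames, vram_gb)
    step = -(-total_frames // n)                      # ceiling division
    ranges = [(i, min(i + step, total_frames)) for i in range(0, total_frames, step)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda r: process_chunk(*r), ranges))

print(split_and_process(total_frames=1000, vram_gb=24))   # a 24 GB card -> 4 chunks
```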


I know that the photo apps allow you to specify multiple tiles.

Performance-wise, this technically has not changed anything.

Up to a certain point, a larger tile increased the speed.

I’ll ask.

Would mGPU work with two different GPUs?

I have a Radeon VII lying around; could I throw it in with my 6900 XT and get a boost with Proteus?

Yes. It will work.


I appreciate you sharing your results here, but the A6000 and 3090 are not equal. There is a massive difference in memory capacity (which favors the A6000), and the A6000’s memory bandwidth of 768 GB/s is 18% lower than the 3090’s (936 GB/s). The A6000 has 2.4% more cores.
The MSI Gaming Z Trio AMD RX 6900 XT also comes with a factory overclock, as opposed to the stock speed on the A6000.

The 48 GB of RAM will not benefit VEAI, but the difference in memory bandwidth and cores will. When I try to simulate the difference with a 3080, the 18% lower memory bandwidth affects performance a lot more than the A6000’s 2.4% extra cores. My assertion is that for VEAI the 3090 should be faster than the A6000, and it definitely will be faster if you are using a factory-overclocked model.
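
For reference, this is where those percentages come from, using the published spec-sheet numbers:

```python
# RTX 3090 vs RTX A6000 spec comparison.
bw_3090, bw_a6000 = 936, 768              # memory bandwidth, GB/s
cores_3090, cores_a6000 = 10496, 10752    # CUDA cores

print(f"A6000 bandwidth deficit: {1 - bw_a6000 / bw_3090:.1%}")        # ~17.9% lower
print(f"A6000 core advantage:    {cores_a6000 / cores_3090 - 1:.1%}")  # ~2.4% more
```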

Unfortunately, that is not the case in VEAI. Overclocking only helps 5-10%, with up to 40% more power consumption. And when comparing the A6000 and 3090, both of them are stock models, not reference cards. The 6900 XT is a reference card, but it outperforms the Nvidia cards because of the VEAI optimizations from the developers. I bet they are using a W6800 or 6900 XT as well. If someone asks me which card I recommend for VEAI, the answer will be the 6900 XT. The differences between the cards are just too small; only the optimizations will help VEAI run faster, just like in game benchmarks between Nvidia and AMD cards. And I can see that the Topaz team focuses more on AMD and Apple silicon than on Nvidia and Windows.

With most models, VEAI basically just upscales videos one frame at a time, without using any temporal data (i.e. data from the previous or next frame).
So an easy way to speed up processing with multiple GPUs is to just have the program send the next frame to whichever GPU has finished its current frame first.
Of course, you would need a buffer for the output frames to make sure they get encoded in the right order, rather than being encoded straight after they are done.
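
A minimal sketch of that scheduling scheme, assuming a per-frame model call; upscale_on_gpu is a placeholder and the threads stand in for per-GPU workers:

```python
import heapq
import queue
import threading

def upscale_on_gpu(gpu_id, index, frame):
    return f"frame {index} upscaled on GPU {gpu_id}"   # placeholder for the model call

def gpu_worker(gpu_id, frames_in, results_out):
    # Each GPU pulls the next frame as soon as it is free.
    while True:
        item = frames_in.get()
        if item is None:                               # sentinel: no more frames
            break
        index, frame = item
        results_out.put((index, upscale_on_gpu(gpu_id, index, frame)))

def run(frames, num_gpus=2):
    frames_in, results_out = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=gpu_worker, args=(g, frames_in, results_out))
               for g in range(num_gpus)]
    for w in workers:
        w.start()
    for item in enumerate(frames):
        frames_in.put(item)
    for _ in workers:
        frames_in.put(None)

    # Reorder buffer: hold finished frames until the next expected index arrives,
    # so output is handed to the encoder in the original order.
    pending, next_index, encoded = [], 0, []
    for _ in range(len(frames)):
        heapq.heappush(pending, results_out.get())
        while pending and pending[0][0] == next_index:
            encoded.append(heapq.heappop(pending)[1])  # the real encoder call goes here
            next_index += 1
    for w in workers:
        w.join()
    return encoded

print(run([f"f{i}" for i in range(6)]))
```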

Something to keep in mind, I don’t know if it 100% matters.

HEVC (H.265) and H.264 are integer codecs.

For the RTX 3090 this means that it only has 20 teraflops here.

Which would explain why Apple can catch up with this GPU when it comes to benchmarks with these two codecs.

As written I do not know if this is 100% true.
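
My back-of-the-envelope reading of the "only 20 teraflops" remark, assuming it refers to Ampere’s shared FP32/INT32 datapath (half of the FP32 lanes can issue INT32 instead):

```python
# Rough integer throughput estimate for the RTX 3090 (an assumption, not a measurement).
cuda_cores = 10496
boost_clock_ghz = 1.695
fp32_tflops = cuda_cores * 2 * boost_clock_ghz / 1000   # 2 ops per fused multiply-add
int32_tops = fp32_tflops / 2                            # half the lanes can do integer

print(f"FP32: ~{fp32_tflops:.1f} TFLOPS, INT32: ~{int32_tops:.1f} TOPS")
# FP32: ~35.6 TFLOPS, INT32: ~17.8 TOPS, in the ballpark of the figure above
```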

Based on some recent reviews from YouTubers and internet reviewers, it seems that the M1 Ultra is still far behind the RTX 3090. If I’m not mistaken, it’s about the same as an RTX 3050 mobile GPU, which is not good enough. So CPU rendering is great and the low power consumption is good. However, at a $5,000 price point, it’s bad. Most benchmarks Apple shows are fake.


As I saw it, the M1 already has its niches where it can shine with performance.


I just wrote AMD and Nvidia because there is no complete H.265 support (for all color formats) from either manufacturer.

And I do not understand why they leave the field so easily to the other manufacturers.

There are so many people who just switch to Apple because it has full H.265 support, so you don’t need proxies or caches with optimized formats to play H.265 smoothly (in DaVinci Resolve, for example). That saves an extremely large amount of time.

It took me a whole afternoon to figure that out.

See link.

https://www.pugetsystems.com/labs/articles/What-H-264-and-H-265-Hardware-Decoding-is-Supported-in-DaVinci-Resolve-Studio-2122/


I don’t use DaVinci Resolve enough to say much about it; I mainly use Adobe Premiere Pro. It’s much easier to do most things there. I found that DaVinci Resolve is just okay for my needs; Premiere Pro is much better and faster. But I might be wrong, because I don’t use it as much as I thought I would. So I prefer Adobe over Blackmagic Design for my workflow.

Premiere’s H.265 support is the same.

It may be that it is less noticeable because Premiere tends to use the CPU instead.

https://www.pugetsystems.com/labs/articles/What-H-264-and-H-265-Hardware-Decoding-is-Supported-in-Premiere-Pro-2120/

Not sure if this would make sense for all of your benchmark setups, but I’m curious how these cards scale with multiple instances.
For example, I’m running an RTX 3080 Ti, and I can easily have two or even three instances of VEAI upscaling to 1080p with an Artemis model before my CPU maxes out. The seconds per frame takes a hit, but that’s what I want to know: do these cards take the hit the same way?

It’s nearly identical when using multiple instances. The differences here are GPU memory and memory frequency; depending on which GPU you get, these will matter more or less. Faster memory frequency is more important than more GPU memory.
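
If you want to measure that yourself, a rough harness like this works: launch N copies of the same job and compare total wall time. JOB below is a placeholder command line, not an actual VEAI invocation:

```python
import subprocess
import time

JOB = ["python", "-c", "print('stand-in for one VEAI instance')"]   # placeholder job

def time_n_instances(n):
    start = time.perf_counter()
    procs = [subprocess.Popen(JOB) for _ in range(n)]   # launch n copies concurrently
    for p in procs:
        p.wait()
    return time.perf_counter() - start

for n in (1, 2, 3):
    print(f"{n} instance(s): {time_n_instances(n):.2f} s total")
```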

Non-cacheable workloads are always bandwidth dependent.

Cool. Thank you for answering this.