Here are GP, DN, S & TVAI Benchmarks

Puget Systems posted a whole set of TL software benchmarks.

This benchmark is considered outdated.
However, I assume the results are still similar.


The 4090 is 11% faster overall than the 6900 XT.

As pointed out over the last few weeks, the CPU matters most for TVAI.

Too bad they didn’t test TIFF or JPEG files with the photo apps.

Thanks a lot. The benchmark is very impressive, though the version of Video AI is not up to date.

It seems that RTX 40 series cards are much faster in “4x slow motion”, which makes the GeoMean frames/sec look good.

It is a bit strange that, for Deinterlace, the 4090 is the same speed as the 3060? :thinking:

For upscaling 1080p to 4K, the 4090 is only 10% faster than the 3070 and 20% faster than the 3060? :face_with_peeking_eye:

My experience with Puget's results is also that you should read the individual values.

As with other websites, a single high value can make one product look better overall than another.
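To illustrate how one outlier test can carry a GeoMean score, here is a minimal Python sketch. The numbers are made up for illustration and are not Puget's results; the card names and test names are placeholders too.

```python
from statistics import geometric_mean

# Hypothetical frames/sec per test. These are NOT Puget's numbers.
card_a = {"deinterlace": 20.0, "upscale": 10.0, "slow_motion_4x": 40.0}
card_b = {"deinterlace": 20.0, "upscale": 9.0, "slow_motion_4x": 10.0}

# Overall score as the geometric mean of the per-test results.
score_a = geometric_mean(card_a.values())
score_b = geometric_mean(card_b.values())
print(f"GeoMean A: {score_a:.2f} fps, B: {score_b:.2f} fps, "
      f"ratio {score_a / score_b:.2f}x")

# Per-test ratios show where the overall lead actually comes from.
for test in card_a:
    print(f"{test}: A is {card_a[test] / card_b[test]:.2f}x B")
```

In this made-up case card A looks well ahead overall, although it only wins clearly in one of the three tests.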

It may be that RDNA 3 will still catch up with the 4090.

And that the drivers, once optimized, will still get a lot more out of it.

RDNA 3 can do multithreading, so to speak, which also means that it is good at handling parallel tasks.

Yes, the GPUs are already fast enough for everything we do. Programming has to be approached differently now; it can’t be done sequentially.

Only then will we see how much the GPUs differ.

A user on the forum switched from a 5800X to a 7950X and is now 40% faster without switching GPU.

There is a 7990 XTX placeholder.

88 TFLOPS FP32
176 TFLOPS FP16

Always these rumors. Rumors can do a company a lot of harm.
That is why there are NDAs.

Stream HPC has spoken up and I personally have a very good opinion of them.

“DON’T JUMP TO CONCLUSIONS! It is quite normal for A0 to be shared among partners. And apparently they thought it was a good idea to patch it for everyone. IMO this is just useless talk.”

This means that what the twitterer found is nothing more than “if chip A0 is in the system, there is no prefetch for it”. This does not mean that N31 is an A0 chip.

When I look at the structure of the slides from the reviews, I could imagine that they have to set up the scheduler correctly, since it distributes the work to the compute units.

Does a compute unit pair have 4 schedulers?

Or am I reading this wrong, and four compute units are shown here?

Well, in any case, I think that scheduling is difficult with a core that can do two things at once, because there is a big risk that for some reason only one thing is done instead of two.

And the 2.7x performance uplift from the rumors refers to matrix multiplications, not to gaming.

Let’s say in theory (I’m no expert) you could still get about 30% more out of it by adjusting the scheduler, in the best case.

Rumor has it that N31 was shipped with Beta silicon.

It is unclear how this will affect computing.

But AMD always seems to have bad luck; you could almost assume sabotage. I mean, there were similar problems with the Vega architecture back then.

One performance aspect which doesn’t seem to get enough attention is RAM speed.

I am currently in the process of setting up my new system, moving from a 5950X DDR4 to a 7950X DDR5 system (combined with a 4090). In the process of optimizing my RAM speed (OC RAM) I noticed that it makes a big difference for TVEAI (at least for Gaia and Artemis). Doubling the RAM speed (from 3200 to 6400, both DDR5) increases Gaia and Artemis speed by almost 50%. CPU and GPU were kept at the same speed.

I also noticed that there is some initialization problem in v3.07 (and other versions most likely too) with Gaia. I do the same simple steps of launching TVEAI, loading a video (the 720p test video from this forum) and pressing the export button (Gaia 2x upscale with NVIDIA h265 encoding is auto-selected). About half the time, performance is about 7fps and the other half it’s 12fps. And I can assure you that there is essentially nothing else running on the system.
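When runs fall into two groups like this, an overall average hides what is going on. Below is a small Python sketch of how repeated results could be split into a slow and a fast group. The readings are hypothetical placeholders, not my actual logs, and `split_runs` is just a helper name made up for this example.

```python
from statistics import mean

def split_runs(fps_values):
    """Split runs into slow and fast groups at the midpoint between min and max."""
    threshold = (min(fps_values) + max(fps_values)) / 2
    slow = [v for v in fps_values if v < threshold]
    fast = [v for v in fps_values if v >= threshold]
    return slow, fast

# Hypothetical fps readings from repeated, identical exports.
runs = [7.1, 12.0, 6.9, 11.8, 7.0, 12.1]

slow, fast = split_runs(runs)
print(f"overall mean: {mean(runs):.1f} fps")
print(f"slow runs: {len(slow)}, mean {mean(slow):.1f} fps")
print(f"fast runs: {len(fast)}, mean {mean(fast):.1f} fps")
```

Reporting both groups separately makes it obvious that no single run actually performs at the overall mean.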


In fact, it comes up short.

And what I’ve seen on my Threadripper is that the memory channel count isn’t that important, as I’ve only seen a load around 4 GB per second.

I.e., in “theory” a single RAM stick would be enough, and you could overclock it to the limit if you really wanted to.
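As a rough check on that, here is a small Python sketch comparing the observed load with the theoretical peak of a single module. It assumes 64 data bits per module (a DDR5 DIMM has two 32-bit subchannels) and ignores real-world efficiency; `peak_bandwidth_gb_s` is a helper name made up for this example.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, channels=1, bus_width_bits=64):
    """Theoretical peak in GB/s: transfers per second * bytes per transfer * channels."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9

observed_gb_s = 4.0  # load seen on the Threadripper, as mentioned above

for speed in (3200, 6400):
    peak = peak_bandwidth_gb_s(speed)  # assumption: one module, 64 bits wide
    print(f"{speed} MT/s, one module: {peak:.1f} GB/s peak, "
          f"observed load is {observed_gb_s / peak:.0%} of that")
```

Keep in mind that an average load figure says nothing about latency or short bursts, so this is only a plausibility check.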

How that changes after the performance optimization cannot be said yet.