OK, the weakest link is actually the PCI-E.
Two RTX 5000s are a little bit faster.
Here we have 96 SMs (2x RTX 5000, 48 each) vs. 82 SMs (1x RTX 3090).
Both RTX 5000s run on PCI-E 3.0 x16 (about 16 GB/s each), the 3090 has PCI-E 4.0 x16 = about 32 GB/s.
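To show where the 16 and 32 GB/s figures come from, here is a rough sketch of the arithmetic. It uses the per-lane rates from the PCI-E 3.0 and 4.0 specs and the 128b/130b encoding. The function name is just made up for this post, and the result is the theoretical one-direction maximum, so real transfers will be slower.

```python
# Theoretical one-direction PCI-E bandwidth; real-world throughput is lower.
# Per-lane transfer rates in GT/s for PCI-E 3.0 and 4.0.
GT_PER_S = {"3.0": 8.0, "4.0": 16.0}
ENCODING = 128 / 130  # 128b/130b line encoding used by both generations


def pcie_bandwidth_gb_s(gen: str, lanes: int = 16) -> float:
    """Return the theoretical bandwidth in GB/s for one direction."""
    # GT/s * encoding efficiency = usable Gbit/s per lane; / 8 = GB/s per lane
    return GT_PER_S[gen] * ENCODING / 8 * lanes


for gen in ("3.0", "4.0"):
    print(f"PCI-E {gen} x16: {pcie_bandwidth_gb_s(gen):.2f} GB/s")
```

So the 3090's slot has roughly twice the theoretical bandwidth of each RTX 5000's slot.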
As Nvidia presented yesterday, the transfer rate between CPU and GPU is a problem, which is why they designed Grace (600 GB/s CPU to GPU, but we will never get our hands on it).
We can hardly influence the transfer rate between CPU and GPU.
Improving the other components now brings less than a faster PCI-E link would.
The other possibility would be to increase the tile size, which needs a lot of VRAM (something you also see in raytracing), and TL has to be trained on the tile size you then run.
A larger tile size shifts the performance back towards the GPU.
For example, a 2048px tile requires between 10 and 14 GB of VRAM.
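To get a feeling for what the tile size changes, here is a small sketch that counts how many tiles cover one frame. The 3840x2160 frame and the helper name are only assumptions for illustration, and it ignores any overlap between tiles, which a real tool may well add.

```python
import math


def tiles_per_frame(width: int, height: int, tile: int) -> int:
    """Number of square tiles needed to cover a frame, without overlap."""
    return math.ceil(width / tile) * math.ceil(height / tile)


# Hypothetical 3840x2160 frame, three example tile sizes
for tile in (512, 1024, 2048):
    print(f"{tile}px tiles: {tiles_per_frame(3840, 2160, tile)} per frame")
```

Fewer, larger tiles mean fewer separate chunks of work per frame, which is one plausible reason why the load moves back towards the GPU, at the price of more VRAM per tile.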
I think at the moment we can only wait and see.
__
__
And my theory for the tile size is:
Take the GPU core count (not SMs) and divide it by two, that gives you the resolution, and each core takes care of one pixel.
3090 = 10496 cores / 2 = 5248px x 5248px
RTX 5000 = 3072 cores / 2 = 1536px x 1536px
As I said, it's a theory of mine.
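The rule of thumb above can be written down in a few lines. This is only a sketch of the theory as stated, nothing measured. The core counts are the published CUDA core counts, the function name is made up for this post, and the pixels-per-core column is added as a sanity check.

```python
# Published CUDA core counts for the two cards
CUDA_CORES = {"RTX 3090": 10496, "RTX 5000": 3072}


def theory_tile_side(cores: int) -> int:
    """Tile side length in px according to the theory: core count / 2."""
    return cores // 2


for name, cores in CUDA_CORES.items():
    side = theory_tile_side(cores)
    # Pixels in a square tile of that side, divided by the core count
    pixels_per_core = side * side / cores
    print(f"{name}: {side}px x {side}px, {pixels_per_core:.0f} pixels per core")
```

The last column shows that a square tile of that size holds many more pixels than the card has cores, so the one-pixel-per-core part would need some testing before relying on it.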