Yes, the performance will be worse than on the real machine due to limitations of the VM.
Nvidia has been hacked.
I am not exactly thrilled about it. Do not download any software from anywhere else that supposedly increases performance.
It is not even certain that the next drivers are free of malware.
Better wait a bit with the next updates.
Yeah, I’m gonna sit the next round of updates out.
The Apple M1 Ultra steps in.
Its GPU should hit around 20 teraflops (in theory).
I'm still searching for real-world numbers for the M1 Max.
What does it look like for VEAI?
It is my understanding that the Neural Engine is primarily what VEAI uses on M1 devices. Although the GPU on the new M1 Ultra chip is impressive, isn't the updated Neural Engine (now 32 cores and 22 trillion operations per second, up from 11 trillion) the most relevant factor here? I have no real knowledge about this, so feel free to correct me if I'm wrong. If that is the case, though, we could hopefully be looking at double the processing speed, at least theoretically.
I haven’t peeked in here for a long while now and was reminded this thread exists after I got a notification about changes made by a few users whom I’ve given editing access to.
I think now is a good time to revamp the table.
I'm thinking about using a separate sheet (tab) for each version of VEAI, which would help a lot with organization. Each sheet could also contain a download archive link if one exists.
I thought about changing the source video to 1080p, but this table is really meant to measure performance, not to create a VRAM bottleneck. If the community thinks otherwise, though, I'll go ahead with it.
But given the current beta, I may have to add new source-video types when it releases, though I'm sure those would be more CPU-bound.
The main selling point for me is upscaling home movies and DVDs to 1080p. I would be less interested in the results of a 1080p source video.
@shikuteshi Wanted to start by saying thank you for having set up and maintained this table for so long now.
Hope it’s alright that I’ve started filling out the v3.0.0 beta/alpha sheet with some of my benchmark results. I’ve also added a section for running two sessions at once since between these forums and the upscale subreddits I’ve noticed that’s a very popular workflow for people looking to cut down on time.
I’ve noticed that running multiple sessions is FAR better on VEAI 3.0.0. Whereas before the gains were marginal they’re now quite significant.
Model: Gaia HQ v6
CPU: Ryzen 9 3950X
GPU: RX 6700 XT
Concurrent Sessions: 2
FPS Per Session: 2.2
Effective FPS: 4.4
Effective Seconds Per Frame: 0.22
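For anyone adding their own rows, the "effective" figures above are just the per-session numbers summed (a quick sketch of the arithmetic, not anything VEAI reports itself):

```python
# Illustrative helper for the concurrent-session math used in the table.
def effective_stats(fps_per_session: float, sessions: int):
    """Return (effective FPS, effective seconds per frame) for concurrent runs."""
    effective_fps = fps_per_session * sessions
    return effective_fps, 1.0 / effective_fps

fps, spf = effective_stats(2.2, 2)
print(f"{fps:.1f} FPS, {spf:.2f} s/frame")
```

Note that effective seconds per frame is just the reciprocal of effective FPS, so only one of the two really needs to be measured.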
Note that this benchmark seems to perform near identically to Gaia V5 on 2.6.* versions.
In the 2.6.* GUI versions, because of how process priority and window focus messed with things, two sessions would yield results closer to "a 6700 XT and a little extra," and it was rarely worth splitting and re-combining clips. In version 3.0.0 I'm getting results that beat what a single run on a higher-end RTX 3080 can do. I expect multi-session setups to be extremely popular (at least for AMD users) in version 3.0.0, as the difference is massive.
As someone who does exactly this, I can tell you that the results you're seeing on the spreadsheet, at least for me, scale down fairly well with resolution. These numbers are likely very relevant for you.
What changes is the VRAM usage. On this benchmark I can run two concurrent sessions and see a decent boost, while on my home movies (some messy 240i) I can run three or more. As mentioned above, concurrent sessions will probably become a more important part of the discussion with the return of the CLI and the overhaul in version 3.0.0.
Thanks for clearing that up for me.
I would like to add my results.
CPU: Ryzen 9 5900X
RAM: 32GB @ 3200MHz
GPU: RTX 3080 ti
Gaia CG: 0.21 s/frame (same time for Gaia HQ)
Theia Fidelity: 0.20 s/frame
Artemis HQ v12: 0.17 s/frame (same time on all Artemis models)
Chronos v3: 0.06 s/frame
The most plausible RTX 4xxx GPU numbers I've seen so far.
I thought the 4090 would end up at 50 teraflops.
Note that Lovelace should scale much better relative to its raw numbers than Ampere, if it's true that INT32 can be computed in parallel and Lovelace inherits that parallelism from Hopper.
Here is Navi 3; it seems to be a monster at FP16.
I want to add some numbers for an M1 Max MacBook Pro 14″ (2021); all values are seconds per frame:
CPU/GPU: M1 Max
GaiaHQ v5: 0.58
GaiaCG v5: 0.54
ArtemisHQ v12: 0.14
ArtemisAA v10: 0.14
ArtemisLQ v13: 0.15
Chronos v3: 0.11
Theia Fidelity v4: 0.36
Another idea: maybe we could track power consumption during processing. I think performance per watt would be a very interesting comparison too. Edit: the M1 Max always draws 45 watts for all of the numbers above (checked with asitop).
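One way to express performance per watt is energy per frame (power draw times seconds per frame). A quick sketch using the M1 Max figures above, assuming the 45 W draw really is constant across models:

```python
# Energy per frame (joules) = power draw (watts) * seconds per frame.
POWER_W = 45.0  # constant draw reported by asitop for the runs above

sec_per_frame = {
    "GaiaHQ v5": 0.58,
    "ArtemisHQ v12": 0.14,
    "Chronos v3": 0.11,
}

for model, spf in sec_per_frame.items():
    joules = POWER_W * spf
    print(f"{model}: {joules:.1f} J/frame")
```

A column like this would let very different machines (a 45 W laptop vs. a 450 W desktop GPU) be compared on efficiency rather than raw speed.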
Here are my results using the sample clip:
CPU: Intel i7-9700
RAM: DDR4, 16GB, 3200MHZ
GPU: GeForce GTX 1650, 4GB VRAM
OS: Win10 Pro
VEAI Version: 2.6.4
Gaia-CG v5 (3.13 s/frame)
Gaia-HQ v5 (3.14 s/frame)
Artemis-MQ v13 (0.99 s/frame)
Proteus-FT v3 (1.03 s/frame)
CPU: Intel i7-11700K
RAM: DDR4, 32GB, 3200MHZ
GPU: PNY GeForce GTX 1070 ti, 8GB VRAM
OS: Win10 Pro
VEAI Version: 2.6.4
Gaia-CG v5 (1.17 s/frame)
Gaia-HQ v5 (1.17 s/frame)
Artemis-MQ v13 (0.41 s/frame)
Proteus-FT v3 (0.44 s/frame)
I don't know if consumption is that important when you have to wait 2.5x as long (model dependent).
After all, at similar power draw the slower device consumes 2.5x as much energy over the same job, so in the end the two may not be so far apart.
If I'm not wrong.
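That point can be made concrete: what matters for a fixed job is total energy, not instantaneous draw. A small illustration with made-up numbers (both the wattages and frame times below are purely hypothetical):

```python
# Total energy (watt-hours) to process a fixed number of frames.
def job_energy_wh(watts: float, sec_per_frame: float, frames: int) -> float:
    return watts * sec_per_frame * frames / 3600.0

frames = 10_000
fast = job_energy_wh(watts=300, sec_per_frame=0.2, frames=frames)  # hungry but quick
slow = job_energy_wh(watts=150, sec_per_frame=0.5, frames=frames)  # frugal but 2.5x slower

print(f"fast card: {fast:.0f} Wh, slow card: {slow:.0f} Wh")
```

In this example the card drawing half the power still uses more total energy for the job, because it runs 2.5x longer.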
Has anyone actually tested a 5800X3D? I would be very interested in how VEAI reacts to the larger cache.
It seems the 4080 will be only 37% faster in FP16 than the RX 6800, so there's no chance I'll switch from AMD back to Nvidia.
The RX 7800 (rumor) will get 85 teraflops FP16, which is way faster than the RTX 4080 (48 teraflops) and as fast as the 4090 (85 teraflops).
The RX 7900 will have 132 TF FP16, woo hoo.
All numbers are still rumors, but the TechPowerUp database has been close the whole time.
The RTX Prices are just insane.
Just in case some of you didn't watch Nvidia's product launch: everyone should be careful and not fall into Nvidia's trap.
Nvidia has released two RTX 4080 cards, one with 16 GB of VRAM and the other with 12 GB. Don't be misled by Nvidia; they are completely different cards.
The 16 GB variant uses the AD103 GPU, while the 12 GB variant uses the AD104 GPU (which should have been called a 4070).
The 16 GB variant has 9728 shading units (49 TFLOPs), while the 12 GB variant only has 7680 shading units (40 TFLOPs).
The 16 GB variant has a 256-bit memory bus (735.7 GB/s bandwidth), while the 12 GB variant only has a 192-bit memory bus (503.8 GB/s bandwidth).
The 4080 12 GB is selling at $900 USD, while the 4080 16 GB is $1200 USD. Nvidia just doesn't want people complaining that they're selling a 4070 at $900; that's why they renamed it a 4080.
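For anyone checking those TFLOPs figures: they follow from shading units × 2 FLOPs per clock (FMA) × boost clock. The boost clocks below are taken from Nvidia's announced specs (double-check them before relying on this):

```python
# Peak FP32 TFLOPs = shaders * 2 (one FMA = 2 FLOPs per cycle) * boost clock (GHz) / 1000
def tflops(shaders: int, boost_ghz: float) -> float:
    return shaders * 2 * boost_ghz / 1000.0

# Announced boost clocks: 2.51 GHz (4080 16 GB), 2.61 GHz (4080 12 GB)
print(f"4080 16GB: {tflops(9728, 2.51):.1f} TFLOPs")
print(f"4080 12GB: {tflops(7680, 2.61):.1f} TFLOPs")
```

The results land on roughly 49 and 40 TFLOPs, matching the numbers quoted above, and make the roughly 20% compute gap between the two "4080" cards easy to see.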
This is a very good point. Given the suspicious naming of that 4080 12 GB, I am awaiting the RDNA 3 launch with much interest. And with Nvidia's allegedly poor treatment of its partners (EVGA), this really is the straw that broke the camel's back for me.
I have always assumed that the cards can run int32 in parallel with fp32.
It looks like Ampere 2.0 now.
It seems the theoretical power will not be fully used this time either.
But it’s wait and see, because nothing is known except rough gaming specifications.
It may also turn out that the cards are extremely fast in compute, which you might expect based on the power consumption.
The 4080 is indeed very annoying.
TechPowerUp lists the RDNA 3 GPUs with PCIe 5.
Lovelace is PCIe 4.
RDNA 3 will have 128 GB/s of PCIe bandwidth; PCIe 4 tops out at 64 GB/s.
This should have a big impact on compute too.
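For context, the quoted 64 and 128 GB/s figures are the rounded bidirectional totals for an x16 link; per direction it's half that. A sketch of the theoretical math (128b/130b encoding assumed for both generations):

```python
# Theoretical PCIe x16 bandwidth from the per-lane transfer rate.
def pcie_x16_gbps(gt_per_s: float, bidirectional: bool = True) -> float:
    """Approximate GB/s for an x16 link with 128b/130b encoding overhead."""
    one_way = gt_per_s * 16 * (128 / 130) / 8  # GT/s * lanes * encoding / bits-per-byte
    return one_way * (2 if bidirectional else 1)

print(f"PCIe 4.0 x16: {pcie_x16_gbps(16):.0f} GB/s bidirectional")
print(f"PCIe 5.0 x16: {pcie_x16_gbps(32):.0f} GB/s bidirectional")
```

Whether the extra headroom helps VEAI depends on how much data actually crosses the bus per frame, which is the open question raised below.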
I'm no expert, but wouldn't that only matter if you have something like Smart Access Memory enabled and VEAI falls back to the slower system RAM instead of the faster GPU VRAM?
Maybe someone has done GPU PCIe benchmarks on gaming and rendering workloads, but I have not found them. I just remember some statement back when PCIe 4 came out about GPUs not even using half the bandwidth of PCIe 3… I should probably look that up.