The perfect VEAI rig?

Hi Topaz developers,

I’m wondering: in your opinion, what is the absolute top-notch, very best rig for VEAI (using Proteus) you can dream of? The fastest machine you could imagine? Would an Nvidia Tesla A100 be of any help, or would something like 6x 3090 help, or 4x A6000, or whatever? Does a big CPU make any difference?

Here is our problem: we work in Virtual Production. We love VEAI, but we only work with 8K, 50fps, ProRes 4444 plates (equirectangular, 360-degree VR-like stuff), and each take is about 5 minutes long. The purpose is to denoise/sharpen/enhance already nice footage (coming from Red Helium 8K or Sony Venice 6K cameras).
On our current rigs, we spend about 22 hours denoising one 8K/50fps plate. Unfortunately, this is far too slow. We would really like to be about 10 times faster… without buying 10 new workstations, if possible!
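
To put rough numbers on that target, here is a back-of-the-envelope sketch. It assumes a take of exactly 5 minutes at 50 fps and takes the 22-hour figure at face value; the function name is only for illustration.

```python
def seconds_per_frame(render_hours, take_minutes, fps):
    """Average render time per frame for one take."""
    total_frames = take_minutes * 60 * fps
    return render_hours * 3600 / total_frames


# Assumed: one 5-minute take at 50 fps, rendered in 22 hours.
current = seconds_per_frame(render_hours=22, take_minutes=5, fps=50)
target = current / 10  # the "10 times faster" goal

print(f"frames per take: {5 * 60 * 50}")
print(f"current: {current:.2f} s/frame")
print(f"target:  {target:.3f} s/frame")
```

Under those assumptions a take is 15000 frames, which works out to roughly 5.3 s/frame today and roughly 0.53 s/frame for the 10x goal.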

We’ve tested the following rigs:

“Neo” is a 24 cores/48 threads 3960X AMD Threadripper with 128gb RAM; RAID0 Nvme; one RTX6000 plus one RTX4000. We use Proteus on two or three VEAI instances, with “Use all GPU”. For now Neo is by far our fastest rig.

“Morpheus” is a 24 cores/48 threads 3960X AMD Threadripper with 128gb RAM; RAID0 Nvme; but only one RTX6000; so we run only one VEAI instance.

“Merovingian” is a 12cores/24threads AMD Ryzen9 5950X with 128gb RAM, simple nvme, and a single RTX3090. We run only one VEAI instance. Strangely, despite its faster and newer RTX 3090, Merovingian is slower than Neo and more or less on par with Morpheus (which is satisfying for a Matrix fan, but hey).

“Tank” is a 8cores/16threads AMD Ryzen7 5700X with 128gbRAM, simple nvme, and two Geforce 3060. We run two instances of VEAI with “use all GPU”. Performance is more or less on par with Morpheus or Merovingian.

What are we doing wrong, or what can be done better? Should we use a crypto mining rig (but I’m afraid the PCI-E x1 connections won’t do)? Or buy some huge Supermicro motherboard and cram 4 or more 3090s on it? Or A6000s?

Or would it be better to buy more budget-friendly rigs like Tank, with multiple “small” GPUs; rather than one big crazy rig?

What would be the best/most optimized config for VEAI? …
Many thanks

And would a cryptocurrency mining rig make any sense? Which means: lots of GPUs, but only PCI-E x1?
And how would a small CPU and only 16GB of RAM (typical of this kind of rig) impact performance?

You should consider an AMD Threadripper Pro 3995WX with 4x A6000 or 4x 3090 with NVLink.

Do you think that such a big PC would be better than a cryptocurrency rig (small CPU, tons of GPUs, but PCI-E x1)?

I am using a Fractal Meshify 2 XL without problems. You should also consider Phanteks T30 fans. I have 8x T30 fans in my case and the max temperature of the GPUs is below 80 degrees.

I was talking about the rig itself, not the case. I’m wondering if having lots of GPUs but a small CPU and only PCI-E x1 lanes makes sense vs. only one or two GPUs but with PCI-E x16 lanes and a big CPU.

I would say your earlier tests were flawed. On my rig I have a 3900X with 12 cores, 64GB RAM and a 3090, and I am using 3 instances that eat around 85-90% CPU, 40GB RAM and at most 50% of the 3090. So my bottleneck is the CPU, but with your specs I would expect Neo to run about 8 instances and the Morpheus setup about 6 before you run into the GPU VRAM limit.

You should not just go for mining rigs with multiple GPUs. You should have a look at Task Manager and see what is heavily in use when you run your tests. My advice is to get as many instances running as possible on one machine. Since one instance uses around 30% CPU on my Ryzen, I suggest 2-3 cores per instance in general, plus 12-15GB RAM per instance when I use Proteus with 6K input → 8K output settings. 3 instances eat up around 50% of my GPU VRAM, so that is 12GB / 3 = 4GB per instance.
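
The per-instance figures above can be turned into a rough estimate of how many instances a machine might hold. This is only a sketch: it takes the upper figures from the post (3 cores, 15GB RAM, 4GB VRAM per instance) as assumptions, assumes 24GB of VRAM for a 3090 or a Quadro RTX 6000, and uses a made-up function name. Real usage will vary with the model and resolution.

```python
def max_instances(cores, ram_gb, vram_gb,
                  cores_per_instance=3,
                  ram_per_instance_gb=15,
                  vram_per_instance_gb=4):
    """Return (instance count, limiting resource, per-resource limits)."""
    limits = {
        "cpu": cores // cores_per_instance,
        "ram": ram_gb // ram_per_instance_gb,
        "vram": vram_gb // vram_per_instance_gb,
    }
    # The resource that allows the fewest instances is the bottleneck.
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck, limits


# 12-core 3900X, 64GB RAM, one 24GB RTX 3090
print(max_instances(cores=12, ram_gb=64, vram_gb=24))
# 24-core 3960X, 128GB RAM, one 24GB card (a Morpheus-like machine)
print(max_instances(cores=24, ram_gb=128, vram_gb=24))
```

With these assumed figures the 12-core machine is capped by the CPU at about 4 instances, and the 24-core machine is capped by VRAM at about 6, which is in the same range as the estimates given in this thread.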

Very interesting. We’re in production now so I can’t do any more tests, but I will, next week. Thanks!

The bottleneck is memory bandwidth. I remember the 3900X only has around 50GB/s of memory bandwidth.

Well, that’s interesting to me: what would you consider a good CPU upgrade then? A 16-core Ryzen 5000, or is it enough to step up to an 8/12-core Ryzen 5000? @James.L

I have a 5950X as well, but its memory bandwidth is only around 50GB/s. The 5950X cannot fully utilize the 3090 and cannot even fully utilize a 3080 Ti. AMD Threadripper Pro has 200GB/s memory bandwidth. For example, a 5950X + 3090 can only run 4 instances for upscaling 720p to 2K, but a 3975WX can run 10 instances with the same graphics card. I can also run 10 instances of Handbrake at the same time with the 3975WX.
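
For reference, the bandwidth figures quoted in this thread line up with the usual theoretical peak formula: channels × 8 bytes per transfer × transfer rate. The sketch below assumes DDR4-3200 on every platform (an assumption, the actual memory speed in these rigs is not stated) and the standard channel counts for desktop Ryzen, Threadripper and Threadripper Pro. Measured bandwidth is normally lower than these peaks.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bus_width_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * bus_width_bytes * mega_transfers_per_s * 1e6 / 1e9


# Assumed memory speed: DDR4-3200 (3200 MT/s) on all three platforms.
for name, channels in [("Ryzen (dual channel)", 2),
                       ("Threadripper (quad channel)", 4),
                       ("Threadripper Pro (8 channel)", 8)]:
    print(f"{name}: {peak_bandwidth_gbs(channels, 3200):.1f} GB/s")
```

That gives roughly 51, 102 and 205 GB/s, close to the 50/100/200 GB/s numbers mentioned by posters here.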

I keep hearing people talk about more than one instance. Can you please help me understand what that means, what the benefits are, and how it is implemented? I’m running an octa-core 10th gen Intel i7 processor, 64GB of 3600MHz memory and an Nvidia 3070 Ti. Thanks.

It means: open Topaz a second time. Then you let the first instance render the first half of the video and set the second one to the second half. You can then potentially be twice as fast, but you have to join the pieces back together in the editor of your choice at the end.
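
As a small illustration of that splitting, here is a sketch that divides a clip into one contiguous frame range per instance. The function name is made up, and the 0-based, end-inclusive frame numbering is an assumption; check how your version of the program numbers frames before typing the values in. The 15000-frame example corresponds to a 5-minute take at 50 fps.

```python
def split_frames(total_frames, instances):
    """Split frames 0..total_frames-1 into contiguous (start, end) ranges.

    Both ends are inclusive. Leftover frames go to the first ranges.
    """
    base, extra = divmod(total_frames, instances)
    ranges = []
    start = 0
    for i in range(instances):
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # more instances than frames: skip empty ranges
        ranges.append((start, start + length - 1))
        start += length
    return ranges


print(split_frames(15000, 2))
print(split_frames(15000, 4))
```

Each instance then renders its own range, and the resulting pieces are joined in order afterwards.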

Open Task Manager and keep an eye on the Performance tab to see how many instances you are able to open until one piece of your hardware tops out at max usage (CPU, RAM, HDD, GPU). For the last one, check the details on the GPU, because normally the usage does not show up under 3D but under the graphics_1 statistic.

I appreciate it! I wasn’t aware that you could open the program multiple times at the same time. This will be something fun to experiment with.

I have a Threadripper 3960X with a 3090 and 2x 3070s.
Proteus is more CPU heavy than, say, GAIA, so you can run more instances to saturate the CPU without putting too much load onto your GPUs.

Short answer for an ideal Proteus rig, imo, is a 3970X + 3x RTX 3090s/A5000s (if you have the money). It could run 4-6 Topaz instances, at a guess.

As you are working on some heavy-duty video at high resolutions, I think you did the right thing going with Threadrippers.
I think you need to take some time out to benchmark your rigs and see what they can do. Open Task Manager to monitor your CPU load and get the usage up to 90% by opening multiple instances of Topaz. You also need to monitor your GPU loads.
Use an app like GPU-Z to monitor your GPU usage. Ideally you should keep increasing the number of Topaz instances running on each GPU until the GPU hits 100% consistently and its power draw stays at 100% of its TDP.
Threadrippers are great because of the high core count, but you need to load up multiple instances of whatever software you use, unless you’re using software like Blender or other CPU renderers that scale well with core count.
I think 3090s are your best bet GPU-wise in terms of performance/cost. If you find you still have CPU headroom to spare, add a GPU to your Threadripper or Ryzen rigs. I can’t recommend mining rigs, because they’ll usually be paired with a very weak CPU and limited IO, since mining requires virtually no CPU load. Topaz performance would be terrible.

I don’t think there is anything on the market that will speed up your workflow in the PC world. An Intel 12900K has the fastest single core speed you can get at the moment, but I doubt that will make much difference to Topaz. Plus when the cores are fully saturated it would be slower than a 5950X. All you can do is add 3/4 GPUs per rig.
If you want to go nuts get a Threadripper PRO board and a 3995WX and add as many GPUs as your power supply can handle.

Thanks for that very detailed answer!

Don’t forget some crazy expensive SSDs. You’ll need some 8TB drives that were made for constant writes. I have a 4TB HDD that’s fine until the cache fills up, then it slows the whole thing down from 0.08 sec per frame to 0.5, and that’s just running one instance with TIFF output. Maybe that’s not an issue with ProRes output.

Thanks for the advice. We never had a problem with that with ProRes, but we use either internal NVMe drives in RAID0 (so speed is always OK) or external ARECA RAID5 towers (6x 18TB HDDs) on a 10Gb LAN. That’s not as fast as the NVMe, but it still seems to be OK: the LAN is far from being saturated, even with multiple computers writing to the ARECA.

So I did a new test with “NEO”, with two RTX3090s plugged in instead of its usual Quadro RTX6000 + Quadro RTX4000 configuration. Since Neo uses a Threadripper (so 100GB/s memory bandwidth) and the 3090s are, on paper, much faster than the older Quadros, I was hoping to get astonishing speeds. But the result was just “mmm’kay”… Quite disappointing:
From a total of 2.1s/frame with the Quadros on 2 instances (so 4.2s/frame on each instance), I get, at the very best (I even tried to overclock the two 3090s), 1.8s/frame. Running more than 2-3 instances gives only marginally better results (I sometimes get a total of 1.7s/frame, but not consistently). More than 4-5 instances does not give any better result at all.
I use GPU-Z and I monitor my CPU usage: I never get better than about 50% GPU usage on each GPU. At 5 instances, I’m using 100% of the CPU; at 2 instances, about 70%. I can’t get full usage out of those 3090s, and that’s frustrating.
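
For anyone comparing their own numbers, this is how the "total" seconds per frame relates to the per-instance figures when several instances render different parts of the clip in parallel. It is just a sketch with a made-up function name, using the figures from this post.

```python
def combined_seconds_per_frame(per_instance_times):
    """Effective s/frame when instances render different parts in parallel."""
    frames_per_second = sum(1.0 / t for t in per_instance_times)
    return 1.0 / frames_per_second


quadros = combined_seconds_per_frame([4.2, 4.2])  # two instances on the Quadros
rtx3090s = 1.8                                    # best combined figure reported
speedup = quadros / rtx3090s

print(f"Quadros combined: {quadros:.2f} s/frame")
print(f"Speedup from the 3090s: {speedup:.2f}x")
```

Two instances at 4.2 s/frame each combine to 2.1 s/frame, so going to 1.8 s/frame is only about a 17% gain.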

So, the insane price tag of a rig like Neo is not worth it for me. Buying two smaller rigs (like “Tank”: Ryzen 7 with two 3060s), for almost the same price tag, is faster and more reliable (if one rig fails, the other continues its work).
Plus, hey, you get two quite capable computers, which is always nice.
We’re in the process of expanding the render farm. I’m considering buying an Intel rig this time, with an i7 12600K, DDR5 RAM and still two 3060s. Since the i7 memory bandwidth is 80GB/s, I’m hoping to get better results than with Tank, for a comparable price tag. That rig will be called “Dozer” (Tank’s older brother in the first Matrix movie…). I will keep you updated.

I’d stay clear of 3060s for VEAI. I thought they would be similar to 3080s but they’re not even in the same league. They generally take twice the amount of time, sometimes more.

I used to be a crypto miner with a bunch of GPUs, and I thought I could put together a 2-3 GPU rig for VEAI… until I did a lot of testing. I found the best way is to build multiple cheap but high-performance PCs with a single GPU each. Something like three i7-12700F + RTX 3090 machines, rendering one project split in three and then combining it later. That is 3x faster than 1 GPU (sure) and 2.5x faster than 2 GPUs on one rig (here we go). I don’t see multiple GPUs on one motherboard multiplying performance anytime soon. We talk about how AI is changing the world, and we are still stuck with this stupid situation. lol