Next step towards finishing projects faster

Hi guys,

I've been using VEAI for almost 2 years now; I mainly polish/upscale early-days VR material. My current machine:

  • Ryzen 9 3900X
  • 64GB DDR4 3200MHz
  • RTX 3090
  • Custom watercooled loop
  • 2x NVMe PCIe 3.0 SSDs, 1TB each
  • 4x 2TB HDDs in RAID 0, 8TB total

In general I open 3-4 instances of VEAI to maximize load; that mostly hits the CPU, depending on the model I use. I regularly used Artemis HQ/MQ v12, but since Proteus came out I almost always choose that with slightly bumped custom settings. It's simply the best unless the source material is dramatically badly compressed.

Now I'm getting new projects with higher resolution (6K) and bitrate (120 Mbps) that need to be polished and upscaled to 8K. I know it's not that much, but there is still some room for improvement: this footage was recorded some years ago on older tech, so VEAI can still enhance it. However, such high-res source footage bumps up the processing time again:

I currently run 3 instances of VEAI with Proteus v2 on the file. I hit about 9 s/frame with 3 parallel instances. GPU core load is around 40%, CPU core load around 85%, GPU VRAM usage 55%.
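Quick back-of-envelope on what that means in wall-clock terms, assuming the 9 s/frame is per instance with all three running in parallel (if it's 9 s/frame combined, divide the result by 3):

```python
# Back-of-envelope VEAI throughput: 3 instances, ~9 s/frame each.
SEC_PER_FRAME = 9.0
INSTANCES = 3

# Frames completed per wall-clock second across all instances
effective_fps = INSTANCES / SEC_PER_FRAME

# ETA for a hypothetical 1-minute 60 fps source clip
frames = 60 * 60
eta_hours = frames / effective_fps / 3600

print(f"effective throughput: {effective_fps:.2f} fps")  # 0.33 fps
print(f"ETA for {frames} frames: {eta_hours:.1f} h")     # 3.0 h
```

So every minute of 60 fps source costs about 3 hours at the current rate, which is why a 4th instance matters.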

I wonder how to speed this up. My output until today has been 8-bit PNG; I might change this because 10-bit is a thing, but I'll have to look for more HDD space since this will get huge. Is swapping the CPU for a 16-core Ryzen worth it to fire up 1-2 more instances? Or maybe some of you know when Topaz will bring out a version that maximizes load on the GPU? Should I consider swapping the RTX for a 6900 XT? What are your suggestions?
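On the disk-space question, here's a rough estimate of what an 8K PNG sequence costs. These are raw (uncompressed) frame sizes, so treat them as upper bounds; actual PNG sizes are content-dependent and usually land well below this:

```python
# Raw (uncompressed) frame sizes for an 8K RGB PNG sequence.
# PNG compresses below this; treat these as upper bounds. Note that PNG
# stores 10-bit content in a 16-bit container, so "10-bit" costs 16-bit space.
W, H, CHANNELS = 7680, 4320, 3  # 8K UHD, RGB

def raw_frame_bytes(bits_per_channel):
    return W * H * CHANNELS * bits_per_channel // 8

for bits in (8, 16):
    frame_mb = raw_frame_bytes(bits) / 1e6
    minute_gb = raw_frame_bytes(bits) * 60 * 60 / 1e9  # 60 fps for 60 s
    print(f"{bits}-bit: {frame_mb:.0f} MB/frame raw, "
          f"{minute_gb:.0f} GB per minute of 60 fps video")
```

Even at 50% compression, moving from 8-bit to 16-bit containers roughly doubles the footprint, so the 8TB RAID fills up fast.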

Best regards


Try the options “Max AI Processor Memory Usage” = lowest and “Reduce Machine Load” = ON.
I use two instances and two 3080 Tis. The speed increases significantly with these settings.

Maybe you should add one more 3090 with NVLink. My current project is upscaling 16x 1-hour 720p 60fps videos to 2K 60fps (Artemis Medium Quality) every day. Before I added the second card to my workstation, 9 instances hit about 0.37-0.39 frames/second. Now with 2 cards in NVLink, 9 instances hit around 0.27-0.28 frames/second.

Shouldn’t it be the other way around to get the fastest processing speeds? Will give it a try though, ty

Yes, another 3090 should fire things up at least in some way, but that is the ultimate money-is-no-object way to do it :slight_smile:

According to this sheet:

the RX 6900 XT is 300% faster on Artemis compared to my 3090. Is this legit? Can someone confirm? If so, should I switch, or will a patch come out in a few months that makes the 3090 as fast as the 6900 XT? I can't really believe this, but if it's true I need to switch GPUs.

I tried this with only one 3080 Ti and 3 instances of VEAI running at the same time. I saw no speed change versus setting memory usage to max and leaving “Reduce Machine Load” off, but it did seem to use less power at the same speed.

I’m on here trying to learn the same thing… I have a 5950X and a 3090… I want CPU processing instead of GPU. I can’t get higher than 50% usage. I’ll be done in two weeks… great.

Yeah, GPUs are just better and faster at this sort of AI application. Even if you got it to use 100% of the CPU, it would still take a week.
I’ve heard rumors that VEAI 3.0 will have some RTX 30 series optimization or utilization improvements, but nothing confirmed. Might just be wishful talk.

OK guys, I found info that Artemis benefits from faster RAM, and since I had some headroom for overclocking my Samsung B-die RAM from 3200MHz to 3600MHz, I got a nice speed boost of around 15%.

I learned a lot about RAM overclocking in the past days. I could even go further in MHz, but since I have 4x16GB populating all 4 DIMM slots, I can't achieve more than 3600MHz yet. I tried with only 2 sticks and can go higher, probably around 3800MHz, and now I'm wondering whether it makes sense to temporarily remove 2 DIMMs until I change my sticks to 2x32GB, if that is even possible at the moment.

Does anybody know whether 32GB of RAM is enough to get the maximum speed out of VEAI, or is it recommended to stay on 4x16GB?

In other words: 2x16GB at 3800MHz vs. 4x16GB at 3600MHz, which should upscale Artemis faster?
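For what it's worth, the raw bandwidth difference between the two configs is small. A quick sketch of the theoretical peaks (ignoring timings and rank interleaving):

```python
# Theoretical peak DDR4 bandwidth. Both configs run dual channel on AM4
# (4 sticks means 2 ranks per channel, not 4 channels).
def ddr4_peak_gbs(mt_per_s, channels=2, bytes_per_channel=8):
    return mt_per_s * channels * bytes_per_channel / 1000  # GB/s

print(f"4x16GB @ 3600: {ddr4_peak_gbs(3600):.1f} GB/s")  # 57.6 GB/s
print(f"2x16GB @ 3800: {ddr4_peak_gbs(3800):.1f} GB/s")  # 60.8 GB/s
# Only ~5.6% more raw bandwidth at 3800; dual-rank interleaving on the
# 4-stick config can recover some of that gap in practice.
```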

RAM capacity shouldn't make a large impact on speeds, but frequency plus timings greatly benefit CPUs, especially Ryzen ones. Something I would be interested in is whether your workload speeds increase by switching to a 5800X3D: how the larger cache (which reduces the impact of memory latency by a good amount in some applications/workloads) trades off against the extra cores of your current 3900X.

Another thing I am wondering is whether you're simply bandwidth-limited on the PCIe side. I'm not sure which motherboard you are using, but I know 400/500-series boards support bifurcation, which splits the PCIe 4.0 x16 link to the GPU into 2x x8 or even 4x x4 (4 GPUs at 4 PCIe 4.0 lanes each, equal to PCIe 3.0 x8 speeds, which are kind of slow when you get into demanding AI processing).

If you’re not limited on funds, or have a good way to return a CPU, try seeing how a 5800X3D goes, or even a Zen 2 Threadripper (3970WX?) to get a huge number of PCIe 4.0 lanes, all at x16 per GPU. That, paired with 8-channel RAM at 3200MHz with decent timings (CL14 or lower), should give you the bandwidth needed to increase GPU and CPU utilization at 4K/8K video upscaling, provided your NVMes aren't saturated delivering frames to 3 GPUs.

For your current situation, tuned 3600MHz > 3800MHz XMP. In gaming and some applications, tuned 3200MHz can be competitive with 3600/3800MHz for the highest FPS, which is of course tied to how fast and at what latency data moves off your drive, to RAM, and then to the GPU (4-6GB/s disk → ~50GB/s RAM → ~32GB/s PCIe bus).
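That chain can be sketched as a simple min() over the stages. The bandwidth numbers below are the same ballpark figures as above, not measurements:

```python
# Which stage limits the frame pipeline? A minimal sketch using
# approximate per-stage bandwidths (GB/s), not measured values.
stages_gbs = {
    "disk (NVMe, PCIe 3.0 x4)": 5.0,
    "RAM (DDR4 dual channel)": 50.0,
    "PCIe 4.0 x16 to GPU": 32.0,
}
bottleneck = min(stages_gbs, key=stages_gbs.get)
print(f"bottleneck: {bottleneck} at {stages_gbs[bottleneck]} GB/s")
# Dropping the GPU link to PCIe 4.0 x4 (~8 GB/s) would narrow the gap,
# but the disk remains the slowest stage in this chain.
```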

I’ve seen a person with a 3900X, an RTX 3060, and similar or slower RAM process more frames per second for 480p → 1080p than my 5800X + 3070 Ti on PCIe 4.0 x16, which is confusing: either my setup is wrong and underperforming, or their higher CPU core count, at the cost of single-core performance (lower average frequency), balances in their favor.

As a note, please ensure your RAM has good cooling: once you start tuning it, temperature changes can really mess with stability in weird and annoying ways. Take it from me: I originally had 3600MHz CL18 tuned up to 3800MHz CL16, but had to dial things back when adding a new GPU and enabling PCIe 4.0, because on my 5800X the added bandwidth made my I/O die unstable; it was bluescreen city until I retuned my system.

Edit:
I’ve got an awful idea: enabling a RAM disk on my GPU, as I already have/had some software for that. I’m going to see what happens to my speed if I keep both the source and the output file on the GPU. Will there be any benefit from skipping my NVMe and keeping everything on the same device, or will the software cause issues and not add much speed because it still has to go through the ‘slow’ RAM and CPU, lol

@sunday.weaver thanks for your reply. I have a B550, but I remember reading on this forum that PCIe bandwidth shouldn't make a big difference in Topaz performance.

I bumped my RAM from 3200MHz XMP to 3600MHz at 16-16-16-32 and have optimized the timings now. It took some time, but the performance gain is 15%. Note that this is pretty much the same percentage as the MHz increase, so in the end it is MHz that counts.

I also tested using only 2 sticks at 3800MHz with quick settings, but since VEAI eats 10-12GB of RAM per instance, a third instance was not possible without a heavy increase in sec/frame. So my next plan for RAM is going to 2x32GB instead of 4x16GB to go higher in MHz, but I already have B-dies, so I guess switching to 2x32GB will be somewhat difficult and/or expensive. Oh yes, and with 4 sticks at 1.5V, active cooling was a must for me.
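A quick sanity check on how many instances fit in RAM. The per-instance usage is the 10-12GB I observed; the 6GB OS reserve is just my guess:

```python
import math

# Hypothetical helper: how many VEAI instances fit in a given RAM size,
# assuming ~12 GB per instance (observed) and ~6 GB reserved for the OS.
def max_instances(total_ram_gb, per_instance_gb=12, os_reserve_gb=6):
    return math.floor((total_ram_gb - os_reserve_gb) / per_instance_gb)

print(max_instances(32))  # 2x16GB  -> 2 instances
print(max_instances(64))  # 4x16GB or 2x32GB -> 4 instances
```

Which matches what I saw: on 32GB the third instance pushed things into a heavy slowdown.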

I am somewhat limited on funds right now, so if I went for a CPU change, I don't believe 3D cache would make such a huge difference. Instead, I think 16 cores would let me fire up a 4th instance, since CPU load is already around 85-90% with 3 instances.