VEAI Performance

Wanted to ask how it’s going with getting use out of the additional tensor cores found in RTX 30 GPUs.
And, while likely not ideal for AMD users, whether there are models being tested to squeeze as much performance as possible out of TensorFlow?

In current tests, even in their unoptimized state, an RTX card is 4-5x faster than a GTX card. But that’s only a little faster than if the cards had continued to use CUDA, given that a 3080 has 3-4x as many CUDA cores as a 1080.

So I’m hoping there’s more performance still to be had from the tensor cores, even on the RTX 20 series cards.
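As a rough sanity check on that comparison, here’s a back-of-the-envelope sketch using NVIDIA’s published core counts (the 4-5x figure is just the rough range observed above, not a formal benchmark):

```python
# Back-of-the-envelope comparison: raw CUDA core counts vs. the observed VEAI speedup.
# Core counts are NVIDIA's published specs; the 4-5x range is the rough measurement above.
CUDA_CORES = {"GTX 1080": 2560, "RTX 3080": 8704}

core_ratio = CUDA_CORES["RTX 3080"] / CUDA_CORES["GTX 1080"]
observed_low, observed_high = 4.0, 5.0  # rough VEAI speedup range, not a benchmark

print(f"CUDA core ratio (3080 vs 1080): {core_ratio:.1f}x")   # ~3.4x
print(f"Observed VEAI speedup: {observed_low:.0f}-{observed_high:.0f}x")
# If the observed speedup barely exceeds the raw core ratio, the extra tensor
# cores aren't contributing much yet - which is the point of this thread.
```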


First, GTX cards don’t have faster CUDA cores, or simply don’t have enough CUDA cores at all.
Second, I think that with DirectML, tensor cores are faster than CUDA. Tensor cores are always faster than CUDA cores anyway.

I never said GTX had faster CUDA cores.

DirectML is debatable. It brings support for AMD cards, which perform remarkably well. But originally it HURT performance on GTX cards, and only in recent versions has it been brought back to the same performance GTX originally had on CUDA.

Tensor cores are obviously faster, since they’re made for exactly this kind of workload. But at present VEAI’s tensor performance is lacking: they’ve integrated the technology but haven’t optimized it. My original post was asking about improvements specifically for RTX 30 series cards. If you read the original post of this entire thread, you’ll know that an RTX 30 series card currently performs exactly the same as an RTX 20 series card because it can’t utilize the additional tensor cores provided in Ampere.


Yes, I saw it. But the Topaz developers explained that it will take months for the RTX 3000 series. I guess it’s hard for them to get one in hand, maybe?
Besides that, they sacrificed GTX performance to boost all the other cards. And because DirectML is not yet fully optimized by Microsoft, they’re still waiting on them; then you’ll see performance improve. They said all of that in previous posts.

Is there a way to check whether VEAI is actually using my GPU or not? In Task Manager on Windows 10 I can see the CPU very high and the GPU at zero when running VEAI, even though I have the GPU selected in VEAI.
Nvidia GTX 1070, Windows 10, 32GB RAM.


Never mind. I found the GPU-Z monitoring tool to check the behaviour of my GPU. Seems OK; Task Manager / Performance Monitor doesn’t seem to show the right values.

Click on the Performance tab in Task Manager, then you will see it.
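If you’d rather log the load than eyeball Task Manager or GPU-Z, here’s a minimal sketch using NVIDIA’s NVML Python bindings (assumes an NVIDIA card and `pip install nvidia-ml-py`; run it in a separate console while VEAI is processing):

```python
# Minimal GPU load logger via NVML (pip install nvidia-ml-py).
# Task Manager's default "3D" graph often sits at 0% for CUDA/compute work,
# so polling NVML directly gives a truer picture while VEAI is running.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU; adjust if you have several
name = pynvml.nvmlDeviceGetName(handle)
if isinstance(name, bytes):                    # older bindings return bytes
    name = name.decode()

try:
    for _ in range(30):                        # sample once per second for ~30 seconds
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"{name}: GPU {util.gpu}% | VRAM {mem.used / 2**30:.1f} GiB used")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```

In Task Manager itself, switching one of the GPU engine graphs from “3D” to “Cuda” or “Compute” usually reveals the activity that the default view misses.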

Anyone with an RX 5XXX and/or RX 6XXX here who could test whether that GPU stutters when you run more than one instance of VEAI + Gigapixel + a stream on YT?

I’d really like to know how good Navi is at handling more than one compute workload.

I tested a Radeon Pro VII in November with immature drivers, and it wasn’t able to handle DeNoise + Gigapixel + Photoshop + Capture One and a stream at once; it stuttered a lot.

The RTX 5000 next to it did, though.

I’m considering this. Be good to see your results.

I render out a large output file, then scale it down using HandBrake in H.265, then add back the original audio or convert it to AAC.
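If anyone wants to script that step, here’s a rough sketch of the same idea with ffmpeg called from Python (filenames, scale and quality settings are placeholders; a HandBrakeCLI call would work just as well):

```python
# Sketch: downscale a large VEAI output with x265 and mux the original audio back in.
# Paths, scale and quality values are placeholders - adjust to taste.
import subprocess

UPSCALED = "veai_output.mp4"   # large file rendered by VEAI
ORIGINAL = "source.mp4"        # original clip, used here only for its audio track
FINAL = "final_h265.mp4"

subprocess.run([
    "ffmpeg", "-y",
    "-i", UPSCALED,                  # input 0: upscaled video
    "-i", ORIGINAL,                  # input 1: original audio
    "-map", "0:v:0", "-map", "1:a:0",
    "-vf", "scale=1920:-2",          # downscale to 1920 wide, keep aspect ratio
    "-c:v", "libx265", "-crf", "20", "-preset", "medium",
    "-c:a", "aac", "-b:a", "192k",   # or "-c:a", "copy" to keep the original audio untouched
    FINAL,
], check=True)
```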

I’ve just had a play around myself and it seems my GPU gets about 39% usage; it’s mostly on the CPU. So at this point I’m not sure the software is worth the cost if it’s this inconsistent and they shift the blame everywhere else…

Anyway, I’m on an RTX 3090 too, liquid cooled, along with an i9-9900K @ 5GHz, yet strangely I get 0.02-0.07 sec/frame… A 25:51 video taking 54 minutes to process at a resolution of just 1088x704, with a HUGE file size afterwards, plus the whole 96kbps audio problem… It can’t be that hard to add a drop-down box so we can select 128 or more, can it? :P The whole reason I wanted this software to be good was to not need 4 or 5 different applications and loads of messing around just to get everything to work! It almost seems pointless if I still have to use ffmpeg, HandBrake and so forth… Far from being a front-end to those, it’d at least be nice for it to have some control and do some of this on the user’s behalf; that would speed things up significantly by not having to spend so long getting ready to render! Haha…
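Until an audio bitrate option appears in the app, the audio part at least is scriptable: the same kind of ffmpeg call as the sketch further up, but with the video stream copied untouched so it runs in seconds rather than hours (filenames are placeholders):

```python
# Sketch: keep the upscaled video exactly as VEAI wrote it (stream copy) and swap in
# the audio from the original source at a higher AAC bitrate. Filenames are placeholders.
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "veai_upscale.mp4",      # video from VEAI (its audio is only 96 kbps)
    "-i", "original_source.mp4",   # original clip with the untouched audio
    "-map", "0:v:0", "-map", "1:a:0",
    "-c:v", "copy",                # no video re-encode, so no extra hours
    "-c:a", "aac", "-b:a", "192k",
    "veai_upscale_better_audio.mp4",
], check=True)
```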

I must admit, though, the result was quite impressive, even with the worse audio! It can be seen here: Secret Life Of Machines 106 The Television 2 00x 1088x704 alq 12 - YouTube and hopefully I don’t get shot for copyright…


Noob question: as long as a card has CUDA or DirectML support, VEAI will work with it, yes? How far back does that support go with respect to CUDA or DirectML versions (since older cards will probably not be supported on new CUDA or DirectML releases)?

I’m thinking of A100 or old Tesla cards, since graphics cards are so scarce at the moment.

I read elsewhere here that eGPUs are supported, but performance is not that great (but that could have been an older VEAI release). Linux support would be nice. Any render farms supporting VEAI?

You need DirectX support, otherwise the software will not start a preview or export.

When I set my Quadros to TCC (Tesla Compute Cluster) mode, the TL software hangs.
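For anyone who wants to check which driver model their card is in before launching the app, nvidia-smi can report it on Windows. A small sketch (assumes nvidia-smi is on the PATH; the WDDM requirement follows from the DirectX point above):

```python
# Sketch: query the current Windows driver model (WDDM vs. TCC) for each NVIDIA GPU.
# VEAI needs the card in WDDM mode, since TCC takes it off the DirectX/display path.
# Assumes nvidia-smi is installed and on the PATH.
import subprocess

result = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,driver_model.current",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())   # e.g. "0, Quadro RTX 5000, WDDM"
# Switching is done with "nvidia-smi -i <index> -dm 0" (WDDM) or "-dm 1" (TCC),
# which needs admin rights and a reboot to take effect.
```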

eGPU works great with VEAI, with no performance penalty, as far as I have experienced.

Hi, I’ve just upgraded my two computers to use RTX cards after agonisingly long waits on my NVIDIA P2000 when restoring video.
One is an AMD 6 core with an RTX 3080 Ti card. The other is an i7 6 core with an RTX 3060 Ti card. The improvement is astonishing. I also fitted 2 x 1TB NVMe drives, one in each computer, which helps the transfer-to-disk time enormously. So I render out to SSD using deinterlace and upscale from 720x576 to 3072x2304 at 50 FPS, and am getting 0.33 sec per frame render time on the Intel computer with the 3060 Ti card.

I have every respect for the work that you are doing on the software, and will try the idea of two instances running simultaneously on my RTX 3080 Ti computer, if only because the RTX 3060 Ti seems to be getting better performance. I realise the note you posted was in January, but I notice only sporadic 100% use of the GPU core on the RTX 3080 Ti compared to the near-continuous 100% use on the RTX 3060 Ti.


The transfer to a ready-made file is time consuming on the CPU as well as the GPU, I think, and you need a fast disk to save to. I render out to 8-bit TIF sequences, which is the fastest. I am getting 0.33 seconds per frame (3 frames per second) with my RTX 3060 Ti (uncooled) on my i7 6 core, and that’s de-interlacing and 400% upscaling. It looks marvellous and goes through nicely. I’m rendering out to a 1TB NVMe drive, which is the fastest for storage. Envious of the speeds that you’re getting on your RTX 3090 and i9!
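For anyone following the TIF-sequence route, here’s a rough sketch of assembling the frames back into a video with ffmpeg from Python (the filename pattern, frame rate and CRF are placeholders; match them to whatever VEAI actually exports):

```python
# Sketch: assemble an 8-bit TIF image sequence rendered by VEAI into an H.265 video.
# Filename pattern, frame rate and quality are placeholders - match them to your export.
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-framerate", "50",          # match the 50 fps deinterlaced output
    "-i", "frames/%06d.tif",     # zero-padded sequence: 000001.tif, 000002.tif, ...
    "-c:v", "libx265", "-crf", "18", "-pix_fmt", "yuv420p",
    "restored.mp4",
], check=True)
```

The original audio can then be muxed back in the same way as in the earlier sketches.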


Not sure if I’m doing something wrong, but YouTube doesn’t appear to support a 704p resolution, so the highest option available is 480p… meaning YT is downscaling your upscale to basically the same size as your source…

But your piece does raise an interesting suggestion - a model that could take the ‘old video style’ and clean/modernise, and potentially vice-versa, would be a very interesting item.

I’m using Artemis High Denoise on a 1920x1080 @ 29.97 fps clip with 18,291 frames on a GTX 960 GPU, and the ETA is 4 hrs 14 mins at 1.75 sec/frame, which is 52.56x slower than the source frame rate. This speed is way too slow for a 10 min 10 sec video; a 90 min video would take forever. Will an RTX 3060 reduce the processing time by 3x, to get at least 0.583 sec/frame?
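For what it’s worth, here’s the arithmetic behind that question as a tiny sketch (the 3x speedup for an RTX 3060 over a GTX 960 is purely an assumption; the real factor depends on the model and resolution):

```python
# Sketch: what a hypothetical 3x speedup would mean for longer clips.
# The 3x figure for an RTX 3060 over a GTX 960 is an assumption, not a benchmark.
FPS = 29.97

def hours(minutes_of_video: float, sec_per_frame: float) -> float:
    frames = minutes_of_video * 60 * FPS
    return frames * sec_per_frame / 3600

for label, spf in [("GTX 960 (measured)", 1.75), ("RTX 3060 (assumed 3x)", 1.75 / 3)]:
    print(f"{label}: 90 min video -> {hours(90, spf):.1f} h at {spf:.3f} sec/frame")
# GTX 960 (measured): 90 min video -> 78.7 h at 1.750 sec/frame
# RTX 3060 (assumed 3x): 90 min video -> 26.2 h at 0.583 sec/frame
```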

https://community.topazlabs.com/t/what-does-veai-performance-look-like-between-amd-nvidia/18581/68?u=tpx

Thanks a lot for those details! I am currently using an RTX A4000 (a Quadro-class Ampere card; think a 3070 but with 16GB of RAM, triple the tensor cores of a 3070 but a third of the ray tracing cores). Running VEAI uses between 4 and 8GB of VRAM depending on the model and the scaling size, and VRAM usage never went above 50% (I haven’t tried upscaling to 8K though). Graphics_1 utilization is always somewhere between 30% and 45%, also depending on the model and the upscaling settings.

I have a few questions:

  1. Do you still see support for cards with more than 12GB on the horizon?
  2. Are the extra tensor cores expected to add anything to performance vs. a regular 3070 for example?
  3. Since you were supporting CUDA before, can’t you add an option in the settings to switch between CUDA and DirectML? Or does DirectML have the same performance anyway?
  4. How is your progress with RTX 3000 series optimizations in general? When you say “superior performance”, how much of an increase are you expecting?
  5. Are the optimizations usually part of the model itself or the inferencing software? Do you think one day we could be training our own models and then using them within VEAI?

Thanks a lot for the great work!