Any luck with more advanced OpenVino or CPU support?

I’d heard of this software and got a peek at it through a friend recently, and I’ve tested some affordable upscaling hardware.
I’ve got to say I really like the output with Gaia-CG. It’s unreal… and even better when you use CPU only. If you upscale once and denoise once, almost all old PV recordings get a huge improvement.
The thing is, though, if you run it twice, the output picks up lots of wave-like structures, and you have to add a Gaussian Blur in Premiere first to avoid that. CPU-only mode is much better in this respect; my guess is that FP64 behaves better.

But I found that, compared to Nvidia GPUs, the CPU is not well supported. It shows in two ways:
1. CPU and system memory usage stay low. With an i9 9900KF and 32 GB of memory, I get about 0.25 FPS when denoising 1080P video; CPU usage is only 65% and only 8 GB of memory is consumed.
That would make sense for Gigapixel AI, where all cores work on a single image. But for VEAI, it would be much better to simply create one thread per logical core and upscale ten or more frames at once.
Also, the program only takes 8 GB of RAM. As we all know, a CNN needs plenty of memory, whether it’s DDR4 or GDDR6. I tried the program with 1080 Tis and 2080 Tis, and it always takes more than 10 GB of VRAM; if some of the VRAM is occupied by Chrome, even just 1 GB, VEAI slows down by about 10% at the same CUDA usage. At first I thought the limit was memory bandwidth, which is only 60 GB/s compared to 484 GB/s on the 1080 Ti, but I tried an LGA3647 machine with dual 6-channel memory and 240 GB/s, and it still takes only 8 GB of RAM. So my guess is that the program is set to take only 8 GB of RAM no matter what the CPU and memory configuration is.
If you start two or more instances at the same time, CPU usage reaches 100%, but both instances slow down and I only get 0.31 FPS, which is 125% of the single-instance speed. The most interesting thing is that if you simply set up 2 virtual machines and give each VM 8 threads and 16 GB of RAM, you get nearly 0.45 FPS, which is 180% of the default setup. It’s worse with high-core-count CPUs like Threadripper and Xeon Gold/Platinum: you have to start lots of VMs to use the whole system.

2. AVX-512 (and VNNI) is not well supported right now. In an older version I got an awesome 2 FPS when denoising 1080P video with dual Xeon Gold 6130 engineering samples, and completed a 6-minute video in 6 hours, in line with the theoretical difference between dual Xeon Gold 6130s and a 9900KF: a single Gold 6130 is about twice as fast as a 9900K on AVX-512 workloads, and the extra memory bandwidth helps, as it did with the Caffe I used. But in 1.6.0, the same settings only reach 0.5 FPS. I’m not a subscriber, so I can’t tell you which version it was, but it really did work when I remoted into my friend’s system and uploaded a 20-minute family video a month ago.
The next time I tried, it performed worse, and CPU usage stayed at 40% to 50% with 8 GB of memory consumed.
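
For reference, the throughput figures from the two points above can be put side by side with a few lines of Python. This is only a rough sketch: the labels and the `relative_speed` helper are made up for illustration, the FPS values are the ones quoted above, and the multi-instance figures are assumed to be combined throughput.

```python
# Throughput figures quoted above (frames per second, 1080P denoise).
BASELINE_FPS = 0.25  # single instance on the i9 9900KF

measurements = {
    "single instance": 0.25,
    "two instances, same OS": 0.31,
    "two VMs, 8 threads / 16 GB each": 0.45,
}


def relative_speed(fps, baseline=BASELINE_FPS):
    """Return throughput as a percentage of the baseline."""
    return 100.0 * fps / baseline


for label, fps in measurements.items():
    print(f"{label}: {fps:.2f} FPS = {relative_speed(fps):.0f}% of baseline")

# Dual Xeon Gold 6130: older version vs 1.6.0 with the same settings.
OLD_FPS, NEW_FPS = 2.0, 0.5
slowdown = OLD_FPS / NEW_FPS
print(f"1.6.0 runs at 1/{slowdown:.0f} of the older version's speed")
```

Swap in your own measurements to see how far a given setup is from the single-instance baseline.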

I know that using CPUs as upscaling hardware seems a bit luxurious and a bit commercial, but engineering sample CPUs are really cheap and beat Threadrippers at the same price in video encoding, so many freelancers use them.
The ES Xeon Gold 6130s are only about $40, and blade server motherboards are only about $130; with 48 GB of memory, the whole system is only about $400. That’s the price of a second-hand GTX 1080 Ti, and it’s 50% faster when OpenVino actually works.

Also, I’m about to get some Intel Movidius, Xeon Phi (3647 with full SSE and AVX support), 2nd gen Xeon Scalable processors with VNNI support, and EPYC engineering samples, this time for cheaper video improvement frameworks like RealSR and DAIN, and I’ll be happy to share the results.

Right now, $299 for 2 machines * 2 FPS * 1 year of upscaling 480P videos is not as great as I thought.
That’s only 700 hours… Not even enough to fix my own PVs, which come to about 800 GB.
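
A rough sketch of that estimate in Python. The source frame rate isn’t given above; 50 FPS is an assumption, picked because it reproduces the roughly 700 hour figure. Running both machines non-stop for a 365-day year is assumed too, and `footage_hours` is just an illustrative helper.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def footage_hours(machines, fps_per_machine, source_fps, seconds=SECONDS_PER_YEAR):
    """Hours of source footage processed in the given wall-clock time."""
    frames = machines * fps_per_machine * seconds
    return frames / source_fps / 3600


# Try a few common source frame rates; 50 FPS is the assumed one.
for source_fps in (25, 30, 50, 60):
    hours = footage_hours(machines=2, fps_per_machine=2, source_fps=source_fps)
    print(f"{source_fps} FPS source: about {hours:.0f} hours of footage per year")
```

Lower frame rate sources stretch the same processing budget over more hours of footage, so the total depends heavily on what the old recordings were shot at.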

I asked my friend to check again, and yes, the program is actually not using AVX2. No AVX offset is triggered on either the R5 3600 or the i9 9900K.
So the best choice might be silly old Xeon E5 V2s and C602 motherboards, if the license allows me to run on multiple machines… 175 dollars is enough for a 24C48T V2 server.

AVX offset only triggers when AVX512 is used. Regular AVX2 does not trigger the offset.

My apologies, it isn’t triggered on MSDT CPUs, only on the X99/C612 platform.
I ran a test on a remote server with hardware monitoring: the program is using AVX2, but still no AVX-512.
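
Side note: what the CPU advertises is a separate question from what the program actually uses. On a Linux box the advertised flags can be read from `/proc/cpuinfo`. A minimal sketch, where `simd_support` is a made-up helper name and the result says nothing about which code path VEAI takes:

```python
def simd_support(cpuinfo_text):
    """Report which SIMD flags appear in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU has a line like "flags : fpu vme ... avx2 ..."
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    wanted = ("avx", "avx2", "avx512f", "avx512_vnni")
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(simd_support(f.read()))
    except OSError:
        # Not Linux, or the file is not readable on this system.
        print("/proc/cpuinfo is not available here")
```

To see what the program really executes you still need hardware monitoring, as in the test above.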

Pleeease keep the numbers coming. I’ll try to help if you have a question about Intel HW.