Topaz Video AI Beta 3.1.1.1.b

gregory.maddra · January 23, 2023, 11:03pm

Hello Everyone!

We have another beta for y’all. Please let us know if processing gets stuck and send us the logs.

Download:

Windows
Mac (DMG) | Mac (PKG)

Changelog:

Fixes issue with Theia not getting the right params
Fixes Proteus Auto getting stuck when preview is less than 0.5s
Added export options to avoid creation of temp files (details here)
Fixes Stabilization showing artifacts
Rearranged Preferences - removed Advanced tab, moved default preset to Interface tab
Fixes Apollo Soft/Sharpen models showing severe artifacts on Mac

Please test the following items and share your feedback:

Please test Stabilization and let us know if it still shows artifacts where entire frame has some colored patterns
Please check if the export options work correctly for you

Thanks for testing

Please upload problem videos and logs here: Submit files

david.123 · January 23, 2023, 11:15pm

Thanks Greg, first like again! I’m on a roll!

Martyprod · January 23, 2023, 11:20pm

good from you, you deserve some candies !!

MikeF · January 24, 2023, 2:18am

Nice job. Just finished a 640x480 video that I scaled 225 Percent, 1440x1080. I used Proteus Auto with a little grain. I got 25 Fps and this while running a plex server and playing a plex video in the background. This was with a Ryzen 9 3900x, 64Gb ram and a RTX 3060Ti. That’s speed I can live with.

20rushtonj · January 24, 2023, 2:20am

I just thought I would report that I am continuing to get export errors when using NVENC AV1 with Apollo Frame Interpolation on vertical videos.

[av1_nvenc @ 000002244C5AD440] Failed locking bitstream buffer: invalid param (8):
Error submitting video frame to the encoder

Steps to reproduce:

Open a vertical video in TVAI (I personally used a video with dimensions 1080x1920 and 23.97fps)
Change the frame rate in the GUI (I personally picked 60fps).
In the frame rate interpolation settings select one of the Apollo models (Apollo, Apollo Soft, Apollo Sharp)
Set the video encoder to NVENC AV1
Start processing the video in the GUI you you will get an “unknown error” shortly after the processing starts. If you run this in the command line instead of the GUI then you will get the NVENC error listed above.

I don’t experience issues if I use other codecs (NVENC H264, NEVENC H265, ProRes, etc).
I also don’t experience issues if I use other frame rate interpolation models like Chronos.
I also don’t experience issues with horizontal videos with similar specifications (1920x1080 at 23.97fps)

GPU: RTX 4090 (Driver 528.02 Studio Driver)
CPU: Ryzen 9 5950X
RAM: 64GB
Windows 11

TomaszW · January 24, 2023, 6:21am

there is still no manager models, I’m sad.

ole.solev · January 24, 2023, 6:58am

Did you fix noise level ignorance?
Did you fix Relative to Auto ineffectiveness?

suraj · January 24, 2023, 7:50am

Will look into this, expect it fixed by 3.1.2

suraj · January 24, 2023, 7:51am

Don’t expect it anytime soon.

suraj · January 24, 2023, 7:52am

Can you elaborate on the problems with more details.

TPX · January 24, 2023, 8:10am

The speed with AMD GPUs is still at the level of 2.6.4. and below.

I assume that with AMD GPUs the models are executed serially and not in parallel.

TPX · January 24, 2023, 8:18am

Thats 24 FPS with 4 parallel previews.

8.2 fps with a single preview.

With a single Preview 3639 MB are loaded in the Vram, with 4 Previews 14421 MB are loaded into the Vram.

suraj · January 24, 2023, 9:08am

The optimizations in 3.1 are for all platforms and ensure parallel processing. Since majority of the users are on Nvidia and also Nvidia has better framework support the focus is on Nvidia GPUs. We will be looking into AMD specific optimizations. I might just put out a few AMD GPU specific alphas.

JakSpoon · January 24, 2023, 9:22am

@suraj

Thank you for this update!

Ps. I keep noticing a really slow download of the updates, when done directly from within the program: when notified of the available update and clicking on Download, the speed is really ridiculous (considering I have FTTH fiber and Photo AI downloads in the same way MUCH faster).

I am notifying you of this problem and hope it will be fixed with the next updates.

TPX · January 24, 2023, 9:23am

As described in the other thread, I can’t imagine a previously slower GPU (RTX 5000) overtaking a faster GPU (W6800) even though it wasn’t capable of doing so before, just because of parallel execution.

If it had been faster, it would have been seen before.

And the parallel execution of four previews actually shows that the GPU (W6800) can do more.

I know that Nvidia has a better support and much better documentation, I can also only reflect what I see and how it behaves and that makes no sense.

Thats the situation with V3.0.3.0 performed by Puget systems.
RX6900 XT is close to 4090 and in front of 3090.
With parallel execution now the 4090 is two to four times as fast.
And this makes no sense.
People have already written in the forum that it runs poorly on AMD with 3.1.0.

TomaszW · January 24, 2023, 9:42am

The Proteus is differents quality with the last version, why?

adrjet22 · January 24, 2023, 10:35am

Can you elaborate? I did not find any significant difference. The model has not changed.

20rushtonj · January 24, 2023, 10:44am

Some GPUs behave better than others as you increase the parallelization of the work. And this has become quite noticeable with recent Nvidia GPUs.

Recent high end consumer Nvidia GPUs typically have much higher raw compute performance than their competitor consumer AMD GPUs. For example, the RTX 3090 with 35.58 TFLOPS of FP32 compute performance compared to the RX 6950XT with 23.65 TFLOPS of FP32 compute performance.

Yet in a large number of applications that make heavy use of FP32, this does not lead to a noticeable performance boost for Nvidia. And this is because a lot of applications have a hard time making their work parallel enough to take full advantage of the highly parallel GPU design Nvidia has started using on their consumer GPUs in recent years.

It’s POSSIBLE that the changes made in TVAI to increase the parallelization of tasks ends up favoring Nvidia more than AMD GPUs due to differences in the two manufacturers GPU designs. Or the way parallelization was implemented favors Nvidia. However, I’m doubtful this is the main cause for the performance you’re observing.

The use of Tensor RT on Nvidia compared to AMD could be having an impact?
There could be driver or compiler bugs for AMD that’s causing a performance decrease (Another application I use had a 30% performance decrease on AMD a few months ago due to a compiler bug)?
Maybe there’s a lack of proper optimization for AMD hardware?

This is all just speculation. And you’re better off waiting and seeing what the “AMD optimizations” that @suraj has suggested may be being worked on, brings for your hardware.

You also have to keep in mind that the increase in parallelization of TVAI also probably increases the memory bandwidth requirements of the applications. Nvidia and AMD once again differ there when comparing “typical competitors”. For example, the RTX 3090 has a memory bandwidth of 936.2 GB/s
while the RX 6950XT has a memory bandwidth of 576.0 GB/s. And they’re laid out differently, for example, the RTX 3090 having a wider memory bus. This could also be having an impact on the performance scaling of different GPUs with the update. But once again, this is all speculation and it’s possible very few of these factors have an impact on the scaling of TVAI across different hardware.

I really just wanted to let you know that as code changes are made to improve performance, the scaling of that performance improvement may be different across different hardware, more so across different vendors due to differences in chip design, drivers, memory layout, memory speed, etc.

TPX · January 24, 2023, 10:58am

I did reinstall 3.0.12 and the performance is 1 frame slower to 3.1.1.1b.

Therefore, I assume that I am correct with my assumption.

I did compare a 2018 Turing GPU vs a RDNA2 2020 GPU.

Quadro RTX 5000 = 22 fps in 3.1.0.

FP16 (half) performance
22.30 TFLOPS (2:1)

FP32 (float) performance
11.15 TFLOPS

Bandwidth
448.0 GB/s

Radeon Pro W6800 = 8 - 11 fps in 3.1.0(1b) → 6 - 8 fps in 3.0.12. → 6.6 fps in 2.6.4

FP16 (half) performance
35.64 TFLOPS (2:1)

FP32 (float) performance
17.82 TFLOPS

Bandwidth
512.0 GB/s

You are of course right with your data, no question, but I have tested both GPUs in the same system.
Except for version 3.0.12, which I tested today on the W6800.