Proteus Natural on Mac is grossly inefficient

Firstly, why doesn’t Proteus Natural use the GPU cores? Especially when you consider the new M5 has neural accelerators in each GPU core. However, let’s put that aside and give Topaz the benefit of the doubt that they know best.

So why is it, then, that I can get at least twice the speed by using the CLI? On my M3 Ultra, a 768x576 source gives around 7.4 FPS in the UI. But using my segment stacking command in the CLI, I get around 15 FPS.

But there’s more. Given that the speed for a 4K source is around 1.3 FPS, the 768x576 source could achieve around 24 FPS if fully optimised, going by pixel count alone.
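For reference, the 24 FPS figure comes from scaling by pixel count. The sketch below reproduces that arithmetic, assuming "4K" means 3840x2160 and that throughput in pixels per second would stay constant across resolutions. That second assumption is an idealisation, not something Topaz has confirmed.

```shell
# Pixel counts for the two sources quoted above.
uhd_pixels=$((3840 * 2160))   # assumption: "4K" means 3840x2160 (UHD)
sd_pixels=$((768 * 576))

# Scale the 4K rate by the pixel ratio. POSIX shell arithmetic is
# integer-only, so work in hundredths of an FPS.
uhd_fps_x100=130              # 1.3 FPS
sd_fps_x100=$((uhd_fps_x100 * uhd_pixels / sd_pixels))

printf 'pixel-rate equivalent: %d.%02d FPS\n' \
    $((sd_fps_x100 / 100)) $((sd_fps_x100 % 100))
# prints: pixel-rate equivalent: 24.37 FPS
```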

Before Topaz comes back and says we can process up to 4 videos at once to increase efficiency: YES, we know all that, and I’ve already tried it. It is reassuring that I can get the equivalent of around 16.2 FPS in total with 4 running at once. But that’s not the point. Why can’t we expect the maximum speed, or close to it, for a single video? If I can at least double the speed in a single command, is it too much to expect someone in your Dev Team to be able to do something similar? But please, don’t let me distract them from getting Starlight working properly.

And Topaz still thinks it’s a good idea to stop us from using the CLI?

Andy

Proteus Natural is not using GPU cores on Mac due to a quality issue. We are working with Apple to address the problem. Once we have confirmation that it is fixed, we will enable the use of the GPU for Proteus Natural.

I will get the GUI team to look into the significant performance loss while using the GUI vs CLI. While there might be some performance loss, 50% is just ridiculous.

As far as single-instance/process performance is concerned, there are inter-frame dependencies, model input sizing, color correction, blending, etc. While the bulk of the load is ANE/GPU bound, many operations still need to happen on the CPU. In the 4K vs 768x576 example you cite, the difference has more to do with block sizing. A 720p source vs 4K is a better example, but I’m sure there might still be perf differences.

Starlight on Mac should be fixed in macOS 26.3. If you have access to the betas, the latest one should already fix it.

I agree with you about the CLI; I use it all the time myself. FFmpeg with our plugin will always be available on GitHub (TopazLabs/FFmpeg, a mirror of https://git.ffmpeg.org/ffmpeg.git), and you should be able to use it with a valid license.

Many thanks for your reply.

Below is an example of the command I’ve created. In a nutshell, here’s what it does in addition to a standard de-interlace and Proteus Natural upscale:

  • Prior to upscaling, create a (horizontal) stack of 5 streams. To picture the arrangement: A B C D E appears first followed by F G H I J etc. where each letter represents a 50-frame segment extracted sequentially from the original stream.

  • The stacked stream is then upscaled. With 5 x the original pixels per frame (the same pixel count as 1920x1080 if the original was 720x576), this is evidently much more efficient for the ANE. That is the key to the speed increase.

  • The upscaled stack is then cropped and interleaved back to the original sequence ABCDEFGHIJ etc.
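To make the arrangement concrete, here is a minimal sketch of which source frames end up side by side in each stacked frame, plus the pixel count of one stacked frame for a 720x576 source. The helper name source_frame is hypothetical and used only for illustration. The segment length (50) and column count (5) are taken from the description above.

```shell
fc=50      # frames per segment
cols=5     # segments stacked side by side

# Source frame number shown in column $2 (0-based) of stacked frame $1.
# source_frame is a hypothetical helper, not part of any tool.
source_frame() {
    _j=$1; _k=$2
    echo $(( (_j / fc) * fc * cols + _k * fc + (_j % fc) ))
}

# Pixel count of one stacked frame for a 720x576 source.
stack_pixels=$(( 720 * cols * 576 ))
hd_pixels=$(( 1920 * 1080 ))
echo "stacked frame: $((720 * cols))x576 = $stack_pixels pixels (1920x1080 = $hd_pixels)"

# Stacked frames 0..49 hold segments A..E; stacked frame 50 starts F..J.
for j in 0 49 50; do
    line="stacked frame $j:"
    k=0
    while [ "$k" -lt "$cols" ]; do
        line="$line $(source_frame "$j" "$k")"
        k=$((k + 1))
    done
    echo "$line"
done
```

With these values, stacked frame 0 contains source frames 0, 50, 100, 150 and 200, and stacked frame 50 contains 250, 300, 350, 400 and 450.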

export TVAI_MODEL_DATA_DIR='/Applications/Topaz Video.app/Contents/Resources/models'
export TVAI_MODEL_DIR='/Applications/Topaz Video.app/Contents/Resources/models'
cd '/Applications/Topaz Video.app/Contents/MacOS'

The command:

fc=50 ; fr=25 ; ./ffmpeg -y -i /Users/Andy/Desktop/Topaz/Source.mkv -r $fr -filter_complex " \
bwdif=mode=0:parity=-1:deint=0,split=5[1][2][3][4][5]; \
[1]select='lt(mod(n,$fc*5),$fc)', \
setpts=N/($fr*TB)[S1]; \
[2]select='between(mod(n,$fc*5),$fc,$fc*2-1)', \
setpts=N/($fr*TB)[S2]; \
[3]select='between(mod(n,$fc*5),$fc*2,$fc*3-1)', \
setpts=N/($fr*TB)[S3]; \
[4]select='between(mod(n,$fc*5),$fc*3,$fc*4-1)', \
setpts=N/($fr*TB)[S4]; \
[5]select='between(mod(n,$fc*5),$fc*4,$fc*5-1)', \
setpts=N/($fr*TB)[S5]; \
[S1][S2][S3][S4][S5]hstack=inputs=5, \
tvai_up=model=pnat-1:scale=2:blend=0.5:device=0:vram=1:instances=1, \
split=5[CS1][CS2][CS3][CS4][CS5]; \
[CS1]crop=iw/5:ih:0:0, \
setpts=(N+floor(N/$fc)*4*$fc)/($fr*TB)[C1]; \
[CS2]crop=iw/5:ih:iw/5:0, \
setpts=($fc+N+floor(N/$fc)*4*$fc)/($fr*TB)[C2]; \
[CS3]crop=iw/5:ih:2*iw/5:0, \
setpts=($fc*2+N+floor(N/$fc)*4*$fc)/($fr*TB)[C3]; \
[CS4]crop=iw/5:ih:3*iw/5:0, \
setpts=($fc*3+N+floor(N/$fc)*4*$fc)/($fr*TB)[C4]; \
[CS5]crop=iw/5:ih:4*iw/5:0, \
setpts=($fc*4+N+floor(N/$fc)*4*$fc)/($fr*TB)[C5]; \
[C1][C2][C3][C4][C5]interleave=nb_inputs=5" \
-c:v prores_videotoolbox -profile:v standard -pix_fmt p210le -allow_sw 1 \
-max_interleave_delta 0 -movflags frag_keyframe+empty_moov+delay_moov+use_metadata_tags+write_colr ~/desktop/Topaz/upscaled.mkv
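As a sanity check on the select and setpts expressions, the sketch below replays the frame numbering in plain shell arithmetic and confirms that every frame returns to its original position after the crop and interleave step. It models only the arithmetic, not FFmpeg itself. It assumes N in setpts counts frames from 0 within each stream, and it covers whole groups of 5 segments only. The tail of a source whose length is not a multiple of fc*5 frames is not modelled here.

```shell
fc=50; cols=5
total=$((fc * cols * 3))   # three full groups of segments; arbitrary test length

n=0; mismatches=0
while [ "$n" -lt "$total" ]; do
    # Column that the select expressions route source frame n into.
    k=$(( (n % (fc * cols)) / fc ))
    # Index of that frame within its column's stream (N in setpts).
    N=$(( (n / (fc * cols)) * fc + n % fc ))
    # Numerator of the final setpts: k*fc + N + floor(N/fc)*4*fc.
    out=$(( k * fc + N + (N / fc) * (cols - 1) * fc ))
    [ "$out" -eq "$n" ] || mismatches=$((mismatches + 1))
    n=$((n + 1))
done
echo "frames checked: $total, mismatches: $mismatches"
# prints: frames checked: 750, mismatches: 0
```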

The actual speed increase will depend on the OS version, M chip type and original resolution. I got an 85% speed increase using the stacking method with a 720x576 PAL DVD source on my M3 Ultra running the macOS 26.3 Release Candidate.

Thanks.

Andy