Topaz Video AI v4.0.4
System Information
OS: Mac v12.0701
CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz  32 GB
GPU: AMD Radeon R9 M395X  4 GB
Processing Settings
device: 0  vram: 0.75  instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis  1X: 00.94 fps  2X: 00.61 fps  4X: 00.22 fps
Iris     1X: 00.71 fps  2X: 00.35 fps  4X: 00.12 fps
Proteus  1X: 00.78 fps  2X: 00.53 fps  4X: 00.22 fps
Gaia     1X: 00.25 fps  2X: 00.18 fps  4X: 00.14 fps
Nyx      1X: 00.36 fps  2X: 00.32 fps
4X Slowmo
Apollo: 00.80 fps  APFast: 03.46 fps  Chronos: 00.24 fps  CHFast: 00.45 fps
The numbers are OK if you consider that the M3 Max has about 20 teraflops FP32.
That makes it comparable to an RX 6800 XT, which also has roughly 20 TFLOPS.
An RTX 4090 has 82 teraflops.
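As a rough sanity check, the theoretical FP32 figures quoted above can be compared directly. This is only a sketch of the raw-compute ratios; real TVAI throughput also depends on memory bandwidth, drivers, and per-platform model optimization, as the rest of this thread shows:

```python
# Theoretical FP32 TFLOPS figures as quoted in this thread.
# Raw compute only -- actual benchmark ratios will differ.
tflops = {
    "Apple M3 Max": 20.0,
    "AMD RX 6800 XT": 20.0,
    "NVIDIA RTX 4090": 82.0,
}

baseline = tflops["Apple M3 Max"]
for gpu, tf in tflops.items():
    print(f"{gpu}: {tf} TFLOPS, {tf / baseline:.1f}x the M3 Max")
```

On paper the 4090 is about 4x the M3 Max, which is roughly what the non-Iris benchmark ratios in this thread reflect.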
Did you overlook the Iris 2x numbers (which I'd guess is the most widely used model and upscale factor)? And Iris performance in general, which is roughly comparable to Proteus on NVIDIA, AMD, and even integrated Intel GPUs, just not on Apple Silicon.
Those numbers definitely aren't OK for Iris. We also know from past versions that Iris could generally run nearly 2x faster, and for the 2x upscale model more like 4 times faster.
Apple seems very closed when it comes to their hardware. To get maximum performance, you would probably need to create a new version of the application that uses all of Apple's own development tools.
There are software developers on my team who got M1 Macs when they came out. There are still development tools we use that just don't work the same on M1, and they've had several years to fix that by now. Either it must be really hard, or it isn't possible.
But this doesn’t really apply to TVAI.
We already had faster Iris speeds, but there was a conflict between the early Sonoma versions and TVAI 3.4 and later that led to garbage output.
The fix Topaz applied for that came with a drastic performance loss, especially for the (most used) 2x upscale model.
The thing is, with Sonoma 14.1 (and 14.2) those old fast models now work flawlessly again, at least here on two different configs (M1 Pro and M2 Ultra): no visual flaws, but much better performance.
Plus, even accepting the low baseline performance, TVAI doesn't scale well with multiple GPU cores. My M2 Ultra with 60 GPU cores is only about 2.5 times faster than the M1 Pro with 16 cores.
(P.S.: This poor scaling across more GPU cores / higher GPU performance can partly also be seen on NVIDIA.)
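To put a number on that scaling observation: going from 16 GPU cores to 60 is a 3.75x increase, but only about a 2.5x speedup was observed, so the extra cores are used at roughly two-thirds efficiency. A minimal sketch of that arithmetic:

```python
# Scaling efficiency from the figures in this thread:
# M1 Pro (16 GPU cores) vs. M2 Ultra (60 GPU cores), ~2.5x observed speedup.
cores_m1_pro = 16
cores_m2_ultra = 60
observed_speedup = 2.5

ideal_speedup = cores_m2_ultra / cores_m1_pro   # 3.75x if scaling were perfect
efficiency = observed_speedup / ideal_speedup   # fraction of ideal actually achieved

print(f"Ideal speedup: {ideal_speedup:.2f}x")
print(f"Scaling efficiency: {efficiency:.0%}")
```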
That's exactly what I'm trying to say. TVAI is built using more open, accessible development tools, meant for multiple platforms but not blessed with all of Apple's latest and greatest. If my own work is any indication, such cross-platform tools may never be graced with those endowments.
How worth it would it be for Topaz to start fresh in Apple's own development ecosystem? Anything built there won't translate or carry over to the other operating systems they are trying to cover.
Again, the solution is already there. They had done an optimization with a big speed gain in the past and rolled it back due to errors. Now the errors are gone (so it seems Apple has in fact fixed some issues in the meantime). All they'd need to do is revert to those old, fast model files that already exist. So how much work is that?
And to make this a little more concrete:
This is TVAI 4.0.4 when installed from DMG, only benchmark.json edited to use Iris V1 instead of V2:
And then the exact same system after I “patched” it with the old Iris V1 model files from Aug. 2023:
And I'd say that this is a dramatic difference: the patched version is nearly 2.7 times faster on the exact same system in my most-used scenario (Iris 2x upscale).
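For context, this ~2.7x factor lines up with the claim earlier in the thread that Iris should be roughly comparable to Proteus. Applying it to the M1 Max figures posted in this thread (Iris 2X at 1.72 fps, Proteus 2X at 5.10 fps) is only a rough extrapolation, not a measurement on that machine:

```python
# Rough extrapolation: apply the ~2.7x patched-model speedup to the
# M1 Max Iris 2X figure from this thread, and compare with Proteus 2X.
iris_2x_fps = 1.72      # M1 Max, Iris 2X (as posted)
proteus_2x_fps = 5.10   # M1 Max, Proteus 2X (as posted)

patched_estimate = iris_2x_fps * 2.7

print(f"Estimated patched Iris 2X: {patched_estimate:.2f} fps")
print(f"Proteus 2X for comparison: {proteus_2x_fps:.2f} fps")
```

The estimate lands in the same ballpark as Proteus, which is what the "Iris is comparable to Proteus on other platforms" argument would predict.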
I’m not talking about Iris. I’m talking about the scaling you mentioned:
But that scaling issue in TVAI can also be seen with NVIDIA GPUs if you compare, e.g., a 4060 Ti to a 4090.
And it isn't there on the Mac for most other software, so…
But this is getting quite off-topic now.
Apple, in order to improve performance, chose to leave Intel and develop its own processors based on the ARM architecture, which is completely different from the x86 architecture. So it's no surprise that getting optimized software requires developers to move to new development tools. And even though Windows is still based on the x86 architecture, Microsoft also develops a Windows build for ARM, and they will probably, perhaps in several years, switch completely. That means it could be a good idea to invest in new development tools for these new architectures…
It's not only the ARM-versus-x86 difference. The M-series chips by Apple have dedicated sections for AI computation, graphics, and so on. It appears those only get used properly if it's done through whatever development path Apple has created.
It makes sense that new technology might not be compatible with the old ways of doing things. That's what NVIDIA did too, so it's neither unseen nor unheard of.
The gap I don't understand is why, for the development tools we use at work, and therefore probably for other programs like TVAI, it has been years and we're still not seeing an adoption that gives access to the full benefits of Apple's chips. It's more like they created a limited compatibility layer and called it good enough.
A little disclaimer here: that should read "improve battery performance". ARM is in no way computationally more performant than x86. It just uses less power wherever it can, whenever it can.
Topaz Video AI v4.0.4
System Information
OS: Windows v11.22
CPU: 13th Gen Intel(R) Core(TM) i5-13500  63.67 GB
GPU: NVIDIA GeForce RTX 3060  7.8613 GB
GPU: Intel(R) UHD Graphics 770  0.125 GB
Processing Settings
device: 0  vram: 1  instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis  1X: 08.20 fps  2X: 06.04 fps  4X: 01.93 fps
Iris     1X: 08.16 fps  2X: 04.80 fps  4X: 01.55 fps
Proteus  1X: 07.95 fps  2X: 05.25 fps  4X: 01.88 fps
Gaia     1X: 02.62 fps  2X: 01.86 fps  4X: 01.29 fps
Nyx      1X: 03.26 fps  2X: 02.73 fps
4X Slowmo
Apollo: 12.40 fps  APFast: 35.34 fps  Chronos: 06.49 fps  CHFast: 10.86 fps
Just curious as to why the 4090 performs around 4 times better than my 3060 at 2X but only around 2 times better at 4X. Could this be due to a memory limitation?
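One way to reason about the memory-limitation guess: each doubling of the upscale factor quadruples the output pixel count, so buffers grow much faster than the scale factor suggests. A back-of-the-envelope sketch (the 4 bytes per pixel per buffer is an assumed figure for illustration, not TVAI's actual internal format):

```python
# Back-of-the-envelope output buffer sizes for a 1920x1080 input.
# 4 bytes/pixel is an assumption for illustration, not TVAI's real internals.
width, height = 1920, 1080
bytes_per_pixel = 4

for scale in (1, 2, 4):
    pixels = (width * scale) * (height * scale)
    mb = pixels * bytes_per_pixel / 1024**2
    print(f"{scale}X output: {width * scale}x{height * scale}, "
          f"~{mb:.0f} MB per frame buffer")
```

The 4X output holds 16 times the input's pixels, so per-frame working-set size balloons at 4X, and a card with more VRAM and cache headroom loses less of its advantage to memory pressure at 2X than the arithmetic alone would predict.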
Topaz Video AI v4.0.4
System Information
OS: Mac v14.0101
CPU: Apple M1 Max  32 GB
GPU: Apple M1 Max  21.333 GB
Processing Settings
device: 0  vram: 1  instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis  1X: 09.14 fps  2X: 05.53 fps  4X: 01.94 fps
Iris     1X: 09.39 fps  2X: 01.72 fps  4X: 01.24 fps
Proteus  1X: 08.91 fps  2X: 05.10 fps  4X: 01.85 fps
Gaia     1X: 02.90 fps  2X: 01.90 fps  4X: 01.47 fps
Nyx      1X: 03.75 fps  2X: 03.04 fps
4X Slowmo
Apollo: 10.54 fps  APFast: 33.18 fps  Chronos: 03.08 fps  CHFast: 05.24 fps
Iris 2x… Ouch!
It’s related to L2 cache size.
Topaz Video AI v4.0.4
System Information
OS: Windows v10.22
CPU: AMD Ryzen Threadripper 3970X 32-Core Processor  127.87 GB
GPU: NVIDIA GeForce RTX 3090  23.77 GB
Processing Settings
device: 0  vram: 1  instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis  1X: 25.94 fps  2X: 12.72 fps  4X: 03.49 fps
Iris     1X: 23.49 fps  2X: 13.78 fps  4X: 03.93 fps
Proteus  1X: 25.25 fps  2X: 11.55 fps  4X: 03.46 fps
Gaia     1X: 08.58 fps  2X: 05.65 fps  4X: 03.25 fps
Nyx      1X: 10.14 fps  2X: 08.37 fps
4X Slowmo
Apollo: 34.82 fps  APFast: 56.30 fps  Chronos: 19.28 fps  CHFast: 27.03 fps
Topaz Video AI v4.0.4
System Information
OS: Mac v14.0101
CPU: Apple M3 Max  36 GB
GPU: Apple M3 Max  27 GB
Processing Settings
device: 0  vram: 1  instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis  1X: 11.42 fps  2X: 07.01 fps  4X: 02.40 fps
Iris     1X: 10.33 fps  2X: 01.92 fps  4X: 01.66 fps
Proteus  1X: 11.10 fps  2X: 06.31 fps  4X: 02.19 fps
Gaia     1X: 03.47 fps  2X: 02.47 fps  4X: 01.65 fps
Nyx      1X: 03.58 fps  2X: 03.26 fps
4X Slowmo
Apollo: 12.29 fps  APFast: 50.21 fps  Chronos: 03.93 fps  CHFast: 06.56 fps
Note to Nvidia:
We need at least 10 GB of L2 with Blackwell.
Or even 1GB!