Topaz Video AI v4.0.4
System Information
OS: Mac v12.0701
CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz 32 GB
GPU: AMD Radeon R9 M395X 4 GB
Processing Settings
device: 0 vram: 0.75 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis 1X: 00.94 fps 2X: 00.61 fps 4X: 00.22 fps
Iris 1X: 00.71 fps 2X: 00.35 fps 4X: 00.12 fps
Proteus 1X: 00.78 fps 2X: 00.53 fps 4X: 00.22 fps
Gaia 1X: 00.25 fps 2X: 00.18 fps 4X: 00.14 fps
Nyx 1X: 00.36 fps 2X: 00.32 fps
4X Slowmo Apollo: 00.80 fps APFast: 03.46 fps Chronos: 00.24 fps CHFast: 00.45 fps
The numbers are ok if you consider that the M3 Max has 20 Teraflops fp32.
Comparable to the RX 6800 XT, because it has 20 TF too.
RTX 4090 has 82 Teraflops.
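To put those quoted figures side by side, here is a minimal sketch of the arithmetic. The TFLOPS values are simply the ones stated in this post, not independently verified, and the names `quoted_tflops` and `theoretical_ratio` are made up for illustration.

```python
# FP32 throughput figures as quoted in the post above (TFLOPS).
quoted_tflops = {
    "M3 Max": 20.0,
    "RX 6800 XT": 20.0,
    "RTX 4090": 82.0,
}

def theoretical_ratio(gpu: str, baseline: str) -> float:
    """Return how many times more quoted FP32 throughput `gpu` has than `baseline`."""
    return quoted_tflops[gpu] / quoted_tflops[baseline]

ratio = theoretical_ratio("RTX 4090", "M3 Max")
print(f"RTX 4090 vs M3 Max (theoretical FP32): {ratio:.1f}x")
```

This is only a paper ratio. Real TVAI throughput also depends on the model, memory bandwidth and how well the software maps onto each GPU.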
You overlooked the Iris 2x numbers (which I guess is the most widely used model and upscale factor)? And Iris performance in general, which is roughly comparable to Proteus on NVidia, AMD and even integrated Intel GPUs, just not on Apple Silicon.
Those numbers definitely aren't OK for Iris performance, and we also know from the past that Iris could generally be nearly 2x faster, and for the 2x upscale model more like 4 times faster.
Apple seems very closed when it comes to their hardware. To get maximum performance, you would probably need to create a new version of the application that uses all their building tools.
There are software developers on my team that got M1 Macs when they came out. There are still development tools we use that just don't work the same on M1. They've had several years to fix that now. Either it must be really hard, or not possible.
But this doesn't really apply to TVAI.
We already had faster Iris speeds, but there was a conflict between the early Sonoma versions and TVAI 3.4 upwards that led to garbage output.
The fix Topaz applied for that came with a drastic performance loss, especially for the (most used) x2 upscale model.
It's just that now, with Sonoma 14.1 (and 14.2), these old fast models in fact work flawlessly again, at least here on two different configs (M1 Pro and M2 Ultra), without visual flaws but with better performance.
Plus, even accepting the low performance, TVAI doesn't really scale well with multiple GPU cores. My M2 Ultra with 60 GPU cores is only up to about 2.5 times faster than the M1 Pro with 16 cores.
(P.S.: TVAI's poor scaling with more cores / higher GPU performance can partly be seen on NVidia as well.)
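As a rough illustration of that scaling point, here is a small sketch that compares the reported speedup with what linear scaling over GPU core count would give. It assumes GPU cores are comparable between the M1 and M2 generations, which is a simplification, and `scaling_efficiency` is a hypothetical helper name.

```python
def scaling_efficiency(speedup: float, cores_big: int, cores_small: int) -> float:
    """Fraction of ideal (linear) scaling achieved.

    1.0 would mean throughput grew in proportion to the GPU core count.
    """
    ideal = cores_big / cores_small
    return speedup / ideal

# Figures from the post: the M2 Ultra (60 GPU cores) is at most about
# 2.5x as fast as the M1 Pro (16 GPU cores).
eff = scaling_efficiency(speedup=2.5, cores_big=60, cores_small=16)
print(f"ideal speedup: {60 / 16:.2f}x, efficiency: {eff:.0%}")
```

With these numbers the ideal speedup would be 3.75x, so the reported 2.5x is about two thirds of linear scaling, and that is before counting any per-core improvement in the newer generation.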
That's exactly what I'm trying to say. TVAI is built using more openly accessible building tools, meant for multiple platforms but not blessed with all of Apple's latest and greatest. If my work is any indication, such building tools may never be graced with such endowments.
How worthwhile would it be for Topaz to start fresh on the Apple-designated development ecosystem? Anything made on that won't translate or carry over to the other operating systems they are trying to cover.
Again, the solution is already there. They had done optimization with a big speed gain in the past and took it back due to errors. Now the errors are gone (so in fact it seems Apple has fixed some issues here in the meantime). All they'd need to do is revert to those old fast model files that are already there. So how much work is that?
And to make this a little more concrete:
This is TVAI 4.0.4 when installed from DMG, only benchmark.json edited to use Iris V1 instead of V2:
And then the exact same system after I "patched" it with the old Iris V1 model files from Aug. 2023:
And I'd say that this is a dramatic difference, with the patched version being nearly 2.7 times faster on the exact same system in my most used scenario (Iris 2x upscale).
I'm not talking about Iris. I'm talking about the scaling you mentioned:
But that scaling issue in TVAI can also be seen with NVidia GPUs if you compare, e.g., a 4060 Ti to a 4090.
And it also is not there on the Mac for most other software, so...
But then this is getting quite OT now.
Apple, in order to improve performance, chose to leave Intel and develop its own processors based on the ARM architecture, which is completely different from the Intel architecture. Thus, it is obvious that to get optimized software, developers need to move to new development tools. However, even though Windows is still based on the Intel architecture, Microsoft is also developing a Windows version for ARM, and in maybe several years they will probably switch completely. That means it could be a good idea to invest in new development tools for these new architectures...
It's not only the ARM versus X86 differences. The M series chips by Apple have sections for AI computation, graphics and such. It appears that those only get used correctly if it's done through whatever development path Apple has created.
It makes sense that new technology might not be compatible with the old ways of doing things. That's what Nvidia did too. So it's neither unseen nor unheard of.
The gap I don't understand is why, for the development tools we use at work (and therefore probably for other programs like TVAI), it has been years and we're still not seeing adoption that gives access to the full benefits of Apple chips. It's more like they created a limited compatibility layer and called it good enough.
A little disclaimer here. It should read "improve battery performance". ARM is in no way computationally "more performant" than X86. It just uses less power wherever it can, whenever it can.
Topaz Video AI v4.0.4
System Information
OS: Windows v11.22
CPU: 13th Gen Intel(R) Core(TM) i5-13500 63.67 GB
GPU: NVIDIA GeForce RTX 3060 7.8613 GB
GPU: Intel(R) UHD Graphics 770 0.125 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis 1X: 08.20 fps 2X: 06.04 fps 4X: 01.93 fps
Iris 1X: 08.16 fps 2X: 04.80 fps 4X: 01.55 fps
Proteus 1X: 07.95 fps 2X: 05.25 fps 4X: 01.88 fps
Gaia 1X: 02.62 fps 2X: 01.86 fps 4X: 01.29 fps
Nyx 1X: 03.26 fps 2X: 02.73 fps
4X Slowmo Apollo: 12.40 fps APFast: 35.34 fps Chronos: 06.49 fps CHFast: 10.86 fps
Just curious as to why the 4090 performs around 4 times better than my 3060 at 2X but only around 2 times better at 4X. Could this be due to a memory limitation?
Topaz Video AI v4.0.4
System Information
OS: Mac v14.0101
CPU: Apple M1 Max 32 GB
GPU: Apple M1 Max 21.333 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis 1X: 09.14 fps 2X: 05.53 fps 4X: 01.94 fps
Iris 1X: 09.39 fps 2X: 01.72 fps 4X: 01.24 fps
Proteus 1X: 08.91 fps 2X: 05.10 fps 4X: 01.85 fps
Gaia 1X: 02.90 fps 2X: 01.90 fps 4X: 01.47 fps
Nyx 1X: 03.75 fps 2X: 03.04 fps
4X Slowmo Apollo: 10.54 fps APFast: 33.18 fps Chronos: 03.08 fps CHFast: 05.24 fps
Iris 2x... Ouch!
It's related to L2 cache size.
Topaz Video AI v4.0.4
System Information
OS: Windows v10.22
CPU: AMD Ryzen Threadripper 3970X 32-Core Processor 127.87 GB
GPU: NVIDIA GeForce RTX 3090 23.77 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis 1X: 25.94 fps 2X: 12.72 fps 4X: 03.49 fps
Iris 1X: 23.49 fps 2X: 13.78 fps 4X: 03.93 fps
Proteus 1X: 25.25 fps 2X: 11.55 fps 4X: 03.46 fps
Gaia 1X: 08.58 fps 2X: 05.65 fps 4X: 03.25 fps
Nyx 1X: 10.14 fps 2X: 08.37 fps
4X Slowmo Apollo: 34.82 fps APFast: 56.30 fps Chronos: 19.28 fps CHFast: 27.03 fps
Topaz Video AI v4.0.4
System Information
OS: Mac v14.0101
CPU: Apple M3 Max 36 GB
GPU: Apple M3 Max 27 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis 1X: 11.42 fps 2X: 07.01 fps 4X: 02.40 fps
Iris 1X: 10.33 fps 2X: 01.92 fps 4X: 01.66 fps
Proteus 1X: 11.10 fps 2X: 06.31 fps 4X: 02.19 fps
Gaia 1X: 03.47 fps 2X: 02.47 fps 4X: 01.65 fps
Nyx 1X: 03.58 fps 2X: 03.26 fps
4X Slowmo Apollo: 12.29 fps APFast: 50.21 fps Chronos: 03.93 fps CHFast: 06.56 fps
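To make the Iris 2x gap on Apple Silicon easier to see, here is a small sketch that parses the Iris and Proteus lines from the five results posted above and prints the Iris 2X to Proteus 2X ratio for each system. The parsing regex and the helper names (`parse_fps`, `iris_vs_proteus_2x`) are my own assumptions about the benchmark text layout, not anything provided by TVAI.

```python
import re

# Iris and Proteus lines copied verbatim from the benchmark posts above.
results = {
    "R9 M395X": ("Iris 1X: 00.71 fps 2X: 00.35 fps 4X: 00.12 fps",
                 "Proteus 1X: 00.78 fps 2X: 00.53 fps 4X: 00.22 fps"),
    "RTX 3060": ("Iris 1X: 08.16 fps 2X: 04.80 fps 4X: 01.55 fps",
                 "Proteus 1X: 07.95 fps 2X: 05.25 fps 4X: 01.88 fps"),
    "M1 Max": ("Iris 1X: 09.39 fps 2X: 01.72 fps 4X: 01.24 fps",
               "Proteus 1X: 08.91 fps 2X: 05.10 fps 4X: 01.85 fps"),
    "RTX 3090": ("Iris 1X: 23.49 fps 2X: 13.78 fps 4X: 03.93 fps",
                 "Proteus 1X: 25.25 fps 2X: 11.55 fps 4X: 03.46 fps"),
    "M3 Max": ("Iris 1X: 10.33 fps 2X: 01.92 fps 4X: 01.66 fps",
               "Proteus 1X: 11.10 fps 2X: 06.31 fps 4X: 02.19 fps"),
}

def parse_fps(line: str) -> dict[str, float]:
    """Map each scale label ('1X', '2X', '4X') to its fps value."""
    return {scale: float(fps)
            for scale, fps in re.findall(r"(\dX): (\d+\.\d+) fps", line)}

def iris_vs_proteus_2x(iris_line: str, proteus_line: str) -> float:
    """Ratio of Iris 2X fps to Proteus 2X fps for one system."""
    return parse_fps(iris_line)["2X"] / parse_fps(proteus_line)["2X"]

ratios = {name: iris_vs_proteus_2x(*lines) for name, lines in results.items()}
for name, r in ratios.items():
    print(f"{name}: Iris 2X runs at {r:.2f}x the speed of Proteus 2X")
```

On the two NVidia cards the ratio comes out close to 1, while on the M1 Max and M3 Max it is roughly a third. These are single benchmark runs, so treat the ratios as indicative only.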
Note to Nvidia:
We need at least 10 GB L2 with Blackwell.
Or even 1 GB!