Thoughts about TVAI and what path the developers should follow IMO

I think it’s time to say it like it is. Improvements in artificial intelligence have grown immensely over the last five years, and the pace is accelerating. The thing with Topaz is that, unless you have literally ideal footage, you are going to get artifacts. The biggest challenge for AI today, in my opinion, is to deliver good results regardless of the input. Generative Fill in Photoshop and Stable Diffusion are great examples of this: you either end up with an impressive result (because the algorithm was trained well enough for it) or a total humanoid or geometric mess. Topaz does a fair job, considering that video is a whole different and much more complicated field.
For the purpose of upscaling a video, I believe first of all that the AI algorithm has to learn what we humans consider a beautiful viewing experience; that is what we are searching for in the first place. Artifacts are simply the result the algorithm is able to deliver given the scope of data it has learned from. But we cannot train AI on every single feature of the visible world, which is why I’m making this post to suggest something I believe will be much simpler and more effective for AI upscaling.

  • Let’s suppose I download a 144p video from YouTube. Yes, it’s an extreme example, but keep reading. I think Topaz should analyze the video’s characteristics and know what kind of improvement can be done to that specific source, considering the final resolution I’m looking for. In this case, I’d urge the developers to build a light upscaling model, similar to what Artemis Anti-Alias does, but combined with the denoising power of Proteus. The AI could identify the edges of the subject and the rest of the scene, then apply intelligent sharpening and a denoise filter that is aware of what is actual noise, not texture. That alone would be great.
  • For high-res inputs, the AI should analyze whether there is enough detail in the scene to restore something that was present in its training database: body parts, objects, textures, geometric figures, etc., and of course LETTERS. The fonts are already out there; we just need training.
  • I highly suggest that the developers take inspiration from the avisynth/vapoursynth filter lists to solve the variety of problems that can come with a video:
    External filters - Avisynth wiki
    Internal filters - Avisynth wiki
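To make the edge-aware denoising idea in the first bullet concrete, here is a minimal sketch in plain NumPy. This is purely illustrative and not how Topaz works internally; the function name, threshold, and the box blur standing in for a real denoiser are all my own assumptions:

```python
import numpy as np

def edge_aware_denoise(img, edge_threshold=0.15, blur_radius=1):
    """Blur only flat regions; leave strong edges untouched.

    img: 2-D float array in [0, 1] (one grayscale frame).
    edge_threshold and blur_radius are illustrative parameters,
    not anything from TVAI.
    """
    # Gradient magnitude via simple finite differences:
    # large values mean "edge/texture", small values mean "flat area".
    gy, gx = np.gradient(img)
    grad = np.hypot(gx, gy)

    # Box blur as a stand-in for a real denoiser.
    k = 2 * blur_radius + 1
    pad = np.pad(img, blur_radius, mode="edge")
    blurred = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            blurred += pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    blurred /= k * k

    # Edge mask: 1 near edges (keep the original pixel),
    # 0 in flat areas (use the blurred pixel).
    mask = np.clip(grad / edge_threshold, 0.0, 1.0)
    return mask * img + (1.0 - mask) * blurred
```

On a noisy flat frame this suppresses the noise, while a hard step edge passes through almost untouched, which is exactly the “denoise the noise, not the texture” behavior the bullet asks for.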

Thanks for compiling your thoughts :slight_smile:

From the early days of TVAI, there has been a wish for some automation that somehow manages to analyze the footage and automatically tunes all parameters in order to get “good results”.

Sadly, this is about the hardest thing to achieve, and to a perfect extent impossible (because “good” is subjective)…

Of course, analysis passes can figure out a few things and tune them automatically, but at the current state of development and research, in most cases the results are far from perfect.

That’s not a limitation of Topaz software; it’s rather the current state of development in this field of science. Topaz relies on research others do as a starting point for its products, then refines it, trains models, and puts everything together in a usable program for end users…

So for years, advanced users have realized that TVAI is just another “tool” or “filter” that has capabilities and limits, just like every other knob we can turn in any other software. Every approach to sharpening a picture in any Photoshop-like software has its pros and cons and can only do so much.

This led to the realization that using external filters and combining TVAI with other processing chains/tools/apps is the only way to get better results than simply throwing footage into TVAI.

That’s where all the xsynth filters often appear in workflows, and many of us have combined video-server systems with TVAI in the past. But just as not every avisynth deinterlace filter in existence is equal to the next one, TVAI also only has certain capabilities: it shines in some situations and fails in others…

The wish/proposal to include more filter options or enable avisynth/vapoursynth in TVAI is almost as old as TVAI itself… Years of discussion have led to the most recent versions with ffmpeg as a basis, doing piping and serving, and using TVAI as an ffmpeg plugin… This enables advanced users to tweak TVAI into almost anything one wants: compile your own TVAI ffmpeg, enable filters in the ffmpeg chain, pipe xsynth in/out…
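As a rough sketch of what such a pipeline can look like, assuming VapourSynth’s `vspipe` and an ffmpeg build that exposes the TVAI filters: the script name and the filter string here are placeholders from my own setup, not official syntax, so check the filter list of your particular build before copying anything.

```shell
# Pre-filter with a VapourSynth script (denoise.vpy is a placeholder name),
# stream the result as Y4M into ffmpeg, and upscale there.
# The "tvai_up=..." filter string is an assumption about the TVAI
# filter syntax -- verify it against your own TVAI ffmpeg build.
vspipe -c y4m denoise.vpy - \
  | ffmpeg -i - -vf "tvai_up=model=prob-3:scale=2" -c:v prores_ks out.mov
```

The point is the shape of the chain, not the exact flags: xsynth does the cleanup it is good at, and TVAI sits in the ffmpeg filter graph like any other filter.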

Some of the basic filters (like cropping) did not exist in early versions; Topaz has given us some of the things we used to do externally (add noise, crop, trim, etc.). Whether it is a good idea to include more or not is up for debate, and the result of this discourse will alter the way TVAI is composed over time.

To get back to your initial point about “more analysis of the source”: in general, I second it… But I also realize that this is much more to ask than one would think.
