Model that passes the upscale "Turing" test

In version 1.6, the Gaia CG model was preferred for many types of upscaling because it enhanced detail in a way that a viewer who had not seen the source material would not guess that the video had been upscaled. For example, the over-smoothing of skin, though not ideal, looked like it could have been part of the source material; no one would have guessed that the video had been upscaled and that this was a result of upscaling.

Artemis HQ does a great job of upscaling content, but the end result clearly looks upscaled. Gaia CG effectively “enhanced” video. The output of Artemis HQ doesn’t look enhanced; it looks upscaled.

The ideal model would produce video where a viewer who had not seen the source material would not guess that it was upscaled, even if that means leaving in a certain amount of noise or other undesirable effects. Without the Gaia CG model from 1.6, the value proposition of VEAI is gone. Content out of VEAI looks like it was visibly run through a filter rather than like it was captured at that resolution.


I like your reference to Turing :slight_smile:

The idea is quite a good one. In fact, if you look at the research on the neural networks involved in video upscaling, you will find that the comparisons use metrics which try to emulate “human judgement” of the result compared to ground truth. The “problem” is diverse: these metrics are not perfect, human perception is subjective, and not all footage can be judged by the single criterion of “does it look natural”…
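To make the "compared to ground truth" idea concrete, here is a minimal sketch of PSNR, one of the simplest full-reference metrics used in that research (perceptual metrics like SSIM or LPIPS try to get closer to human judgement, but follow the same compare-against-ground-truth pattern). The frames here are toy NumPy arrays, not real footage:

```python
import numpy as np

def psnr(ground_truth, upscaled, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to ground truth."""
    mse = np.mean((ground_truth.astype(np.float64) - upscaled.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10((max_val ** 2) / mse)

# Toy 8-bit "frames": the upscaler's output differs from ground truth
# by a small amount of uniform noise.
rng = np.random.default_rng(0)
gt = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
out = np.clip(gt.astype(int) + rng.integers(-2, 3, size=gt.shape), 0, 255).astype(np.uint8)
print(psnr(gt, out))  # a single scalar score for the whole frame
```

The weakness the post describes is visible right here: PSNR collapses an entire frame into one number, so an output that smooths skin unnaturally can still score well if the pixel-wise error is small.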

But the goal many try to reach is not far from your motivation :slight_smile:

To be a little more practical: the best results in terms of “getting as close to ground truth” at the moment come from an individually trained model which represents the use case… If one wants to upscale Captain Future, he will get the best results with a model trained on Captain Future :slight_smile: But of course that is impractical and in many cases not possible (because the ground truth no longer exists, or for other reasons). But still, training on material as close as possible to the footage being worked on is the best chance of “getting close”.

With a program like VEAI we now face the difficulty of VEAI being a commercial product which has to cover as many use cases as possible for as many users as possible (or else not enough people will find it useful enough to buy it). So Topaz has to strike the right balance: training universal models that are as broadly useful as possible, while still getting the quality of specialized models, at least for the use cases which (marketing, magic bowl, HR, the analysis dept, or whoever) are thought to cover the most customers' needs (“wow, hundreds are asking for a B&W model, but tens of thousands are asking for VHS, so let's do VHS”)…

If we look back at ground-truth comparisons, at “how close are we to the original”, and at the “general judgement” of users about “what looks good”, we quickly run out of any scientifically judgeable territory :slight_smile: Most results that are visually accepted in the forums and praised as “better” have all kinds of attributes, but lack a high correlation to a ground truth. Quality is so subjective :slight_smile:


Interesting, well-articulated perspectives on this. I’ve been experimenting with this software and was thrilled to see your discussion, as it covered issues I’ve been contemplating.

Working with some SD DV footage of a band in a high-contrast, small-club lighting situation, I found that the Gaia CG model, as you pointed out, did a nice job of up-rezzing the footage and enhancing the image, but, in my case, it didn't have enough power to clean up the footage in the process. Meaning, grain and imperfections were also enhanced and possibly exaggerated beyond the original. Though you could argue that’s “part of the nature of the source material”, I rather like the idea of minimizing some of the production limitations of the original capture medium. Not eliminating the character of the footage, just perhaps upgrading it a bit if possible.

On the other hand, Artemis HQ up-rezzed and cleaned things up amazingly where needed, BUT the process’s effect on skin tones was detrimental… Smoothing out stubble, texture, and imperfections, to me, hurt the dimension of the footage.

That said, I think there might be a compromise… What would be GREAT is some kind of slider that would allow some degree of customization of the Artemis HQ model (or any model, for that matter) to suit personal taste and differing footage situations. Not necessarily a full set of controls, but perhaps just one or two that tweak the software on the parameters that seem most questionable: smoothing, human elements, skin tone.
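VEAI doesn't expose such a slider, but the simplest version of the idea is just a linear blend between a plainly upscaled frame and the model's output: a strength of 0 keeps the naive upscale, 1 keeps the full model effect, and anything in between dials the smoothing back. A minimal sketch, assuming frames are 8-bit NumPy arrays of the same size (the frame values here are stand-ins, not real footage):

```python
import numpy as np

def blend(naive_up, model_out, strength=0.5):
    """Linearly mix a naive (e.g. bicubic) upscale with the model output.

    strength=0.0 returns the naive upscale untouched;
    strength=1.0 returns the full model effect.
    """
    s = float(np.clip(strength, 0.0, 1.0))
    mixed = (1.0 - s) * naive_up.astype(np.float64) + s * model_out.astype(np.float64)
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)

# Stand-in 4x4 frames: dial an Artemis-style effect back to 40%.
naive = np.full((4, 4), 100, dtype=np.uint8)   # hypothetical bicubic upscale
model = np.full((4, 4), 180, dtype=np.uint8)   # hypothetical model output
print(blend(naive, model, 0.4)[0, 0])          # 100 + 0.4 * 80 = 132
```

A per-parameter control (smoothing vs. skin tone) would need hooks inside the model itself, but even this kind of global opacity mix is how compositors often tame an over-aggressive filter today.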