I’ve always thought that was likely the only way to properly increase resolution: find the best information about a given character available in the file from different angles, upscale those versions, and use them to construct a general reference model for how each character should look, so that you can reconstruct detail even in scenes where there’s less of it.
For a TV series, you may be able to gather even more references by storing them in a shared model that applies to the entire series (at least during training).
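The "best reference per character per angle" idea could be sketched roughly like this. Everything here is hypothetical (the function names, the angle bins, and the use of Laplacian variance as a sharpness proxy are all my assumptions, not anything from an actual upscaler):

```python
import numpy as np

def sharpness(img: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian - a rough proxy for how much
    fine detail a crop contains (higher = sharper)."""
    lap = (-4 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(lap.var())

def build_reference_bank(crops):
    """crops: iterable of (character, angle_bin, image).
    Keep only the sharpest crop per (character, angle_bin) pair,
    forming the shared reference bank for the whole series."""
    bank = {}
    for character, angle_bin, img in crops:
        key = (character, angle_bin)
        if key not in bank or sharpness(img) > sharpness(bank[key]):
            bank[key] = img
    return bank
```

In a real pipeline the "image" would be an upscaled face crop and the angle bin would come from a pose estimator, but the bookkeeping would look about like this.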
I would imagine that GPT-4V could help identify all the places in a video that feature a given person (by creating a description of features, clothes, etc. and then looking for matches). You could then use that data to train a model of your own for finding characters in a video as viewed from different angles.
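The matching step, once you have some kind of per-frame embedding (CLIP-style image features, say), could be as simple as a cosine-similarity threshold. This is just a toy sketch with plain vectors standing in for real embeddings; the function names and the threshold are my own assumptions:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def find_character_frames(frame_embeddings, reference_embedding, threshold=0.8):
    """Return indices of frames whose embedding is close enough to the
    character's reference embedding. In practice the embeddings would
    come from a vision model; here they are just toy vectors."""
    return [i for i, emb in enumerate(frame_embeddings)
            if cosine(emb, reference_embedding) >= threshold]
```

The labelled matches from a pass like this could then serve as training data for a cheaper, purpose-built character detector.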
As a first try, I upscaled a cartoon, since I assumed that would be less risky than a lower-resolution movie. While some scenes really came out impressively, a lot of places have somewhat mangled faces, or the character just doesn’t look like himself after processing.
But for the time being, I think some alternative solution is needed for the parts of the picture that don’t have enough detail and that the model can’t figure out.
I think I’ve seen something like blurring in AI-enhanced videos - I don’t remember exactly what - but something that makes things look okay at higher resolution even when some details couldn’t be recovered.
Maybe you could integrate some human feedback into the training of your models, so that even when the model can’t figure out how to upscale something, the result still doesn’t look bad.
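One common way to fold human feedback into training is a pairwise preference loss (Bradley-Terry style, as used in RLHF reward models): raters pick which of two upscaled outputs looks better, and the model is pushed to score the preferred one higher. A minimal sketch of that loss, with toy scalar scores standing in for a real scoring network:

```python
import numpy as np

def preference_loss(score_preferred, score_rejected):
    """Bradley-Terry pairwise loss: -log sigmoid(s_pref - s_rej),
    summed over rating pairs. Minimizing it pushes the score of the
    human-preferred output above the rejected one, so 'looks bad but
    technically plausible' outputs get penalized."""
    margin = np.asarray(score_preferred) - np.asarray(score_rejected)
    return float(-np.log(1.0 / (1.0 + np.exp(-margin))).sum())
```

A scorer trained this way could then regularize the upscaler, so that where detail can’t be recovered it falls back to something raters found acceptable (a soft blur, say) rather than a mangled guess.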