I like your reference to Turing 
The idea is quite a good one. In fact, if you look at the research on the neural networks used for video upscaling, you will find that the comparisons use metrics that try to emulate “human judgement” when comparing results to ground truth. The “problem” has several sides: these metrics are not perfect, human perception is subjective, and not all footage can be judged by the single criterion of “does it look natural”…
But the goal many are trying to reach is not far from your motivation.
To be a little more practical: at the moment, the best results in terms of “getting as close to ground truth as possible” come from an individually trained model that represents the use case. If you want to upscale Captain Future, you will get the best results with a model trained on Captain Future.
But of course that's impractical and in many cases not possible (because the ground truth no longer exists, or for other reasons). Still, training on material as close as possible to the footage being worked on is the best chance of “getting close”.
With a program like VEAI, we face the difficulty that VEAI is a commercial product that has to cover as many use cases as possible for as many users as possible (or else not enough people will find it useful enough to buy). So Topaz has to find the balance between training universal models that are as broadly useful as possible and still getting the quality of specialized models, at least in the use cases that someone (marketing, a crystal ball, HR, the analysis department, or whoever) thinks will cover as many customers' needs as possible (“wow, hundreds are asking for a B&W model, but tens of thousands are asking for VHS, let's do VHS”)…
If we look back at ground-truth comparisons and “how close are we to the original”, and compare that with the “general judgement” of users about “what looks good”, we are quickly running out of anything that can be judged scientifically.
Most results that are visibly accepted in the forums and praised as “better” have all kinds of attributes, but lack a high correlation to a ground truth. Quality is so subjective.
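To make the “how close are we to the original” part a bit more concrete, here is a minimal sketch of PSNR, probably the simplest metric that compares a result against ground truth. Note that PSNR does not try to emulate human judgement at all; the perceptual metrics used in the research (SSIM, LPIPS, VMAF and so on) are the ones that attempt that. The function names and the tiny 2x2 grayscale “frames” are made up purely for illustration.

```python
import math


def mse(reference, upscaled):
    """Mean squared error between two equally sized grayscale frames.

    Frames are plain lists of rows with pixel values 0..255
    (an assumption made for this sketch).
    """
    if len(reference) != len(upscaled) or any(
        len(r) != len(u) for r, u in zip(reference, upscaled)
    ):
        raise ValueError("frames must have the same dimensions")
    total = 0
    count = 0
    for ref_row, up_row in zip(reference, upscaled):
        for r, u in zip(ref_row, up_row):
            total += (r - u) ** 2
            count += 1
    return total / count


def psnr(reference, upscaled, max_value=255):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    err = mse(reference, upscaled)
    if err == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / err)


# Hypothetical toy data: one ground-truth frame and two candidate results.
ground_truth = [[10, 20], [30, 40]]
close_result = [[11, 19], [30, 41]]
far_result = [[40, 60], [0, 90]]

print(psnr(ground_truth, close_result))
print(psnr(ground_truth, far_result))
```

The first value comes out higher than the second, so by this metric `close_result` is “better”. Whether a viewer in the forum would agree is exactly the open question.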