I’m skeptical about Project Starlight. Although the Topaz Labs team has laid out grand ambitions, this project is unlikely to deliver a fundamental solution. The Rhea model was widely touted for its impressive performance, yet it ultimately failed to resolve the artifacts that degrade image quality and cause unnaturalness. Project Starlight seems poised to become little more than a mash-up of models like Rhea XL and Theia, and it will likely remain unable to eliminate these issues in low-quality footage. What we truly need is a deep-learning approach precisely tailored to the unique characteristics of each video, one that decisively reduces both artifacts and unnaturalness. For any residual problems that deep learning can’t fix, there should also be an option for manual correction, whether scene by scene or even frame by frame.
Though Starlight is interesting, it has turned out to be just as bad as the earlier models when it comes to enhancing my family VHS videos.
With some videos, I don’t think even deep learning will be enough to produce something detailed and true to life. In those cases, I would be fine with it producing something that fits the scene but is built purely from the high-quality training videos.
For example: A kid walking through a forest in the distance. I don’t need the kid to be the same one shown up close in another scene. I just need it to look like a normal kid at that distance, and the forest to look like a forest and not a bunch of strings and spots.
I think about this often and am not sure what can really be done at this point. Of course, what I really want is the kind of recording from the original Star Trek, where they can somehow view the footage from any angle or distance: a model that reconstructs a 3D model of the scene, fills it with all the detail it can, and then hands it off for human intervention or final rendering.
That’s why I recently sorted an e-shop by the cheapest PCs with 16 GB of VRAM; it spat out one with a 4060 Ti. The poor thing doesn’t know yet that it has to work 24/7.
I think your suggestion is from 2027.
That’s not what I meant. I’m criticizing Project Starlight because the Topaz Labs team talks as if it’s a fundamental solution. We know that it’s impossible to solve both the artifacts in low-quality footage and the unnaturalness that arises from fixing those artifacts using only AI models. If we were to solve all of this purely with AI models, it might not be possible even by 2027; in fact, it might take more than 10 years from now to reach that point.
What I’m trying to say is that, at this point in time, what’s needed is not just an AI model, but also human intervention. Deep learning is just one example. Of course, deep learning alone can’t solve everything, which is why I think we need advanced features that allow for human involvement alongside AI.
I agree and disagree. I don’t think human involvement is necessary if the model’s reasoning is set up intelligently enough.
I mean, say there’s this guy, let’s call him John, and in the first scene he walks down a street, having just reached the town. Clear shots of his face. Then later John is running through the forest, setting traps to catch the cops, and his face gets blurry, but it’s still John, you know, not some kind of mummy. The problem is that the AI doesn’t keep actors in memory.
There are already problems with recognizing just heads, of John... McClane, so it would be useful if the user could mark things on the image, for example to say this background should be less sharp, or please remove more noise here. If it could learn from what you tell it, a big leap forward would be possible.
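Just to sketch what I mean (a made-up example in Python, nothing that exists in the app): the user-drawn mark would basically become a per-region strength mask that decides how much of the restored result gets applied in each area, instead of one global slider.

```python
# Hypothetical sketch: user-marked regions steer how strongly the restored
# output is applied per area. Not a real Topaz API, just an illustration.
import numpy as np

def blend_by_mask(original: np.ndarray, restored: np.ndarray,
                  strength: np.ndarray) -> np.ndarray:
    """Per-pixel blend between the original and the restored frame.

    strength is a float map in [0, 1]: 1.0 means apply the full restoration,
    0.0 means keep the original pixel (e.g. "less sharpening back there").
    """
    m = strength[..., None]                # broadcast over the color channels
    return m * restored + (1.0 - m) * original

# Toy example: one 480x720 frame, background (top half) marked "go easy".
h, w = 480, 720
original = np.random.rand(h, w, 3)         # stand-ins for real frame data
restored = np.random.rand(h, w, 3)
strength = np.ones((h, w), dtype=np.float32)
strength[: h // 2, :] = 0.3                # user-drawn mask: only 30% effect
result = blend_by_mask(original, restored, strength)
```

The mask is just my guess at the simplest way to represent that kind of per-region instruction; the point is the setting stops being global.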
What about some sort of continually trained AI model? Just like how AI picture recognition needed hundreds of people to describe the training images in text to make the training data. What I’m thinking of is a tool in the UI that lets users correct things like what face was detected and how much it should be restored.
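Roughly what I picture such a tool collecting, as a toy sketch in Python (every name and field here is invented for illustration, not an actual Topaz format): each manual fix in the UI becomes one labeled example that could later go into training.

```python
# Hypothetical sketch of what one UI correction could be stored as; all
# names are made up for illustration, not an existing tool or format.
from dataclasses import dataclass, asdict
import json

@dataclass
class FaceCorrection:
    clip_id: str                 # which video the fix came from
    frame_index: int             # which frame the user corrected
    detected_box: tuple          # (x, y, w, h) the model originally found
    corrected_box: tuple         # (x, y, w, h) after the user adjusted it
    restoration_strength: float  # 0..1, how much restoration the user wanted

# One user action in the UI becomes one training example.
example = FaceCorrection(
    clip_id="family_vhs_1993",
    frame_index=1042,
    detected_box=(210, 80, 48, 60),
    corrected_box=(205, 75, 55, 70),
    restoration_strength=0.6,
)
print(json.dumps(asdict(example)))   # what would get sent back for training
```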
Never mind. The more I think about this, the less I see it being okay with a lot of users. It would mean that all of their videos would become part of the training data.
That issue aside, I was thinking about something even simpler, like making Proteus Auto actually useful on DVDs with the same approach: basically using the parameters that users set to better train the auto mode. Anyway, I doubt every user has the same idea of what “best” looks like, so the ‘training’ data would become conflicting and useless really fast.
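To put made-up numbers on the “conflicting” part, here is a toy sketch (purely invented settings, not real telemetry) of three users’ idea of the best settings for the same DVD clip, and what naive averaging would train the auto mode toward.

```python
# Toy illustration with invented numbers: three users' "best" Proteus-style
# settings for the same clip, and what averaging them would point toward.
import statistics

user_settings = {
    "alice":   {"sharpen": 80, "denoise": 10, "dehalo": 0},
    "bob":     {"sharpen": 20, "denoise": 60, "dehalo": 40},
    "charlie": {"sharpen": 55, "denoise": 35, "dehalo": 90},
}

for param in ("sharpen", "denoise", "dehalo"):
    values = [s[param] for s in user_settings.values()]
    print(param, "mean:", round(statistics.mean(values), 1),
          "spread (stdev):", round(statistics.stdev(values), 1))
# The averages land on settings none of the three actually chose, and the
# large spread shows how contradictory the "labels" would be.
```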