Is it possible to increase the quality of a video based on image(s) via Machine Learning?

I would like to know whether this is possible with today's ML, or whether it still needs inhuman GPU power.
For example, we have a video: Yt1s.com - Marina And The Diamonds Starring Role New Song Live Little Noise Sess... | Gfycat

And we want to increase the detail in the video (better face, clothing textures, etc.). Can we grab that information from these images, mix it with the pixels in the video, and get a quality upgrade?

It’s not an upscale; it’s “pixel transfer from a quality donor,” I would say. Is it possible today? And how hard is it?
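A toy sketch of that “donor transfer” idea — reference-based restoration with brute-force patch matching in NumPy. Everything here (the function name, patch size, synthetic gradient data) is illustrative, not any real product's method:

```python
# Toy sketch of "pixel transfer from a quality donor" (reference-based
# restoration). For each patch of the fuzzy frame, brute-force search for the
# most similar patch in a sharp donor image and copy the donor's pixels over.
# Real systems use learned features and far faster matching.
import numpy as np

def transfer_from_donor(frame, donor, patch=8):
    """Replace each patch of `frame` with its best-matching donor patch."""
    h, w = frame.shape
    dh, dw = donor.shape
    # Collect all non-overlapping donor patches once.
    donor_patches = [
        donor[y:y + patch, x:x + patch]
        for y in range(0, dh - patch + 1, patch)
        for x in range(0, dw - patch + 1, patch)
    ]
    out = frame.copy()
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            target = frame[y:y + patch, x:x + patch]
            # Nearest neighbour by plain pixel-wise squared distance.
            best = min(donor_patches,
                       key=lambda p: float(np.sum((p - target) ** 2)))
            out[y:y + patch, x:x + patch] = best
    return out

# Demo: the donor is a clean gradient, the frame is the same gradient plus
# noise; the transfer swaps the noisy patches for their clean counterparts.
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.0, 1.0, 16), (16, 1))
noisy = clean + rng.normal(0.0, 0.05, clean.shape)
restored = transfer_from_donor(noisy, clean, patch=8)
```

The hard parts a real system would add on top: finding the *right* donor patch when pose, scale, and lighting differ, and blending the seams.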
Original vid frame:

DONORS:





A very interesting enhancement proposal.


Yeah, not even close. Arguably, VEAI doesn’t even use real A.i. It uses statistical analysis (which counts as A.i. these days, for marketing purposes). Aka, you feed an alleged A.i millions of different images, and then it will start learning how to recognize things like faces in those. Useful, but not truly intelligent.

Now, every A.i will use statistics, but not every statistics is A.i. Humans can look at 2 completely different photos (like those the OP posted), extract the human in question, in our brain, and realize it’s the same person. VEAI can’t. The latter, at best, can recognize a face, and more-or-less try and extrapolate what it’s supposed to look like (when at very low res). Aka, it recognizes a face, not the same face it may have seen in other photos. Let alone recognize other body parts belonging to that same person in another image, and use those to repair the primary footage.

That’s sad. What about the face-swap technique used for deepfaking a person’s face? Can we deepfake with the same person, but increase the quality of the face?

I suppose it can be done, but you’d need a (huge) personalized data set to train the A.i. on. Maybe that is something VEAI will offer in the future, but I doubt it.

What a bizarre tangent to go off on, LOL. You’re also simply very wrong. The regular meaning of ‘intelligence’ is simply as found under definition 1 (see below).

intelligence:

1. the ability to acquire and apply knowledge and skills.
   “an eminent man of great intelligence”

2. the collection of information of military or political value.
   “the chief of military intelligence”

As William Blake put it so well: “The fool persists in his folly.”

Intelligence comes from the Latin intelligere, which (loosely) translates to ‘the ability to connect things.’ Aka, to understand.

Also, the ‘skills’ an A.i acquires have absolutely nothing to do with computing power (that’s just hardware), but with its ability to learn as a neural network.

Actually, what you are proposing is the principle behind how stuff like this works…

But: it’s not as far along as your exact example wishes for… It will get there eventually, though.

Several approaches can come in handy:

  • training models on specific people (research has already gone quite far on this one), so having a database of the people in some footage could help enhance the faces…
  • automatically creating 3D models from just a few pictures can help get motion consistency right, add lost details in some cases, get rid of compression artefacts, etc…
  • super resolution - already implemented in quite a few scenarios - combining several pictures in time to extract information in space (this has been around for decades, actually; also see astronomy software)

etc…
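The multi-frame super-resolution point is easy to sketch: several noisy shots of the same scene, once aligned, fuse into a cleaner result than any single frame. A toy NumPy version with known integer shifts (real pipelines estimate sub-pixel motion and recover extra resolution; this one only averages):

```python
# Toy sketch of multi-frame super-resolution's core idea: align several noisy
# observations of the same scene, then fuse them. Shifts are given here; a
# real pipeline would estimate sub-pixel motion between frames.
import numpy as np

def fuse_frames(frames, shifts):
    """Undo each frame's (dy, dx) shift, then average the aligned stack."""
    aligned = [np.roll(f, (-dy, -dx), axis=(0, 1))
               for f, (dy, dx) in zip(frames, shifts)]
    return np.mean(aligned, axis=0)

rng = np.random.default_rng(1)
scene = rng.random((32, 32))
shifts = [(0, 0), (1, 0), (0, 1), (1, 1)]
# Each observed frame is the scene, shifted and corrupted with noise.
frames = [np.roll(scene, s, axis=(0, 1)) + rng.normal(0.0, 0.1, scene.shape)
          for s in shifts]
fused = fuse_frames(frames, shifts)
```

Averaging four frames roughly halves the noise; that same "information across time" principle is what real temporal super-resolution builds on.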

It’s good to hear you say they’re further with this than I thought. :slight_smile:

Here’s to hoping VEAI will come with a tool (or do it itself) to use our own data sets to train for people’s faces. As faces really remain the Achilles heel of VEAI.

Your definition of intelligence is only one - many English words have more than one meaning. If you are intelligent that means you have intelligence. An IQ test measures your intelligence quotient. Going forward you should probably refrain from picking fights over grammar with native English speakers because you’ll lose every time. And yes it’s “lose” not “loose” as you said earlier.

one important thing to point out:

it’s “research” in general, not Topaz, working on things like that :slight_smile:

It runs an inference engine with a model that was trained as a neural network.
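In other words, the shipped app only performs a forward pass through frozen, pre-trained weights; no learning happens on the user's machine. A toy illustration (the tiny network and its weights are made-up stand-ins for a model trained elsewhere):

```python
# Toy illustration of inference vs. training: inference is just a forward
# pass through fixed weights. These values stand in for a real exported model.
import numpy as np

# "Pre-trained" parameters, frozen at export time.
W1 = np.array([[1.0, -1.0], [0.5, 0.5]])
b1 = np.array([0.0, 0.1])
W2 = np.array([[2.0], [-1.0]])

def infer(x):
    """Forward pass only: matrix multiplies + ReLU; no weights are updated."""
    hidden = np.maximum(x @ W1 + b1, 0.0)  # hidden layer with ReLU activation
    return hidden @ W2

y = infer(np.array([[1.0, 2.0]]))  # deterministic: same input, same output
```

Training is the expensive part that adjusts W1, b1, and W2 against millions of examples; the end user only ever runs `infer`.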


VEAI itself does not use a neural network, of course (not the .exe we’re using). The machines they train their models on, naturally do. So, with VEAI we get exactly what they advertise with: “Unlimited access to the world’s leading production-grade neural networks for video upscaling.”

Ah, thx for the clarification. Give it another 10 years or so, and who knows, we may see a brave new A.i. world where all of this has become a reality. :slight_smile:

My personal guess would be less than 10 years…

I have my doubts. Even though A.i is growing rapidly, recognizing the same person in 2 different photos, and being able to ‘repair’ one with parts taken from the other, goes quite a bit further than ‘simple’ statistics. But my primary concern is that it would be extremely computationally expensive, and would require several temporal passes, at the very least. Like temporal denoising, but endlessly more complex: going through the movie X many times, trying to extract the same people (not just 1 person) from all the different angles they appear in. And people are already complaining VEAI takes so long to complete. :slight_smile: And another thing, of course, is that this sort of A.i. would have to run locally, and not merely be pre-trained (although, to a certain degree, it can be).

So, I give that kind of functionality maybe 25 years, even.

We’ll see… Facial recognition has been a thing for quite some time now; identifying people and objects in video footage is very possible…
And generating 3D counterparts of faces, and even complete scenes with shadows, etc., is also a reality now.

The difficulty is getting it to a level of fidelity that gives cinematic-grade quality.

Yes, it is. But what the OP wants to do goes quite a bit further: he wants to recognize faces/body parts in a fuzzy (low-res) part of the video, and then use footage from other parts of the video to repair the faces/body parts of the former. Like I said, not only would that take any number of temporal pre-passes, but it would also require the software to repair the fuzziness with new, spliced-in parts (corrected for scale/angle, etc.) from other footage inside the video. Like a fuzzy dress seen somewhere 10 minutes into the movie, which then reappears 16 minutes later, seen in higher quality. I cannot fathom how time-consuming such a process would be. And, remember, the time to do this for just 1 person would already be staggering, and far more so when you basically want all the characters seen to get the same treatment.
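As a rough sketch of what one of those passes might involve, here is a toy version of the grouping step: cluster face detections from across a whole video by identity, then pick each person's sharpest appearance as the donor for repairs. The 3-D "embeddings" and sharpness scores below are synthetic; a real system would use a learned face embedder and a blur metric:

```python
# Toy version of one temporal pass: group detections by identity via cosine
# similarity of embedding vectors, then choose each person's best donor frame.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def group_by_identity(detections, threshold=0.9):
    """Greedy clustering: a detection joins the first cluster whose first
    member's embedding is similar enough, else it starts a new cluster."""
    clusters = []
    for det in detections:
        for cluster in clusters:
            if cosine(det["embedding"], cluster[0]["embedding"]) >= threshold:
                cluster.append(det)
                break
        else:
            clusters.append([det])
    return clusters

def best_donor(cluster):
    """The sharpest sighting of a person is the best source of detail."""
    return max(cluster, key=lambda d: d["sharpness"])

# Synthetic detections: two people, the first seen twice at different times.
person_a = np.array([1.0, 0.0, 0.0])
person_b = np.array([0.0, 1.0, 0.0])
detections = [
    {"frame": 100, "embedding": person_a + 0.05, "sharpness": 0.3},
    {"frame": 9600, "embedding": person_a - 0.05, "sharpness": 0.9},
    {"frame": 200, "embedding": person_b + 0.05, "sharpness": 0.5},
]
clusters = group_by_identity(detections)
donors = [best_donor(c) for c in clusters]
```

Even this trivial grouping is quadratic in the number of detections; the splicing and scale/angle correction that would follow it is the genuinely expensive, unsolved part.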

But, indeed, we’ll see. :slight_smile:

Maybe if we change the approach to something more like using AI to recreate the scene in a 3D engine, then render that as the movie. To me that seems more possible, but still years out.
