Feature Request: AI Model You Can Train With Your Own Photoset or Videos - to restore missing detail in videos correctly

UPDATE 2 - VEAI Model

Blending my idea with Kyle’s and Khan’s into a more simplified form, here it is below:

I would want VEAI to be able to do this:

  1. Automatically detect the “background” or manually select it
  2. Automatically detect “people” as separate individuals or manually select them
  3. Automatically detect “animals” (optional) or manually select them
  4. Ability to “select the AI model” of your choice for each detection or selection
  5. Ability to choose an AI model you have trained for that specific detection or selection
  6. Users can build a custom AI model on top of existing VEAI models, using them as a base for custom training, or start from scratch
  7. Ability to name our custom-trained models
  8. Recognition of individuals based on custom-trained AI model datasets, with those datasets then used during enhancement
  9. A progress bar for custom model training, so we know whether the footage or pictures we supplied have been fully used or the model still needs a certain amount of time to train on them
  10. Ability to share the AI models we build with the Topaz Labs software with the community. This would reduce the workload on Topaz Labs, and the community’s passion would help bring in unique AI models that work best for specific things. It would be cool if we could even collaborate on these models and improve them together, with Topaz Labs helping fine-tune things alongside the community, since they have the programming background.

In practice, say you have high-quality photos of your favorite sports team, person, or scenery, but the video footage is lacking or missing detail. You can easily select an AI model you have trained for that detection or selection.

Example 1: A person’s face is distorted, and you have a model trained on that person, so their face can be restored with the correct detail. Normally, if you were to upscale, the smile would be upscaled along with the distortion. Instead, the correct detail is added as part of the upscale.

Example 2: The background is missing detail, say the leaves look blurred and indistinct; you can use an AI model you trained on that specific scenery to restore the detail.
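To make the idea concrete, here is a very rough sketch of the per-region step in Python. Nothing here is VEAI’s actual pipeline; the masks, the `composite_regions` helper, and the stand-in models are all hypothetical, just to show how each detected region could be run through its own user-trained model and blended back together:

```python
# Hypothetical sketch: apply a different (user-trained) model to each detected
# region, then composite the results with the region masks. The masks and the
# per-region models are placeholders for whatever detection/segmentation and
# trained models would actually be used.

import numpy as np

def composite_regions(frame: np.ndarray,
                      region_masks: dict,
                      region_models: dict) -> np.ndarray:
    """Run each region through its own model and blend the outputs by mask.

    frame         -- H x W x 3 float array in [0, 1]
    region_masks  -- region name -> H x W float mask in [0, 1] (soft masks allowed)
    region_models -- region name -> function(frame) -> enhanced H x W x 3 frame
    """
    out = frame.copy()
    for name, mask in region_masks.items():
        enhanced = region_models[name](frame)     # e.g. a model trained on that person
        m = mask[..., None]                       # broadcast mask over color channels
        out = m * enhanced + (1.0 - m) * out      # only touch pixels inside the mask
    return out

if __name__ == "__main__":
    frame = np.random.rand(720, 1280, 3).astype(np.float32)
    person_mask = np.zeros((720, 1280), dtype=np.float32)
    person_mask[100:400, 500:800] = 1.0                        # pretend detector output
    masks = {"person_A": person_mask, "background": 1.0 - person_mask}
    models = {
        "person_A":   lambda f: np.clip(f * 1.05, 0.0, 1.0),   # stand-in person model
        "background": lambda f: f,                              # stand-in scenery model
    }
    print(composite_regions(frame, masks, models).shape)        # (720, 1280, 3)
```

The interesting part is only the routing: however the masks are produced, each region gets its own model, so a person-specific model never touches the background and vice versa.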

Kyle’s idea about a broad-scope AI you can train in general still has some benefits even without the detection and selection I mentioned, as he currently uses ESRGAN to do this for sports videos. It would be nice for VEAI, a paid app, to simplify this for its userbase.

-----MY ORIGINAL IDEA------
Feature Request: AI that recognizes distinct people and/or the background, and accepts high-quality input data of those individual people or that setting, in order to “correctly add detail” to a subject/environment that is in low quality (while only storing this data on an individual’s computer, not uploading it to the cloud).

For example, some faces in low-quality videos will be upscaled and end up looking more like a rough painting of the person: the eyes don’t look the same, nor the nose, hands, arms, or even clothes at times, and sometimes the skin gets too smooth and looks unreal.

Solution in practice: The AI model recognizes a person as distinct from other people. You are prompted to add data for that AI model to work with. Say you have a low-quality video of a person but high-quality pictures of that person from the same event: you can use those images to upscale with the “correct details” of that person. Or, if you have generally high-quality images of the person, you can let the AI train on that individual and then properly add the correct detail to the video. So let’s say a person is far away, the image is blurry, and their face is distorted. Instead of upscaling the distortion, the model correctly adds the detail of the face, because the user supplied input data that helps the AI add the right detail for that specific person in the video. It would be cool if you could do this for multiple people at a time, but even one person at a time would go a long way toward making this product extremely helpful and valuable for restoring videos with low detail.

One last note: imagine having a video where a person’s smile is distorted, but you have that person’s correct smile in a high-quality picture. The AI model can then produce the correct smile rather than exaggerating the distortion. It would be great to be able to add details to a video that aren’t there, based on user input data the AI can work with.
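As a loose illustration of the recognition step, here is a small Python sketch. The embedding function and the reference sets are placeholders for whatever face-recognition network and locally stored user photos would actually be used; only the routing logic (match a detected face against each person’s reference embeddings, then hand that region to that person’s model) is shown:

```python
# Hypothetical routing step: given an embedding for a detected face, find which
# of the user's locally stored reference sets (one per person) it matches best,
# so that person's custom-trained model can be applied to that face region.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def identify_person(face_embedding: np.ndarray,
                    reference_sets: dict,
                    threshold: float = 0.6):
    """Return the name of the best-matching reference set, or None if nobody matches.

    reference_sets maps a person's name to a list of embeddings computed from the
    high-quality photos the user supplied for that person (stored locally).
    """
    best_name, best_score = None, threshold
    for name, refs in reference_sets.items():
        score = max(cosine_similarity(face_embedding, r) for r in refs)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Once a face is identified, the frame region would be handed to that person's
# custom-trained model instead of the generic one, so the "correct" detail
# (the real smile, the real eyes) comes from the user's own reference data.
```

The threshold and the “best match wins” rule are placeholders too; the point is only that matching happens against the user’s own reference photos, stored locally as described above.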

The use case above may be trickier to solve than the broader one, which would be to allow development/tuning of custom models. That is the main advantage I currently have using open source over VEAI: I can train my own ESRGAN model for the dataset I’m dealing with. I see a potential user flow that looks like the below:

  1. User clicks “Train New Model” from the drop-down menu.
  2. User chooses an AI model from the existing AI models that serves as a starting point.
  3. User chooses one or many high-quality training videos.
  4. User chooses a name for the new model and gets a progress bar as the training/fine-tuning happens.
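For what it’s worth, a bare-bones sketch of what step 4 could look like under the hood is below, written with PyTorch. The tiny network stands in for whichever base model is chosen in step 2, and the high-quality frames stand in for the training videos from step 3; the degradation (just a bicubic downscale here), the loss, and the progress reporting are all assumptions, not anything VEAI actually does:

```python
# Hypothetical fine-tuning loop: degrade high-quality frames to make
# (low-res, high-res) pairs, fine-tune from the chosen starting weights,
# and report progress as training advances.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUpscaler(nn.Module):
    """Placeholder 2x super-resolution model standing in for the chosen base model."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * 4, 3, padding=1),   # 4 = 2x2 upscale factor
            nn.PixelShuffle(2),
        )

    def forward(self, x):
        return self.body(x)

def fine_tune(model: nn.Module, hq_frames, epochs: int = 5, lr: float = 1e-4):
    """hq_frames: list of 3 x H x W tensors taken from the user's training videos."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    total_steps = epochs * len(hq_frames)
    step = 0
    for _ in range(epochs):
        for hq in hq_frames:
            hq = hq.unsqueeze(0)                                    # add batch dim
            lq = F.interpolate(hq, scale_factor=0.5, mode="bicubic",
                               align_corners=False)                 # synthesize the LQ input
            pred = model(lq)
            loss = loss_fn(pred, hq)
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1
            print(f"progress: {100 * step / total_steps:.1f}%  loss: {loss.item():.4f}")
    return model

if __name__ == "__main__":
    frames = [torch.rand(3, 64, 64) for _ in range(4)]   # stand-in for extracted frames
    fine_tune(TinyUpscaler(), frames, epochs=1)
```

A real implementation would obviously use the actual model architecture, a proper dataloader over video frames, and a degradation model matched to the footage, but the overall loop (make low/high-quality pairs from the user’s own footage, fine-tune from existing weights, report progress) is the part the feature request is asking for.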

It would be an interesting approach for those of us who have high-quality source footage of a certain genre. I’ve been upscaling older 720p college football games to 1440p or 4K using VEAI; it tends to do well on the close-ups and graphics, but chokes on the wide field shots of the game. I have tons of high-quality footage from newer games that could “train” a model on this tricky use case, so it would be a very interesting feature to see implemented.

5 Likes

I like this idea too. To blend your idea with mine and simplify what I said above:

I would want VEAI to be able to do this:

  1. Automatically detect the “background” or manually select it
  2. Automatically detect “people” as separate individuals or manually select them
  3. Automatically detect “animals” (optional) or manually select them
  4. Ability to “select the AI model” of your choice for each detection or selection
  5. Ability to choose an AI model you have trained for that specific detection
  6. Users can build a custom AI model on top of existing VEAI models for their training, or start from scratch

In practice, say you have high-quality photos of your favorite sports team, person, or scenery, but the video footage is lacking or missing detail. You can easily select an AI model you have trained for that detection or selection.

The broader scope you mentioned still has benefits, as even an overall personally trained AI model for whatever you’re working on is useful. Say you trained an AI model specifically for sports; that’s cool. It would be nice if we could train an AI model in general, as that would let us do all kinds of things. For example, if we have a person at a distance whose face is distorted, we could train an AI model and name it specifically for that use case.

1 Like

I’m up for this idea; it would be great for the community to create trained models in the categories they are interested in. For example, I’d be happy to try enhancing low-fidelity soccer video.

2 Likes

This is not a good idea for a simple reason: it will not be possible. Are you aware of the power, time, and kind of machine needed to train such models in VEAI to make them work? Try one of the tools available on the web that lets you train your own models, just for faces, and you’ll see that it takes days and days for a very tiny result. So imagine how much is needed for scenes with different people, landscapes, animals, objects, etc.
On paper it’s of course a good idea, but in terms of time, machines, and gear, it’s not possible, or at least not right now with the gear available to normal users.

For sports on a big field, there’s no need for facial detail, only high-fidelity edges and shapes of moving people. E.g., in a low-res football clip, most of the time the camera is zoomed out over the field and the players are very blurry. So if we can train the model to reconstruct these moving players’ bodies in higher fidelity, that’s already 80% of the value.

1 Like

It would be cool if we could share models we created for a specific niche with the community.

Yup, it would be nice to have a trained AI model for soccer videos.

1 Like

I have trained an AI model with open-source software before, and it didn’t take days; it took about the same time as VEAI takes to upscale a video, many hours…

Also, even if it did take days or even weeks, I’m still okay with that. Having custom-trained AI models that are more accurate than the current models, or that serve a specific niche, would be worth it. Some people could make AI models for a specific person, scenery, or sport, and I think many people wouldn’t mind the extra hours for a much higher-quality result. Plus it would get the VEAI community more involved and decrease the workload for Topaz Labs if the community could share their models, if that were an added feature. Then Topaz Labs could focus on bug fixes and work with the community to produce even higher-end AI models. And if someone doesn’t want to share their AI model, that’s fine too. Just another idea. Plus, if we could share AI models, people in the community could collaborate.

2 Likes

I agree

I agree. Even if I need weeks to train my models, it’s not a problem. I come from 3D rendering, with 32 years of experience; to me, weeks of rendering is a common deal, and a small render farm at home or at the office is fine. I’m used to waiting when a quality gain is the result. In the future we’ll have this in real time too, and that’s okay; I saw real-time 3D 20 years ago, and only today, with Unreal, is it available to a non-technical person. Well, now I’m ready to teach my AI, with tons of photos and video, how to do its work better.
I often upscale videos that I could enrich with millions of high-res photos, or sometimes also with video, so why not feed more information to the AI?

3 Likes

Impressive; I don’t have anywhere near the level of experience you have. But it’s really cool to read your story and your take on training an AI model.

But yup, I am willing to wait weeks as well for a much higher quality AI.

You are very kind. We all start from zero; I just started a bit before others, and I know people who started before me :smiley:

1 Like

This all depends on the user, what hardware they have, and how long they’re willing to wait.

3 Likes

I was wondering the same thing today… It would be fantastic if Video AI had a “two-pass” upscaling feature.

In the first pass, it detects faces and their identities (say, all main actors in an old movie), and for each identity stores the location of frames with maximum face size and different poses.

At the end of the first pass, it extracts those frames (possibly with an interaction step where the user can sub-select frames to avoid poor training data). A maximum of N (say 30) frames would be extracted for each of C characters/people (say, max. 12) and then processed in the cloud to train an additive model that, similar to LoRA in Stable Diffusion, encodes these people’s facial features at as high a resolution as possible, so that face recovery can do a better job during the second (upscale) pass…
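Purely to illustrate the flow (none of this reflects Video AI’s internals, and every detection, training, and upscaling call is left as a placeholder), a skeleton of the two passes in Python might look like this:

```python
# Hypothetical two-pass skeleton. Pass 1 collects the largest face observations per
# identity as candidate training frames; pass 2 trains a small per-identity adapter
# (LoRA-style) and uses it while upscaling. All detection, identity assignment,
# training, and upscaling functions are placeholders passed in by the caller.

from dataclasses import dataclass, field

MAX_FRAMES_PER_PERSON = 30    # "N" in the post above
MAX_PEOPLE = 12               # "C" in the post above

@dataclass
class FaceObservation:
    frame_index: int
    face_size: int            # e.g. bounding-box area in pixels

@dataclass
class Identity:
    name: str
    observations: list = field(default_factory=list)

def first_pass(video_frames, detect_faces, assign_identity):
    """Scan the video, recording where each identity's largest faces appear.

    detect_faces(frame) -> list of (face_crop, face_size)        [placeholder]
    assign_identity(face_crop, identities) -> Identity or None   [placeholder]
    """
    identities = {}
    for i, frame in enumerate(video_frames):
        for crop, size in detect_faces(frame):
            ident = assign_identity(crop, identities)
            if ident is None and len(identities) < MAX_PEOPLE:
                ident = Identity(name=f"person_{len(identities) + 1}")
                identities[ident.name] = ident
            if ident is not None:
                ident.observations.append(FaceObservation(i, size))
    # keep only the N largest faces per identity as candidate training frames
    for ident in identities.values():
        ident.observations.sort(key=lambda o: o.face_size, reverse=True)
        ident.observations = ident.observations[:MAX_FRAMES_PER_PERSON]
    return identities

def second_pass(video_frames, identities, train_adapter, upscale_with):
    """Train a small per-identity adapter, then upscale using it.

    train_adapter(identity) -> adapter object                    [placeholder]
    upscale_with(frame, adapters) -> upscaled frame              [placeholder]
    """
    adapters = {name: train_adapter(ident) for name, ident in identities.items()}
    return [upscale_with(frame, adapters) for frame in video_frames]
```

The user-interaction step described above would slot in between the two passes, pruning each identity’s candidate frames before the adapters are trained.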

3 Likes