HAHAHA! The AI can be really creepy and hilarious at the same time, sometimes!
Before Topaz did the face recovery feature, Remini was really winning. But now, after comparing Remini with Topaz, Remini still wins on faces… but it doesn't do a good job on the overall picture. So it's a trade-off: do you want a high-quality face with a low-quality background, or a decent face with a good-quality background?
If face refinement were implemented in VEAI, the algorithm would have to use a range of frames from the video to reconstruct the "real" faces, instead of creating only a similar face.
That means it would have to analyze several seconds of the complete video to collect information about how the face really looks.
Or a group shot. What a threesome
Doesn't look any worse than what Topaz does.
Thank you @uselessusername12345 for letting me know that both Hitpaw and AVCLab have already implemented a face enhancement model in their AI video software.
I have just tried a quick test on a very low resolution face to see how it performs.
Left is 4x on VEAI prob, Right is 4x on AVCLab (face refinement enabled).
The face refinement seems to have done a pretty good job on this low-resolution example; it reconstructed detail in the eyes and teeth. However, other than the face, I still think VEAI does a better job overall. AVCLab tends to over-sharpen edges and removes too much fine detail.
I am looking forward to VEAI including a face enhancement model in the software.
No problem. I also totally agree that VEAI is a better option for general upscaling, but having a face refinement option would totally make it the best all-around option on the market. Anything to do with upscaling low-resolution faces just looks scuffed with the current modules, but other than that it's great.
It would be better if it just left background faces entirely alone. They don't need to be refined if they are already out of focus in the background.
I was really impressed with my first use of VAI. I was blown away by how much better the video looked. Then, devil faces from hell… Usually small faces with noise in low light. Some creepy red eyes. It's almost like watching a music video by Aphex Twin xD
Holy shit, that's impressive! What am I bothering with VEAI for then?
GFPGAN, GPEN, and CodeFormer are designed for photos; they don't work well with video. Face reconstruction for video is a lot more challenging than for images, because video tends to be heavily compressed and suffers from slow shutter speeds, motion blur, compression artifacts, rolling shutter, misfocus, hand shake, subject movement, etc.
I have tried extracting image frames from video and restoring the faces with GFPGAN, GPEN, and CodeFormer. They work well if the subject stays still and looks straight at the camera; however, face restoration fails when the subject is moving or out of focus.
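To make the frame-by-frame experiment above concrete, here is a minimal sketch of that pipeline. The restorer is deliberately a stub (a real run would call GFPGAN, GPEN, or CodeFormer on each crop); the point is the structure: every frame is restored in isolation, so nothing ties frame t to frame t+1, which is exactly why the output flickers on moving subjects.

```python
# Sketch of the naive per-frame restoration pipeline described above.
# `restore_face` is a stand-in for a single-image model (e.g. GFPGAN);
# here it is a stub so the control flow is runnable on its own.
import numpy as np

def restore_face(crop: np.ndarray) -> np.ndarray:
    """Placeholder for a single-image face restorer."""
    return np.clip(crop * 1.1, 0, 255)  # pretend-enhancement

def restore_video_frames(frames, face_box):
    """Restore the same face region in every frame, independently.

    Because each frame is handled in isolation, there is no temporal
    link between consecutive restorations -- the source of flicker.
    """
    x0, y0, x1, y1 = face_box
    out = []
    for frame in frames:
        frame = frame.copy()
        frame[y0:y1, x0:x1] = restore_face(frame[y0:y1, x0:x1])
        out.append(frame)
    return out

# Toy clip: five identical gray frames, face assumed at a fixed box.
frames = [np.full((64, 64), 100.0) for _ in range(5)]
restored = restore_video_frames(frames, (16, 16, 48, 48))
```

In a real run the face box would come from a detector per frame, which adds its own jitter on top of the per-frame inconsistency.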
Thanks for the good explanation. I had figured the opposite would be true, since with video you can do things like temporal denoising, literally using a temporal view of the "image" over a sequence (and thus extracting more detail from a face). But you explain well why that does not work so well on video.
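The temporal-denoising intuition, and why motion breaks it, can be shown with a toy numpy experiment (synthetic noise only, no real video): averaging N aligned crops of a static face reduces noise roughly by sqrt(N), but averaging the same crops misaligned by subject motion just smears the signal.

```python
# Toy demonstration: temporal averaging helps static faces, hurts moving ones.
import numpy as np

rng = np.random.default_rng(0)
# A toy "face": a smooth horizontal gradient standing in for real structure.
clean = np.linspace(0, 1, 32).reshape(1, 32).repeat(32, axis=0)

# Static subject: each frame is the clean signal plus independent noise.
static_stack = np.stack(
    [clean + rng.normal(0, 0.2, clean.shape) for _ in range(16)]
)
static_avg = static_stack.mean(axis=0)

# Moving subject: the same frames, each shifted a few pixels (misaligned).
moving_stack = np.stack(
    [np.roll(f, shift=i, axis=1) for i, f in enumerate(static_stack)]
)
moving_avg = moving_stack.mean(axis=0)

err_single = np.abs(static_stack[0] - clean).mean()  # one noisy frame
err_static = np.abs(static_avg - clean).mean()       # averaging helps
err_moving = np.abs(moving_avg - clean).mean()       # misalignment hurts
```

Real temporal methods therefore need motion estimation and alignment before aggregation, which is precisely what makes the video case harder than the photo case.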
I see three ways to approach this, though maybe a combination of them would be needed to get the desired effect.
- Recreate the face in each frame, using techniques like GFPGAN or something similar. (and make sure there is consistency frame to frame)
- Use close-up face references from any point in the current video (if they exist) to construct the face in each frame. (Some videos will have the person speaking close to the camera at some point before they move into the background. This data could be used for other portions of the video where the person's face is blurred or low quality.)
- Feed the program face data, either through images or videos, that will be used to reconstruct the face in the video. (Any video subject is going to have high quality images or video available that can be used as references to recreate the face. Shading and tone will have to be adjusted in Topaz Video AI so that they match the video being processed)
Basically, you can have the system guess to try to fix the face, or you can feed it data. Feeding it data seems like the most accurate way to do it. You would just need to curate a set of images at different angles, or feed it HD video of a face at different angles, so that it learns what the facial features are. There would have to be some interpretation happening to create angles that don't exist in the face data being fed to the program, as it would be difficult and time-consuming to include face data matching every angle or expression in the video you are processing.
I hope some attempt is made at this, because you have to start somewhere. Just like all of these other techniques. They werenāt great in the beginning. But they got better and improved over time as they learned.
Just thinking out loud here…
Something very far down the line could be similar to these AI selfie apps that are blowing up right now. Under the hood they may be using Stable Diffusion and Dreambooth (not 100% sure, but let's assume so). Dreambooth lets them fine-tune an SD model on a specific person's face; they then use textual inversion to render images in Stable Diffusion and sell them back to the uploader. This is all open-source code, and anyone can try it, but you need an expensive GPU with 10-12 GB or more of VRAM to run it locally. You can also do this on Google Colab for free. However, this training takes a long time (tens of minutes to hours), so it's probably not feasible in the short term for a use case like video restoration by your average, very impatient user.
The popular app right now (Lensa) is creating fake paintings because that is their goal, and they use an artwork-focused SD model for whole-image generation, but the Dreambooth fine-tuning could also be done on a realism-focused SD model. This model could then be used to insert the specific person's face into the very low quality small face in a video. I believe you can already do this on still images in Stable Diffusion by inpainting, with the right settings (a low amount of denoising, so the generated image stays close to the original).
There is a very real ethical dilemma here, though, related to enabling easy creation of deepfakes. Imagine a future Topaz product that lets a user provide a video with a low-quality face, then tell it that it's someone else. That would open the door to all kinds of misuse, and is a potential liability for a commercial company like Topaz. This capability already exists to some degree, but it's an underground thing based on open-source code, so there is no commercial stakeholder to be sued.
Edit: browsing more I found this thread discussing similar ideas:
CodeFormer was mentioned, but no example? Here it is:
I think it could be done using something like Clip Studio Paint's vector layers:
- When you make a layer, add in the colors on the face; make the same number of layers as there are colors on the face.
- Do some coding so the larger color regions can be cleaned up and made uniform in appearance.
- Then take the shapes of the face (the nose, the eyes, the mouth, the ears) and make these into image outlines, like tracing.
- Put the tracings in, color them, then add the layers together and you have your face.
- For the hair and ornaments on the person, make outlines based on their shape and color, trace them out, color the tracing, and then add the layers together again.
If you are curious how it is done in existing academic tools, it is all very well published. Here are a few papers in reverse order (newest first):
Codeformer: [2206.11253] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
GFP-GAN: [2101.04061] Towards Real-World Blind Face Restoration with Generative Facial Prior
DFDNET: [2008.00418] Blind Face Restoration via Deep Multi-scale Component Dictionaries
Full papers should be available free at the PDF link on the top right.
All three of the above have been published on GitHub along with pre-trained models, so you can experiment if you wish. I suspect Topaz Gigapixel AI has an implementation of one of the above techniques for its face refinement feature, but I have not seen any statements about that. The difficulty in extending this to video is temporal coherence; there is no trivial way to handle this problem. I am sure researchers are working on it.
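To give "temporal coherence" a concrete shape: the failure mode of running an image restorer per frame is that consecutive outputs disagree, which the eye reads as flicker. The crudest possible stabilizer is an exponential moving average over the restored faces. This is a sketch of the idea only, not a claim about how any shipping product works, and real methods use motion-compensated blending rather than a raw EMA.

```python
# Sketch: smooth per-frame face restorations with an exponential moving
# average so the output changes gradually from frame to frame.
import numpy as np

def stabilize(restored_faces, alpha=0.3):
    """EMA over per-frame restorations: out_t = a*new_t + (1-a)*out_{t-1}."""
    out, prev = [], None
    for face in restored_faces:
        prev = face if prev is None else alpha * face + (1 - alpha) * prev
        out.append(prev)
    return out

# A restorer that flickers: alternates between two slightly different outputs.
flicker = [np.full((8, 8), 100.0 + 10.0 * (i % 2)) for i in range(50)]
smooth = stabilize(flicker)

# Largest frame-to-frame jump in mean intensity, before and after smoothing.
raw_jump = max(abs(float((b - a).mean())) for a, b in zip(flicker, flicker[1:]))
ema_jump = max(abs(float((b - a).mean())) for a, b in zip(smooth, smooth[1:]))
```

The trade-off is visible even in the toy version: the EMA damps flicker but also lags behind genuine changes (expression, head turns), which is why alignment-aware approaches are an active research topic.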
I had an idea that I wish to share. Google Photos (among others) can identify specific people. Would it be possible for the AI to go through the whole video first, to "get to know" the faces, and then use that data to improve the faces throughout the whole video? For example: a close-up of a person turning their head contains a lot of information.
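The "get to know the faces" pass could be structured like photo apps do it: compute an identity embedding for every detected face, group the embeddings so each group is one person, and then let each person's sharpest detections inform the restoration of their blurrier ones. Below is a toy greedy-clustering sketch; the embedding model is stubbed out with 2-D points, and the distance threshold is an arbitrary illustration value.

```python
# Sketch: greedily group face embeddings by identity, so all detections of
# the same person across the video can share reference data.
import numpy as np

def cluster_faces(embeddings, threshold=0.5):
    """Assign each face to the nearest existing cluster exemplar, or start
    a new cluster if nothing is close enough. Exemplar = first member."""
    exemplars, labels = [], []
    for e in embeddings:
        dists = [np.linalg.norm(e - x) for x in exemplars]
        if dists and min(dists) < threshold:
            labels.append(int(np.argmin(dists)))
        else:
            labels.append(len(exemplars))
            exemplars.append(e)
    return labels

# Two toy "people": embeddings near (0, 0) and near (5, 5).
faces = [np.array(p) for p in [(0.0, 0.1), (5.0, 5.1), (0.1, 0.0), (4.9, 5.0)]]
labels = cluster_faces(faces)
```

A production system would use a trained face-recognition embedding and a learned threshold, but the two-pass shape (identify everyone first, then restore) is the same.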
Apologies if this approach has already been suggested above or if this is the wrong forum for this.
I'm 100% confident that the Topaz team can bring this feature over soon. They did pretty much the same thing with Gigapixel, so the knowledge is there; they just need to find a way to carry that technique over to TVAI!
No matter how good GFPGAN may be at reconstructing faces, you can bet solid money that the Topaz team will surpass GFPGAN's progress, as they're constantly working on improving the software!
A program is only as good as the dedication of its development team!