Face Reconstruction (Refinement) for small faces

HAHAHA! The AI can be really creepy and hilarious at the same time, sometimes! :rofl:

1 Like

Before Topaz added the face recovery feature, Remini was really winning. But now, after comparing Remini with Topaz, Remini still wins on faces… it just doesn't do a good job on the overall picture. So it's a trade-off: do you want a high-quality face with a low-quality background, or a so-so face with a good-quality background?

2 Likes

If face refinement were implemented in VEAI, the algorithm would have to use a range of frames from the video to reconstruct the "real" faces, instead of creating only a similar face.

That means it would have to analyze a few seconds of the complete video to collect information about how the face really looks :wink: :+1:
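To make that concrete, here is a minimal sketch of the multi-frame idea in Python with OpenCV: align a face crop across a few neighbouring frames and average them, so detail that no single frame has can be recovered. The face box input and the translation-only alignment are simplifying assumptions; real footage would need proper motion compensation.

```python
import cv2
import numpy as np

def aggregate_face(frames, face_box):
    """Align a face crop across neighbouring frames and average them.

    frames   -- list of BGR frames (numpy arrays) around the target frame
    face_box -- (x, y, w, h) of the face in the middle frame (hypothetical input)
    """
    x, y, w, h = face_box
    ref = cv2.cvtColor(frames[len(frames) // 2][y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
    acc = np.zeros((h, w, 3), dtype=np.float64)
    for f in frames:
        crop = f[y:y+h, x:x+w]
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        # Estimate a translation-only shift between this crop and the reference
        # (real footage would need affine or optical-flow alignment instead).
        (dx, dy), _ = cv2.phaseCorrelate(np.float32(ref), np.float32(gray))
        m = np.float32([[1, 0, -dx], [0, 1, -dy]])
        acc += cv2.warpAffine(crop, m, (w, h)).astype(np.float64)
    # Averaging aligned crops suppresses noise and compression artifacts.
    return (acc / len(frames)).astype(np.uint8)
```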

1 Like

Or a group shot. What a threesome

Doesn't look any worse than what Topaz does.

Thank you @uselessusername12345 for letting me know that both Hitpaw and AVCLab have already implemented a face enhancement model in their AI video software. :tada: :confetti_ball: :partying_face:
I have just tried a quick test on a very low-resolution face to see how it performs.



Left is 4x on VEAI prob, Right is 4x on AVCLab (face refinement enabled).

The face refinement seems to have done a pretty good job on this low-resolution example; it reconstructed detail in the eyes and teeth. :100: However, other than the face, I still think VEAI is doing a better job overall. AVCLab tends to over-sharpen edges and removes too much fine detail. :sweat_smile:

I am looking forward to VEAI including a face enhancement model in the software. :crossed_fingers:

6 Likes

No problem. I also totally agree that VEAI is a better option for general upscaling, but having a face refinement option would totally make it the best all-around option on the market. Anything to do with upscaling low-resolution faces just looks scuffed with the current modules, but other than that it's great.

4 Likes

It would be better if it just left background faces entirely alone. They don't need to be refined if they are already out of focus in the background.

6 Likes

I was really impressed with my first use of VAI. I was blown away by how much better the video looked. Then, devil faces from hell… usually small faces with noise in low light. Some creepy red eyes. It's almost like watching a music video by Aphex Twin xD

1 Like

Holy shit, that's impressive! What am I bothering with VEAI for then? :stuck_out_tongue:

1 Like

GFPGAN, GPEN, and CodeFormer are designed for photos; they don't work well with video. Face reconstruction for video is much more challenging than for images, because video tends to be heavily compressed and suffers from slow shutter speeds, motion blur, compression artifacts, rolling shutter, misfocus, hand shake, subject movement, etc.
I have tried extracting image frames from video and running face restoration with GFPGAN, GPEN, and CodeFormer. They work well if the subject stays still and looks straight at the camera, but those face restoration models fail when the subject is moving or out of focus.
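For anyone who wants to try the same per-frame experiment, here is a rough sketch of the pipeline, assuming ffmpeg on the PATH and the GFPGAN package from its GitHub repo; the file names, model weights path, and frame rate are placeholders. Because every frame is restored independently, expect the hallucinated detail to flicker between frames, which is exactly the failure mode described above.

```python
import glob
import os
import subprocess

import cv2
from gfpgan import GFPGANer

os.makedirs("frames", exist_ok=True)
os.makedirs("restored", exist_ok=True)

# 1. Extract every frame as a PNG (file names are placeholders).
subprocess.run(["ffmpeg", "-i", "input.mp4", "frames/%06d.png"], check=True)

# 2. Restore each frame independently with GFPGAN.
restorer = GFPGANer(model_path="GFPGANv1.3.pth", upscale=2,
                    arch="clean", channel_multiplier=2)
for path in sorted(glob.glob("frames/*.png")):
    img = cv2.imread(path)
    # enhance() detects faces, restores them, and pastes them back into the frame.
    _, _, restored = restorer.enhance(img, has_aligned=False,
                                      only_center_face=False, paste_back=True)
    cv2.imwrite(path.replace("frames", "restored"), restored)

# 3. Reassemble; the frame rate must match the source clip.
subprocess.run(["ffmpeg", "-framerate", "30", "-i", "restored/%06d.png",
                "-i", "input.mp4", "-map", "0:v", "-map", "1:a?",
                "-c:v", "libx264", "-pix_fmt", "yuv420p", "output.mp4"],
               check=True)
```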

2 Likes

Thanks for the good explanation. I had figured the opposite would be true, since with video you can do things like temporal denoising, literally using a temporal view of the 'image' over a sequence (and thus being able to extract more detail from a face). But you explain well why that does not work so well on video.

1 Like

For a face so unrecognizable, this result is actually not that bad. Her mouth is way off, though.

I see three ways to approach this, though maybe a combination of them is needed to get the desired effect.

  1. Recreate the face in each frame, using techniques like GFPGAN or something similar (and make sure there is consistency frame to frame).
  2. Use close-up face references from any point in the current video (if they exist) to construct the face in each frame. (Some videos will have the person speaking close to the camera at some point before they move into the background. That data could be used for other portions of the video where the person's face is blurred or low quality.)
  3. Feed the program face data, through either images or videos, to be used to reconstruct the face in the video. (Any video subject is going to have high-quality images or video available that can be used as references to recreate the face. Shading and tone would have to be adjusted in Topaz Video AI so that they match the video being processed.)

Basically, you can have the system guess to try to fix the face, or you can feed it data. Feeding it data seems like the most accurate way to do it. You would just need to curate a specific set of images at different angles, or feed it HD video of a face at different angles, so that it learns what the facial features are. Some interpretation would have to happen to create angles that don't exist in the face data fed to the program, as it would be difficult and time-consuming to include face data matching every angle or expression in the video you are processing.
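As a rough illustration of the "feed it data" route, here is a sketch using the open-source face_recognition library to match faces detected in a frame against a curated reference gallery. The folder names are placeholders, and the 0.6 distance cutoff is the library's usual rule of thumb for "same person":

```python
import glob

import face_recognition
import numpy as np

# Build a gallery of encodings from curated, high-quality reference images.
gallery = []
for path in glob.glob("references/*.jpg"):
    img = face_recognition.load_image_file(path)
    encs = face_recognition.face_encodings(img)
    if encs:
        gallery.append(encs[0])

def identify_faces(frame):
    """Return (box, is_target) for each face found in an RGB frame."""
    boxes = face_recognition.face_locations(frame)
    encs = face_recognition.face_encodings(frame, boxes)
    results = []
    for box, enc in zip(boxes, encs):
        # Smallest distance to any gallery image decides the match.
        dist = np.min(face_recognition.face_distance(gallery, enc))
        results.append((box, dist < 0.6))
    return results
```

Of course, identifying which blurry face belongs to the reference person is the easy half; actually conditioning the restoration on those references is the open research problem.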

I hope some attempt is made at this, because you have to start somewhere, just like with all of these other techniques. They weren't great in the beginning, but they got better and improved over time.

3 Likes

Just thinking out loud here…

Something very far down the line could be similar to these AI selfie apps that are blowing up right now. Under the hood they may be using Stable Diffusion and Dreambooth (not 100% sure, but let's assume so). Dreambooth lets them fine-tune an SD model for a specific person's face; they then use the fine-tuned model to render images of that person in Stable Diffusion and sell them back to the uploader. This is all open-source code, and anyone can try it, but you need an expensive GPU with 10-12+ GB of VRAM to do it locally. You can also do this on Google Colab for free. However, the training takes a long time (tens of minutes to hours), so it's probably not feasible in the short term for a use case like video restoration by your average, very impatient user.

The popular app right now (Lensa) is creating fake paintings because that is its goal, and it uses an artwork-focused SD model for whole-image generation, but the Dreambooth fine-tuning could also be done on a realism-focused SD model. That model could then be used to insert the specific person's face into the very low-quality small face in a video. I believe you can already do this on still images in Stable Diffusion by inpainting, with the right settings (a low amount of denoising, so the generated image stays close to the original).
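For the curious, the still-image version of that idea is only a few lines with Hugging Face's diffusers library. This sketch uses the img2img pipeline with a low strength rather than inpainting, and it assumes you already have a Dreambooth-fine-tuned, realism-focused checkpoint of the person; the checkpoint path and the "sks person" token are hypothetical:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Load a (hypothetical) Dreambooth-fine-tuned checkpoint of the person.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "./dreambooth-sks-person", torch_dtype=torch.float16).to("cuda")

# Crop of the low-quality face, upscaled to the model's working resolution.
face = Image.open("face_crop.png").convert("RGB").resize((512, 512))

# Low strength keeps the output close to the original, as described above.
out = pipe(prompt="photo of sks person", image=face,
           strength=0.3, guidance_scale=7.5).images[0]
out.save("face_restored.png")
```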

There is a very real ethical dilemma here, though, related to enabling easy creation of deepfakes. Imagine a future Topaz product that lets a user provide a video with a low-quality face and then tell it that the face belongs to someone else. That would open the door to all kinds of misuse and is a potential liability for a commercial company like Topaz. This capability already exists to some degree, but it's an underground thing based on open-source code, so there is no commercial stakeholder to be sued.

Edit: browsing more I found this thread discussing similar ideas:

3 Likes

CodeFormer was mentioned, but no example? Here it is :slight_smile:

2 Likes

I think it's done using something like Clip Studio Paint vector layers.

When you make one layer, you add in the colors on the face, and you make the same number of layers as there are colors on the face.

Then some code decides that the larger color regions can be cleaned up and made uniform in appearance.

Then you take the shapes of the face (the nose, the eyes, the mouth, the ears) and turn them into image outlines, like tracing. Put the tracings in, color them, then add the layers together, and you have your face.

For the hair and any ornaments on the person, make outlines based on their shape and color, trace them out, color the tracings, and then add the layers together again.

If you are curious how this is done in existing academic tools, it is all very well published. Here are a few papers in reverse chronological order (newest first):

CodeFormer: [2206.11253] Towards Robust Blind Face Restoration with Codebook Lookup Transformer

GFP-GAN: [2101.04061] Towards Real-World Blind Face Restoration with Generative Facial Prior

DFDNET: [2008.00418] Blind Face Restoration via Deep Multi-scale Component Dictionaries

Full papers should be available free at the PDF link on the top right.

All three of the above have been published on GitHub along with pre-trained models, so you can experiment if you wish. I suspect Topaz Gigapixel AI uses an implementation of one of the above techniques for its face refinement feature, but I have not seen any statements about that. The difficulty in extending this to video is temporal coherence; there is no trivial way to handle this problem. I am sure researchers are working on it.
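To make the temporal-coherence problem concrete: restore each frame independently and the hallucinated details jitter from frame to frame. A naive band-aid is to blend each restored frame with the previous output, for example with an exponential moving average. This is a sketch of the band-aid, not a real solution, since it smears genuine motion:

```python
import numpy as np

def ema_smooth(restored_frames, alpha=0.6):
    """Exponential moving average over independently restored frames.

    restored_frames -- iterable of numpy arrays, one per frame
    alpha           -- weight of the current frame (lower = smoother, blurrier)
    """
    prev = None
    for frame in restored_frames:
        frame = frame.astype(np.float32)
        # Blend the current restoration with the running average to damp flicker.
        prev = frame if prev is None else alpha * frame + (1 - alpha) * prev
        yield prev.astype(np.uint8)
```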

I had an idea that I wish to share. Google Photos (among others) can identify specific people. Would it be possible for the AI to go through the whole video first, to "get to know" the faces, and then use that data to improve the faces in the whole video? For example: a close-up of a person turning their head has a lot of information.
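The "get to know the faces" pass is actually doable with today's open-source pieces. Here is a sketch that samples a video once, groups faces by identity with the face_recognition library, and keeps the sharpest crop of each person as a reference; the sampling step, tolerance, and greedy clustering are all simplifying assumptions:

```python
import cv2
import face_recognition
import numpy as np

def collect_references(video_path, step=15, tol=0.6):
    """One pass over the video: group faces by identity, keep the sharpest crop."""
    refs = []  # list of dicts: {"enc": ..., "crop": ..., "sharpness": ...}
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:  # sample every Nth frame to keep the pass fast
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            boxes = face_recognition.face_locations(rgb)
            for box, enc in zip(boxes, face_recognition.face_encodings(rgb, boxes)):
                top, right, bottom, left = box
                crop = frame[top:bottom, left:right]
                # Variance of the Laplacian as a crude sharpness score.
                sharp = cv2.Laplacian(cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY),
                                      cv2.CV_64F).var()
                dists = [np.linalg.norm(r["enc"] - enc) for r in refs]
                if dists and min(dists) < tol:  # same person seen before
                    best = refs[int(np.argmin(dists))]
                    if sharp > best["sharpness"]:
                        best.update(crop=crop, sharpness=sharp)
                else:  # new identity
                    refs.append({"enc": enc, "crop": crop, "sharpness": sharp})
        idx += 1
    cap.release()
    return refs
```

The second pass, which would use those references to actually improve the blurry instances of each face, is the part that remains unsolved research.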

Apologies if this approach has already been suggested above or if this is the wrong forum for this.

7 Likes

I'm 100% confident that the Topaz Team can bring this feature over soon, as they did pretty much the same with Gigapixel, so the knowledge is there; they just need to implement a way to carry that technique over to TVAI!

No matter how good GFPGAN may be at reconstructing faces, you can bet solid money that the Topaz Team will surpass GFPGAN's progress, as they're constantly working on improving the software!

A program is only as good as the dedication of its development team!

1 Like