Face Reconstruction (Refinement) for small faces

Holy shit, that’s impressive! What am I bothering with VEAI for then? :stuck_out_tongue:

1 Like

GFPGAN, GPEN, and CodeFormer are designed for photos; they don’t work well with video. Face reconstruction for video is much more challenging than for images, because video tends to be heavily compressed and suffers from slow shutter speeds, motion blur, compression artifacts, rolling shutter, misfocus, camera shake, subject movement, etc.
I have tried extracting image frames from video and restoring the faces with GFPGAN, GPEN, and CodeFormer. They work well if the subject stays still and looks straight at the camera, but they fail when the subject is moving or out of focus.

2 Likes

Thanks for the good explanation. I had figured the opposite would be true here, since with video you can do things like temporal denoising, literally using a temporal view of the ‘image’ over a sequence (and thus extracting more detail from a face). But you explain well why that does not work so well on video.
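To make the temporal-denoising intuition concrete: averaging n aligned frames of a static subject shrinks random sensor noise by roughly sqrt(n), which is exactly what breaks down once the subject moves and the frames no longer align. A toy numpy sketch, with synthetic data standing in for real frames:

```python
import numpy as np

rng = np.random.default_rng(0)

# A "true" static face patch (synthetic stand-in for perfectly aligned frames).
clean = rng.uniform(0.0, 1.0, size=(32, 32))

def noisy_frames(clean, n, sigma, rng):
    """Simulate n aligned frames of the same static subject with sensor noise."""
    return [clean + rng.normal(0.0, sigma, clean.shape) for _ in range(n)]

frames = noisy_frames(clean, n=16, sigma=0.1, rng=rng)

# Temporal denoising: averaging n aligned frames shrinks noise std by ~sqrt(n).
single_err = np.abs(frames[0] - clean).mean()
stacked_err = np.abs(np.mean(frames, axis=0) - clean).mean()

print(f"single-frame error: {single_err:.4f}")
print(f"16-frame average:   {stacked_err:.4f}")
```

The averaged error is a few times smaller than the single-frame error, but only because every frame here is perfectly aligned; with real subject motion you would need motion compensation first, which is the hard part.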

1 Like

For a face so unrecognizable, this result is actually not that bad. Her mouth is way off, though.

I see three ways to approach this. But maybe a combination of these must be used to get the desired effect.

  1. Recreate the face in each frame using techniques like GFPGAN or something similar (and make sure there is consistency frame to frame).
  2. Use close-up face references from any point in the current video (if they exist) to construct the face in each frame. (Some videos will have the person speaking close to the camera at some point before they move into the background. This data could be used for other portions of the video where the person’s face is blurred or low quality.)
  3. Feed the program face data, either through images or videos, that will be used to reconstruct the face in the video. (Any video subject is going to have high quality images or video available that can be used as references to recreate the face. Shading and tone will have to be adjusted in Topaz Video AI so that they match the video being processed)

Basically, you can have the system guess at how to fix the face, or you can feed it data. Feeding it data seems like the most accurate way to do it. You would just need to curate a specific set of images at different angles, or feed it HD video of a face at different angles, so that it learns what the facial features are. Some interpolation would have to happen to create angles that don’t exist in the face data fed to the program, as it would be difficult and time consuming to supply face data matching every angle or expression in the video you are processing.
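For approach 2 above, the program first needs to decide which frames are good enough to serve as references. A standard heuristic (purely illustrative here, not anything Topaz has announced) is ranking face crops by Laplacian variance, a simple focus measure; everything below is synthetic:

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian: a cheap sharpness score.
    Higher = more high-frequency detail (sharper)."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def box_blur(gray, k=5):
    """Crude box blur, used here only to fake an out-of-focus frame."""
    pad = k // 2
    padded = np.pad(gray, pad, mode="edge")
    out = np.zeros_like(gray)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + gray.shape[0], dx:dx + gray.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(1)
sharp = rng.uniform(0, 1, (64, 64))   # stand-in for a crisp close-up
blurred = box_blur(sharp)             # stand-in for an out-of-focus shot

# Rank candidate "face crops" and keep the sharpest as references.
candidates = {"closeup": sharp, "background_shot": blurred}
best = max(candidates, key=lambda k: laplacian_variance(candidates[k]))
print(best)  # the crisp frame wins
```

In a real pipeline the crops would come from a face detector and the winners would be handed to the restoration model as references; this only shows the selection step.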

I hope some attempt is made at this, because you have to start somewhere. Just like all of these other techniques. They weren’t great in the beginning. But they got better and improved over time as they learned.

3 Likes

Just thinking out loud here…

Something very far down the line could be similar to these AI selfie apps that are blowing up right now. Under the hood they may be using Stable Diffusion and Dreambooth (not 100% sure, but let’s assume so). Dreambooth lets them fine-tune an SD model for a specific person’s face; then they render images in Stable Diffusion and sell them back to the uploader. This is all open-source code, and anyone can try it, but you need an expensive GPU with 10-12+ GB of VRAM to run it locally. You can also do this on Google Colab for free. However, the training takes a long time (tens of minutes to hours), so it’s probably not feasible in the short term for a use case like video restoration by your average user, who is very impatient.

The popular app right now (Lensa) is creating fake paintings because that is its goal, and it uses an artwork-focused SD model for whole-image generation, but the Dreambooth fine-tuning could also be done on a realism-focused SD model. That model could then be used to insert the specific character’s face into the very low quality small face in a video. I believe you can already do this on still images in Stable Diffusion by inpainting, with the right settings (a low amount of denoising, so the generated image stays close to the original).

There is a very real ethical dilemma here, though, related to enabling easy creation of deepfakes. Imagine a Topaz product of the future lets a user provide a video with a low quality face, and then the user can tell it that it’s someone else. That would open the door to all kinds of misuse, and is a potential liability for a commercial company like Topaz. This capability already exists to some degree, but it’s an underground thing based on open-source code, so there is no commercial stakeholder to be sued.

Edit: browsing more I found this thread discussing similar ideas:

3 Likes

CodeFormer was mentioned, but no example? Here it is :slight_smile:

2 Likes

I think it could be done with something like Clip Studio Paint’s vector layers.

When you make one layer, you add in the colors of the face; make the same number of layers as there are colors on the face.

Then do some coding so that the larger color regions can be cleaned up and made uniform in appearance.

Next, take the shapes of the face (the nose, the eyes, the mouth, the ears) and turn them into image outlines, like tracing.

Put the tracings in, color them, then add the layers together, and you have your face.

For the hair and any ornaments on the person, make outlines based on their shape and color, trace them out, color the tracings, and then add the layers together again.

If you are curious how it is done in existing academic tools, it is all very well published. Here are a few papers in reverse order (newest first):

Codeformer: [2206.11253] Towards Robust Blind Face Restoration with Codebook Lookup Transformer

GFP-GAN: [2101.04061] Towards Real-World Blind Face Restoration with Generative Facial Prior

DFDNet: [2008.00418] Blind Face Restoration via Deep Multi-scale Component Dictionaries

Full papers should be available free at the PDF link on the top right.

All three of the above have been published on GitHub along with pre-trained models, so you can experiment if you wish. I suspect Topaz Gigapixel AI implements one of the above techniques for its face refinement feature, but I have not seen any statements about that. The difficulty in extending this to video is temporal coherence; there is no trivial way to handle this problem. I am sure researchers are working on it.
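A crude illustration of the temporal-coherence problem: if each frame is restored independently, the hallucinated detail differs frame to frame, which shows up as flicker. The exponential moving average below (a naive baseline, not a method from the cited papers) damps the flicker, but without motion compensation it would ghost on moving faces:

```python
import numpy as np

def smooth_restorations(frames, alpha=0.6):
    """Exponential moving average over per-frame restored outputs.
    A naive way to damp frame-to-frame flicker; real methods also need
    motion compensation, or faces will ghost when the subject moves."""
    out = [frames[0]]
    for f in frames[1:]:
        out.append(alpha * f + (1.0 - alpha) * out[-1])
    return out

def flicker(frames):
    """Mean frame-to-frame difference: a crude flicker metric."""
    return float(np.mean([np.abs(a - b).mean()
                          for a, b in zip(frames, frames[1:])]))

rng = np.random.default_rng(2)
base = rng.uniform(0, 1, (16, 16))
# Per-frame restoration: same face, but each frame "hallucinated" slightly
# differently -> visible flicker.
restored = [base + rng.normal(0, 0.05, base.shape) for _ in range(30)]

print(f"raw flicker:      {flicker(restored):.4f}")
print(f"smoothed flicker: {flicker(smooth_restorations(restored)):.4f}")
```

The smoothed sequence flickers noticeably less, at the cost of lagging behind real changes in the scene, which is why research methods do something smarter than a plain EMA.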

I had an idea that I wish to share. Google Photos (among others) can identify specific people. Would it be possible for the AI to go through the whole video first, to “get to know” the faces, and then use that data to improve the faces throughout the whole video? For example, a closeup of a person turning their head has a lot of information.
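A minimal sketch of that two-pass idea, with synthetic embeddings standing in for a real face recognizer (none of these names correspond to an actual Topaz or Google API): pass 1 groups detections by identity and keeps the best-quality exemplar per person; pass 2 would then use those exemplars to guide restoration of the same person’s low-quality appearances.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def build_face_gallery(detections, threshold=0.8):
    """Pass 1 of a hypothetical two-pass pipeline: group face embeddings
    by cosine similarity and keep the highest-quality exemplar per person.
    Pass 2 (not shown) would use each exemplar as a reference when
    restoring that person's blurry appearances."""
    gallery = []  # list of (centroid, best_embedding, best_quality)
    for emb, quality in detections:
        for i, (centroid, best_emb, best_q) in enumerate(gallery):
            if cosine(emb, centroid) > threshold:
                if quality > best_q:
                    gallery[i] = (centroid, emb, quality)
                break
        else:
            gallery.append((emb, emb, quality))
    return gallery

rng = np.random.default_rng(3)
# Two synthetic identities; small per-frame jitter, varying "quality".
id_a, id_b = rng.normal(size=128), rng.normal(size=128)
detections = [(id_a + rng.normal(0, 0.05, 128), q) for q in (0.2, 0.9, 0.5)]
detections += [(id_b + rng.normal(0, 0.05, 128), q) for q in (0.7, 0.3)]

gallery = build_face_gallery(detections)
print(len(gallery), "people found; best qualities:",
      [round(q, 1) for _, _, q in gallery])
```

A production system would use a trained face embedding network instead of random vectors, but the bookkeeping is the same: scan once, keep the sharpest view of each person, reuse it everywhere.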

Apologies if this approach has already been suggested above or if this is the wrong forum for this.

7 Likes

I’m 100% confident that the Topaz team can bring this feature over soon, as they did pretty much the same with Gigapixel, so the knowledge is there; they just need to implement a way to carry that technique over to TVAI!

No matter how good GFPGAN may be at reconstructing faces, you can bet solid money that the Topaz team will surpass GFPGAN’s progress, as they’re constantly working on improving the software!

A program is only as good as the dedication of its development team!

1 Like

I agree 100%. The Topaz team is doing a fantastic job. That said, there is so much AI today that it has become a real jungle. And this is just the beginning. At least, they have many choices and opportunities to improve their products, but it’s hard to find the right AI to fix or improve the face, the hair, the skin, etc. It’s a real pain.

1 Like

Yeah, even when Gigapixel first came out with its face reconstruction model, it would work miracles on the face, yet the hair, etc. was not as clear as the face. But they got past that roadblock, just as they’ll end up transferring that tech over to TVAI.

1 Like

Yes, I remember that. You’re right. Speaking of hair, it’s way better with Gigapixel 6.3.2! It’s not quite there yet, and it’s not perfect, of course, but hair seems finer and doesn’t have that dirty or “greasy” look. And most importantly, there’s less sharpening and contrast. Hair is always a real problem for every AI. I don’t know of a specific AI capable of doing a fantastic job with hair. Except, maybe, Photoshop with the neural “Photo Restoration” beta filter (Photo enhancement and Enhance face), which fixes the weird “greasy” look of the hair and makes it look finer here and there, just a tiny bit, but that’s pretty much it.

1 Like

Remini’s latest update (on Android at least) now has amazing face-focused video enhancement! It’s much better than the AVCLabs video enhancer in terms of faces; it’s not perfect, but it’s the best I’ve seen. It still shows visible “enhancement focus area” squares around each face with certain settings, and it is limited to <60 seconds and <60 MB, but this kind of progress in face video enhancement would be very welcome in a future version of Topaz Video AI, especially with an opacity slider like in Photo AI.
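The opacity slider mentioned above is, at its core, just a linear blend between the source frame and the enhanced frame; a minimal sketch (the function name is mine, not a Topaz API):

```python
import numpy as np

def blend_opacity(original, enhanced, opacity):
    """Photo-AI-style opacity slider: linearly mix the enhanced output back
    with the source frame. opacity=1.0 -> fully enhanced, 0.0 -> untouched."""
    opacity = float(np.clip(opacity, 0.0, 1.0))
    return opacity * enhanced + (1.0 - opacity) * original

orig = np.zeros((4, 4))   # stand-in source frame
enh = np.ones((4, 4))     # stand-in enhanced frame
half = blend_opacity(orig, enh, 0.5)
print(half[0, 0])  # 0.5: halfway between source and enhanced
```

Dialing opacity down is a cheap way to hide over-aggressive face hallucination without rerunning the model.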

3 Likes

Definitely. I’d love to be able to enhance more of my customers’ films, but unless the subject is close to the camera it converts them into hideous creatures, so for the moment TVAI can only be used on compressed footage with no people in it. It works much better on my cine film transfers; however, those are much higher resolution and far less compressed.

1 Like

Only just found this thread.

Re: GFPGAN, GPEN, CodeFormer are designed for photo, they don’t work well with video.

Downloaded CodeFormer to try!
Your AVCLabs image upscale example looks good.

I used AVC!! I stopped using it quickly. The current face reconstruction in the beta is miles better than AVC, which asks whether there are few or many faces, runs, then crashes until a fix is available; then it is slow, and by then the license has expired. ChatGPT 3.0 liked the software, but since ChatGPT 4.0 became available it has dropped it from its AI video software suggestions. ChatGPT does learn quickly, and yes, Topaz Labs is highly recommended by ChatGPT 4.0.

AVCLabs is slow but better than Topaz. Thanks for the encouragement to try it, buddy :slight_smile:

I like the clown; see his face on Instagram, where he sells ice cream to Eskimos. Now I only use AVC, because Topaz is no good.

Now I hear, “real user” to user, that Topaz is best; I try AVC and find it is much better than Topaz. I used a film of a clown and processed it with Topaz, and the quality was good. But while using AVC I ran with my computer and chased the clown, ran him down so Topaz could fix his face, and it worked very well.

You, a “real user”? Topaz has hit a new low.
Just sad, “real user” and Topaz.