How many neighboring frames analyzed per frame? Configurable?

I know VEAI uses temporal information (neighboring frames). Does anyone know how far ahead/behind it looks? And can this be configured?

Let’s say a section of the video has artifacts, blurring, or compression, but later on you can see things clearly. I would expect VEAI to be able to use that later information to work out what is missing and recover it. But what if the clear frames are seconds or minutes later in the video? Can this be configured to analyze the entire video, or a minute ahead/behind, in order to increase the recovery potential?


From what I have observed, it appears that VEAI does look at 20-30 frames prior, but future frames don’t have any impact.

What led me to this conclusion is that I have been having VEAI export frames as .png and then closely comparing the results of different settings. I noticed, though, that what I was seeing when exporting individual frames for comparison was not actually the same as what I would get for the same frame as part of a longer export with many frames on either side.

So I did some brief testing and noticed that if I export 30 frames prior to the frame I’m interested in, I get the same result as if I exported the full video. But if I only export 20 frames prior, I get a different result, and reducing the number of prior frames further produces yet another slightly different result each time. The differences are usually quite minor, but they are there, and sometimes they are quite noticeable, so it’s something to bear in mind if you are really trying to fine-tune your parameters.
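
If you want to repeat this kind of check yourself, here is a rough sketch of how you might compare two exports of the same output frame to see whether the extra prior frames actually changed anything. This is plain Python with Pillow and NumPy, nothing VEAI-specific, and the file names are just placeholders for your own exports:

```python
# Sketch only: compare two PNG exports of the same output frame, e.g. one rendered
# with 30 prior frames of context and one with only 20. File names are placeholders.
import numpy as np
from PIL import Image

a = np.asarray(Image.open("frame_000100_30prior.png").convert("RGB"), dtype=np.int16)
b = np.asarray(Image.open("frame_000100_20prior.png").convert("RGB"), dtype=np.int16)

diff = np.abs(a - b)                          # per-channel absolute difference
changed = np.count_nonzero(diff.max(axis=-1))  # pixels that differ in any channel
total = diff.shape[0] * diff.shape[1]

print(f"max channel difference : {diff.max()}")
print(f"mean channel difference: {diff.mean():.4f}")
print(f"pixels that differ     : {changed} / {total} ({100 * changed / total:.2f}%)")

# Optional: save a mask of the changed pixels so you can see where they are.
mask = (diff.max(axis=-1) > 0).astype(np.uint8) * 255
Image.fromarray(mask).save("diff_mask.png")
```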

I also tried seeing what happened if I exported 30 frames either side vs just 30 frames prior, and there was no difference, which leads me to the conclusion that later frames have no impact, only those before.

This was with Proteus Fine Tune V3 upscaling from 1080p to 4K. I haven’t tested with any other models.


It would be nice if the program could analyze the entire video and pull information from it to enhance certain details. For example, if the video starts with someone speaking into the camera in a closeup shot and then they walk into the background and turn around, the AI should know from the intro what the high-resolution face looks like and apply that knowledge to later frames in the video. Is this asking too much?

Another example would be a closeup of a clock on the wall. If the camera then backs away from the wall, it should still be able to show the clock in high resolution because it knows what the clock looks like in high detail.

This is possible. It is probably harder to set up and train, but not unrealistic for the future. Sadly, it could be several years before things develop that much.

One more aspect it should take into account: when someone or something is shown far away and then moves in up close. Basically, it would probably need two passes, one running forward and one in reverse.
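
One way to approximate that "reverse pass" with today's tools might be to run VEAI a second time on a reversed copy of the clip, so that what are future frames in the original become prior frames in the reversed copy. The sketch below is just a hypothetical pre-processing step (ffmpeg called from Python; file names are placeholders), not something VEAI offers, and you would still need to reverse the second output again and decide how to blend the two passes:

```python
# Hypothetical pre-processing step, not a VEAI feature: write a frame-reversed copy
# of a clip so a second "reverse pass" sees future frames as if they were prior ones.
# Note: ffmpeg's reverse filter buffers the whole clip in memory, so keep clips short.
import subprocess

def write_reversed_copy(src: str, dst: str) -> None:
    """Create a frame-reversed copy of src (audio dropped) using ffmpeg."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-vf", "reverse", "-an", dst],
        check=True,
    )

if __name__ == "__main__":
    write_reversed_copy("clip.mp4", "clip_reversed.mp4")  # placeholder file names
```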