Scene change detection progress

Continuing the discussion from Topaz Video AI v3.2.6:

No, it’s super quick; it’s one of the fastest video operations possible. I’m using a naive mean SAD (sum of absolute differences) against the next frame, i.e. one pixel-diff operation per frame. That gives me SAD values for the next and previous frames, which I can compare against the current one. With a very simple and cheap decision tree, it very accurately detects not only scene changes but also duplicates. It took me 1h to implement the SAD computation. Training the model took a few more.
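For readers who want to see what a mean SAD check looks like, here is a minimal sketch in pure Python. It is not the implementation described above: it swaps the trained decision tree for two static thresholds, and the names `mean_sad`, `classify`, `dup_threshold` and `scene_threshold`, along with the threshold values, are made up for illustration. Frames are assumed to be flattened lists of 8-bit luma samples.

```python
from typing import List, Sequence

Frame = Sequence[int]  # assumption: one frame as flattened 8-bit luma samples


def mean_sad(a: Frame, b: Frame) -> float:
    """Mean of absolute per-pixel differences between two equally sized frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def classify(frames: Sequence[Frame],
             dup_threshold: float = 0.5,      # hypothetical value
             scene_threshold: float = 30.0    # hypothetical value
             ) -> List[str]:
    """Label each frame by its mean SAD against the previous frame.

    Static thresholds stand in for the trained decision tree here.
    """
    labels = ["first"]
    for prev, cur in zip(frames, frames[1:]):
        sad = mean_sad(prev, cur)
        if sad <= dup_threshold:
            labels.append("duplicate")
        elif sad >= scene_threshold:
            labels.append("scene_change")
        else:
            labels.append("normal")
    return labels
```

Each frame is visited once and only compared to its neighbour, which is why the cost per frame stays at a single pixel-diff pass.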

If Topaz releases this in steps, one person could spend 1h implementing and testing the SAD calculation and 2h CUDA-fying it, while another exposes a static threshold in their FFmpeg plugin. A third person could at the same time add a threshold input field and a toggle to the UI. Total estimated engineering time, including testing and code review: 2 days. Add QA and “mgmt meetings” to that, so multiply by pi. So roughly one FTE week to get this out in a production-ready state.

A subsequent release could train a model (like I did) to detect the thresholds automatically, so no user threshold configuration would be needed.

Point is, it’s a solved problem, trivial to implement and cheap. It costs almost nothing resource-wise to run. Clearly it’s just the Topaz backlog (other priorities) that prevents this from getting done.


@jojje

Well, it’s beyond what I can do, though missing frame detection and recreation by interpolation seems to be possible, going by the GitHub discussion I linked to, and that’s my main concern. They suggested it would be slow, but they may have used a different method of course, and ‘slow’ is relative anyway. Would your method work for that too?

If it works for missing frames, please post this information in the Ideas thread and maybe tag one of the devs. Hopefully it will then get some votes and be added to the development list, as I think plenty of videos would benefit from missing frame detection/interpolation, especially when producing slow-motion video.

Yes, it works for that as well. I implemented it as an AviSynth plugin a decade ago.

Was developed to solve this exact problem :slight_smile:

PS. As with all things AviSynth, the plugins aren’t the most user-friendly (many parameters and knobs), so I’m planning to re-implement the idea as a zero-configuration, fully automated solution the day I get some free time. Integrating it upstream directly in FFmpeg also seems the best option, since 1) AviSynth is Windows-only while FFmpeg runs everywhere, and 2) I use FFmpeg anyway as part of all my pipelines, so one less dependency (AviSynth) would be great.

EDIT: Note that the plugin I linked above doesn’t use the algorithm I mentioned (a different tool uses that). It’s only for fixing “missing frames”. The documentation page shows how to use it with another plugin (from another author) to decimate dupes as well. With these two plugins together you get a solution that gets rid of duplicate frames and interpolates new frames where there is “skippy” motion. The caveat with both plugins is that they assume the dupe and skip frequency is constant within a given window (e.g. within each 25-frame segment). This is true for most CFR clips with these problems, but not for all clips. Creating a general solution for all cases is the improvement I was looking to tackle above.
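To make the constant-frequency assumption concrete, here is a rough sketch of what a windowed approach could look like. It is not how either plugin works internally; it only illustrates "one dupe and one skip per fixed window". The function name `plan_window_fixes` and the input format (a list where `diffs[i]` is the mean SAD between frame `i` and frame `i + 1`) are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def plan_window_fixes(diffs: Sequence[float],
                      window: int = 25) -> List[Tuple[int, int]]:
    """For each full window, pick one frame to drop and one gap to fill.

    diffs[i] is assumed to be the mean SAD between frame i and frame i + 1.
    Returns (frame_to_drop, interpolate_after_frame) per window.
    A trailing partial window is ignored in this sketch.
    """
    plan = []
    for start in range(0, len(diffs) - window + 1, window):
        chunk = diffs[start:start + window]
        # Smallest difference: frame i + 1 most likely repeats frame i.
        lo = min(range(window), key=chunk.__getitem__)
        # Largest difference: a frame is most likely missing after frame i.
        hi = max(range(window), key=chunk.__getitem__)
        plan.append((start + lo + 1, start + hi))
    return plan
```

The weakness is visible in the loop itself: every window is forced to yield exactly one drop and one fill, so a clip whose dupes and skips are irregular gets "fixed" in places where nothing is wrong.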

Still, the solution that currently exists in the form of these plugins does a heck of a lot better job than VAI in general in terms of smooth motion. Where VAI wins is the very good interpolation quality when it works. I tend NOT to use VAI for interpolation or decimation, since the models fail completely when dupes and skips aren’t evenly spaced. I’d rather have slightly worse interpolation picture quality than juddery motion, and the plugins give me the former.

I’d love to see scene detection capability added as well. My use case is converting 29.97 to 25 fps. I wonder if anyone has tried a CLI approach, where the output of an existing scene detection process is passed downstream for TVAI to process. Does anyone know of a similar example / model?
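One possible starting point for the CLI idea is FFmpeg’s built-in scene score, used through the `select` and `showinfo` filters. The sketch below only builds the command and extracts cut timestamps from the log; how to hand those timestamps to TVAI is left open, since that depends on what TVAI accepts. The helper names and the 0.4 threshold are assumptions for illustration.

```python
import re
import subprocess
from typing import List


def scene_cut_command(path: str, threshold: float = 0.4) -> List[str]:
    """Build an ffmpeg command that logs only frames whose scene score exceeds threshold."""
    return [
        "ffmpeg", "-hide_banner", "-i", path,
        "-vf", f"select='gt(scene,{threshold})',showinfo",
        "-an", "-f", "null", "-",
    ]


def parse_showinfo_times(log: str) -> List[float]:
    """Pull the pts_time value out of every showinfo line in an ffmpeg log."""
    return [float(t) for t in re.findall(r"pts_time:\s*(-?\d+(?:\.\d+)?)", log)]


def detect_scene_cuts(path: str, threshold: float = 0.4) -> List[float]:
    """Run ffmpeg (must be on PATH) and return candidate cut times in seconds."""
    result = subprocess.run(scene_cut_command(path, threshold),
                            capture_output=True, text=True, check=True)
    return parse_showinfo_times(result.stderr)  # showinfo writes to stderr
```

Calling `detect_scene_cuts("input.mp4")` would return a list of timestamps that a wrapper script could use to split the clip at the cuts before processing each segment separately.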

Seems like it would be no harder than Replace Duplicate Frames, which already exists.