[Show & Tell] Speeding up processing by removing duplicates before enhancing

I’ve been playing with mpdecimate (dedupe) as a way to speed up TVAI, and to reduce shimmer between duplicate frames by decimating before the TVAI enhancement. I thought I should share, because I have not seen anyone write up how to dedupe > enhance > redupe at the command line while preserving the frame rate and retaining audio sync. This is not for the faint-hearted and requires a bit of FFmpeg & Topaz command line fu. But it does demonstrate that decimate > enhance > duplicate can preserve the original frame rate while significantly reducing the number of frames that are processed or upscaled.

Decimate > enhance > duplicate seems like an interesting technique that could be valuable to others, perhaps for anime or cartoons which are “animated on twos or threes”, or for any content containing duplicate frames or large numbers of static shots.

Yup, it uses the command line. Sorry. ChatGPT is your friend.

Background
I have some 480i digitizations of broadcast masters of real-life crime documentaries that have the following characteristics:

  1. The digitizations contain a huge amount of temporal noise (especially in the chroma plane), since they originated from an analog source
  2. The documentaries contain quite a high number of static shots (title cards, slates & photos), as was the style at the time. Static shots become duplicate frames.
  3. The documentaries contain a mix of interlaced content and hard-telecined content. The presenter’s segments are at 59.94, and the reconstructions are at 23.98. At first, I was uuuurgggh, but speaking to a producer of the era, I learned it was an intentional aesthetic/technique to show the reconstructions or historic segments at slightly-jerky film-look 23.98 hard-telecined. A CRT television would have displayed some duplicate fields, giving the slightly jerky motion, signaling to the audience that it was “the reconstruction part”, much like the rather hackneyed language of flashbacks uses color-grading to give that ‘sepia cast’. Anyway, the hard-telecine is an important part of the producer and editor’s aesthetic. So, the jerkiness and the duplicate frames stay.

However, the problem with duplicate frames is twofold:

  1. Temporal noise between the duplicate frames is particularly distracting (and manifests as shimmer). I’m focusing on denoise+decimate here, but you don’t have to denoise to use this technique.
  2. The processing of these duplicate frames also wastes a lot of resources and takes time.

So, I wondered how to use FFmpeg’s mpdecimate to delete duplicates (thus converting to variable frame rate), enhance only the non-duplicates (thus saving a bunch of time in the process) and then normalize the frame rate back to the original 59.94fps, while somehow preserving the timeline. Then slap a bit’o’noise back on again at the end to stop it from looking stalled.

First, I’m going to get all my deinterlacing out of the way. My source is 8-bit, limited range. My source is constant frame rate. The bwdif deinterlacer is motion-adaptive, which means it interpolates new lines where it detects combing, but it effectively weaves the areas where the picture is static. After deinterlacing, I should have 59.94fps, progressive, constant frame rate. I’ll save as lossless FFV1, in a NUT container. In theory, no frames or fields should have been harmed in the making of this video, other than the dotcrawl removal, the deinterlace and the clamp to broadcast levels. I now have progressive, albeit noisy content, containing duplicate frames.

$ ffmpeg -i "${infile}" \
  -filter:v "format=yuv422p, dedot=m='dotcrawl-rainbows':lt='(15/510)':tl='(31/255)':tc='(0/255)':ct='(255/255)', bwdif=mode='send_field':parity='auto':deint='all', limiter=planes=1:min=16:max=235, limiter=planes=6:min=16:max=239, setparams=range='tv'" \
  -codec:v 'ffv1' -level:v 3 -g:v 1 \
  -codec:a 'copy' \
  -t 120 "./deinterlaced.nut" -y
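If you want a quick sanity check on that intermediate file, the arithmetic can be sketched like this. The helper name `expected_frames` is made up for illustration (it is not part of FFmpeg); it only works out roughly how many frames a clip of a given duration should hold at a rational frame rate.

```shell
#!/bin/sh
# Hypothetical helper (not an FFmpeg tool): rough frame count for a
# duration in seconds at a frame rate given as numerator / denominator.
expected_frames() {
  awk -v d="$1" -v n="$2" -v m="$3" 'BEGIN { printf "%d\n", (d * n / m) + 0.5 }'
}

# 120 seconds (the -t 120 above) at 60000/1001 fps:
expected_frames 120 60000 1001

# To see the frame rate ffprobe reports for the file itself:
# ffprobe -v error -select_streams v:0 \
#   -show_entries stream=r_frame_rate -of csv=p=0 "./deinterlaced.nut"
```

For the two-minute test clip that comes out at about 7193 frames, which is the number the final output should land back on after the redupe step.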

Now, I want to decimate / dedupe. But… it is impossible to decimate when there is significant temporal noise between adjacent frames; the high level of noise makes every frame unique. But, if you denoise the frames so much as to allow mpdecimate to do its thing, you will have destroyed the source beyond the boundaries of acceptable quality. It is a quandary.

There has to be a way… I need a timeline.

Well, buried at the bottom of a Stack Overflow answer is a little gem from Gyan, who is one of the FFmpeg developers: “the overlay filter syncs with the first input, so the full clone is only seen when a frame [also] exists for the base input” https://stackoverflow.com/a/62659668. That snippet is kinda gold dust. Thanks, Gyan!

It means we can use a really aggressive temporal denoise (adjust to taste), followed by mpdecimate (again, adjust to taste) to drop the duplicates… For my use-case, denoise+mpdecimate was far more predictable than tweaking mpdecimate’s values. I’m sure a knob-fiddler will be able to optimize these values for their particular content.

hqdn3d=luma_spatial=4:chroma_spatial=4:luma_tmp=6:chroma_tmp=10, mpdecimate=max=0:keep=0:hi=768:lo=320:frac='1/3'

… but the neat trick is that we’ll only need/use that denoised-and-decimated version as a “timeline”. We use the overlay filter to drop the clean, non-denoised version on to the decimated timeline.

[0:v:0]split=outputs=2[split1][split2]; \
  [split1]hqdn3d=luma_spatial=4:chroma_spatial=4:luma_tmp=6:chroma_tmp=10, \
  mpdecimate=max=0:keep=0:hi=768:lo=320:frac='1/3'[split1]; \
  [split1][split2]overlay[out]
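Since the denoise and decimate values are the bits you will be fiddling with, here is a minimal sketch of a wrapper that builds the same graph from arguments. The function name `timeline_graph` and the label names `base`, `clean` and `timeline` are my own naming for readability; labels are arbitrary, so the graph should behave the same as the one above.

```shell
#!/bin/sh
# Hypothetical helper: prints the "timeline" filtergraph with the temporal
# denoise strength and the mpdecimate thresholds passed in as arguments.
# Usage: timeline_graph LUMA_TMP CHROMA_TMP HI LO FRAC
timeline_graph() {
  # One copy becomes the decimated timeline, the other stays untouched.
  printf '%s' "[0:v:0]split=outputs=2[base][clean]; "
  printf '%s' "[base]hqdn3d=luma_spatial=4:chroma_spatial=4:luma_tmp=$1:chroma_tmp=$2, "
  printf '%s' "mpdecimate=max=0:keep=0:hi=$3:lo=$4:frac='$5'[timeline]; "
  # overlay syncs to its first input, so clean frames only pass through
  # where the decimated timeline still has a frame.
  printf '%s\n' "[timeline][clean]overlay[out]"
}

# The values used in this post:
timeline_graph 6 10 768 320 '1/3'
```

The output can then be handed straight to FFmpeg with `-filter_complex "$(timeline_graph 6 10 768 320 '1/3')"`, which makes it easier to loop over a few settings and compare how many frames get dropped.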

Now, we can pipe that decimated, variable frame-rate stream into Topaz’s FFmpeg. Topaz’s models can do their enhancement, denoise, super-resolution etc. on the deduped frames. Choose your model. TVAI will enhance only the non-duplicate frames (which saves a bunch of time).

tvai_up=model='iris-2'

Then, we restore our original constant frame rate with:

fps=fps='(60000/1001)'

Next, convert back to a YUV pixel format:

format=pix_fmts='yuv422p'

Then add a minor amount of artificial temporal noise to stop static scenes from looking stalled. Again, adjust to taste; YUV luma temporal noise (c0) is considered less distracting than chroma (c1, c2) temporal noise.

noise=c0_strength=3:c1_strength=1:c2_strength=1:all_flags='t'

And finally, we mux our audio back in via a secondary input to Topaz’s FFmpeg.

Bringing it all together, piping standard, vanilla FFmpeg into Topaz’s FFmpeg.

$ infile="./deinterlaced.nut"
$ ffmpeg -hide_banner -loglevel 'warning' -report \
  -i "${infile}" \
  -filter_complex "[0:v:0]split=outputs=2[split1][split2]; [split1]hqdn3d=luma_spatial=4:chroma_spatial=4:luma_tmp=6:chroma_tmp=10, mpdecimate=max=0:keep=0:hi=768:lo=320:frac='1/3'[split1]; [split1][split2]overlay[out]" \
  -map '[out]' -codec:v 'rawvideo' -f 'fifo' -fifo_format 'nut' -queue_size 20 "pipe:1" \
  | ffmpeg-topaz -hide_banner -report \
    -i "pipe:0" \
    -i "${infile}" \
    -map '0:v:0' -filter:v "tvai_up=model='iris-2', fps=fps='(60000/1001)', format=pix_fmts='yuv422p', noise=c0_strength=3:c1_strength=1:c2_strength=1:all_flags='t', limiter=planes=1:min=16:max=235, limiter=planes=6:min=16:max=239, setparams=range='tv'" \
    -codec:v 'ffv1' -level:v 3 -g:v 1 \
    -map '1:a:0' -codec:a 'copy' \
    "./test.topaz.nut" -y
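To check that the redupe really put back what the dedupe took out, one rough approach is to compare frame counts before and after. This is only a sketch: `count_frames` and `frame_drift` are hypothetical helper names, and it assumes `ffprobe` is on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: counts decoded video frames in a file.
# -count_frames decodes the whole stream, so this is slow on long files.
count_frames() {
  ffprobe -v error -select_streams v:0 -count_frames \
    -show_entries stream=nb_read_frames -of csv=p=0 "$1"
}

# Hypothetical helper: output count minus source count.
frame_drift() {
  echo "$(( $2 - $1 ))"
}

# Usage:
# frame_drift "$(count_frames ./deinterlaced.nut)" "$(count_frames ./test.topaz.nut)"
```

I would expect the result to be 0 or very close to it. A bigger gap suggests the output is shorter or longer than the source, which is where audio sync would start to wander.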

I thought it would be worth sharing, in case anyone wants to improve further on this decimate and glue-to-timeline technique.

It is not perfect, since we are knowingly destroying and recreating frames. That’s kinda the point.

This post proposes a dedupe > enhance > redupe technique that could be further enhanced by Topaz, with or without denoise.

Dialing in the values for denoise, decimate, fps and noise is going to depend on the content. This post was to demonstrate the technique, not suggest values. Throwing FFmpeg into debug mode (-loglevel debug) allows you to parse logs to see what mpdecimate is doing and which frames it is dropping.
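As a starting point for that log parsing, here is a minimal sketch that tallies mpdecimate’s decisions. The function name `mpdecimate_tally` is hypothetical, and the matching rests on an assumption about the log format: that each decision is written on a line tagged with the filter name and containing either " keep pts:" or " drop pts:". Check a few lines of your own log before trusting the numbers.

```shell
#!/bin/sh
# Hypothetical helper: tallies mpdecimate decisions from an FFmpeg debug
# log read on stdin.
# ASSUMPTION: decision lines are tagged "mpdecimate" and contain
# " keep pts:" or " drop pts:"; verify against your own log.
mpdecimate_tally() {
  awk '
    /mpdecimate/ && / keep pts:/ { keep++ }
    /mpdecimate/ && / drop pts:/ { drop++ }
    END {
      total = keep + drop
      pct = total ? 100 * drop / total : 0
      printf "keep=%d drop=%d dropped=%.1f%%\n", keep, drop, pct
    }'
}

# Usage (the -report option writes a timestamped ffmpeg-*.log file
# in the current directory):
# mpdecimate_tally < ffmpeg-YYYYMMDD-HHMMSS.log
```

The dropped percentage is a rough proxy for how many frames TVAI gets to skip, so it is a handy number to watch while tuning the denoise and decimate values.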

This is not going to work for all content, but it is a useful technique for:

  • Speeding up TVAI’s processing of content that contains large numbers of static or duplicate frames
  • Reducing shimmer in content that contains large amounts of temporal noise and duplicate frames