Downscale using Perceptually Based Downscaling of Images

Cine camera users capturing 12K/8K/6K RAW need a better downscaling method to reach 2160p, as do users who simply want to downscale 2160p movies to 1080p.

The following perceptually based downscaling method is very interesting and seems to be the most advanced:

https://www.cl.cam.ac.uk/~aco41/

It’s not available in any other software.

Different downscalers haven’t been a topic here so far; I like that you’re bringing up this important, underrated subject :)

At the moment, Lanczos is used internally, which is fine for most material… Personally, I wouldn’t mind being able to choose from different ones, and if a perceptually based one were included, even better…
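For anyone wondering what “choosing a different scaler” boils down to: most classic scalers differ mainly in their interpolation kernel. Below is a minimal sketch of three common kernels as plain functions, using the standard formulas (Lanczos windowed sinc with 3 lobes, Catmull-Rom as the cubic with B=0, C=0.5, and the bilinear tent). The `KERNELS` table is a hypothetical scaler menu for illustration, not anything taken from VEAI.

```python
import math

def lanczos(x: float, a: int = 3) -> float:
    """Lanczos windowed sinc with `a` lobes (a=3 is the usual "Lanczos3")."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def catmull_rom(x: float) -> float:
    """Catmull-Rom cubic, i.e. the Mitchell-Netravali family with B=0, C=0.5."""
    x = abs(x)
    if x < 1.0:
        return 1.5 * x**3 - 2.5 * x**2 + 1.0
    if x < 2.0:
        return -0.5 * x**3 + 2.5 * x**2 - 4.0 * x + 2.0
    return 0.0

def bilinear(x: float) -> float:
    """Triangle (tent) kernel used by plain bilinear scaling."""
    return max(0.0, 1.0 - abs(x))

# Hypothetical scaler menu: name -> (kernel, support radius in source pixels).
KERNELS = {
    "bilinear": (bilinear, 1.0),
    "catmull_rom": (catmull_rom, 2.0),
    "lanczos3": (lanczos, 3.0),
}
```

The support radius tells a resampler how many neighbouring source pixels each kernel reads; Lanczos3 looks furthest and has the strongest negative lobes, which is where its extra sharpness (and ringing) comes from.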

While it’s “new” here, the topic in general has of course been addressed in detail elsewhere, so for those interested, or for some brainstorming and inspiration:

https://www.gcc.tu-darmstadt.de/home/proj/dpid/

Of course, there is a lot more out there; these just came to mind…

I’m getting better picture quality by using 16-bit 6K TIFFs as both input and output in Topaz and then downscaling them to 4K with Catmull-Rom in Resolve.

I don’t like the results of Topaz’s own 6K→4K TIFF output.

Wow, that’s weird. Independently, I was envisioning a downscaler like DPID just a few weeks ago. I spent several days thinking about how it should work, which is what the paper describes. Then I searched for a few days but couldn’t find anything. I didn’t think to use the search term “perceptual”; I was using terms like “smart resizer” and such.

As most of my VEAI projects are downscaled (often by 2x) from a 2x VEAI upscale, I should see more detail. Damn. If this works, I may have to redo some of my projects.

Question for the devs:
Does the 400% model add more detail than the 200% one? In some early tests, the final downscaled version (using Spline64Resize) looked the same with either model. However, if 400% does add more detail, DPID could make even more of a difference.

I’m doing a test with the movie I’ve been working on recently. It’s 768x576, poorly lit and overcompressed: Bits/(Pixel*Frame) = 0.070 at 23.976 fps with H.264. Artemis Dehalo, with a JSON mod to turn extra sharpening off, occasionally makes wrinkles look like veins. Plus, if you look closely and know what to look for, you can see it’s been processed (an artifact of the Dehalo model on this video). At 834x626 it looks good, but I want 720p.

It’s a lousy 1.25 scale factor, yet the 1.25x enlargement looks poor with any resizer. The best results were with Artemis Dehalo at 200% downscaled to 720p. It would be OK except for the vein issue, which is caused by the Artemis Dehalo model (its detail enhancement is too strong). Artemis MQ and HQ at 200% don’t have the vein issue but are too blurry.

My theory is that if I keep the detail of a 200% upscale made with the blurrier Artemis model and then downscale to 720p with DPID, I’ll get the detail I want without the veins. Proteus is no good with this video; I spent a day tweaking its settings before deciding the ML model just can’t handle this movie. It has to be Artemis HQ, LQ, or Artemis Dehalo.

I began the test with a 400% upscale using Artemis HQ v9, and there does appear to be more detail, as slight veining showed up early on. I aborted and restarted at 200%, and it looks good. It’s going to run all night and through tomorrow. Then I’ll make an MP4 with DPID downscaling and have my answer.

I have checked virtually every combination of 16-bit 6K TIFF import and 6K/4K/8K/12K export (using Proteus).

The best solution in my tests so far is 6K-to-6K input/output followed by external downscaling with Catmull-Rom; it gives the best-looking, most natural picture.

Any downscaling done in Topaz produces a more ‘digital’ look with more aliasing issues.

The result: regular Artemis is inferior to Artemis Dehalo when it comes to sharpening this standard-definition movie. I even ran a short 400% vs. 200% test and saw no difference with DPID or any other resizer.

The only time I could see a difference was with DPID using lambdas of 2.0 or higher. Lambda is not well documented, but higher values seem to increase the effect of the filter. The end result is about the same as applying a sharpening filter. There is a huge speed penalty when using DPID with any lambda higher than the default of 1.0; for some reason, DPID with lambdas above 1.0 lowers CPU utilization. Multithreading (using AviSynth+) corrects the issue, and FPS is then about the same as without DPID at all.
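To make the lambda behaviour more concrete, here is a minimal grayscale sketch of the idea behind DPID (“Rapid, Detail-Preserving Image Downscaling”, the TU Darmstadt project linked above). As I understand the paper, each output pixel is a weighted average of the input pixels it covers, where pixels that differ more from a guidance image get larger weights, raised to the power λ. This is a simplified sketch, not the reference implementation: it only handles integer factors, and a plain block mean stands in for the guidance image. I’m also assuming the AviSynth plugin’s lambda corresponds to this λ. With λ = 0 all weights are equal and it degenerates into a box filter; larger λ pulls the result toward outlying pixels, which fits the sharpening-like effect described above.

```python
from typing import List

Image = List[List[float]]  # grayscale rows, values in [0, 1]

def dpid_downscale(img: Image, factor: int, lam: float = 1.0) -> Image:
    """Simplified DPID-style downscale by an integer factor (sketch only)."""
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    out: Image = []
    for by in range(0, h, factor):
        row: List[float] = []
        for bx in range(0, w, factor):
            block = [img[y][x] for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            guide = sum(block) / len(block)  # simplified guidance: plain block mean
            # Weight = distance from the guide, raised to lam. The paper also divides
            # by a V_max term, but a per-block constant cancels after normalisation.
            weights = [abs(v - guide) ** lam for v in block]
            total = sum(weights)
            if total == 0.0:
                row.append(guide)  # lam > 0 and the block is flat
            else:
                row.append(sum(wt * v for wt, v in zip(weights, block)) / total)
        out.append(row)
    return out
```

For a 2x2 block containing one white pixel on black, λ = 0 gives 0.25 (a box filter), λ = 1 gives 0.5 and λ = 2 gives 0.75, so the isolated detail survives more strongly as λ grows.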

The bottom line: you probably won’t see any benefit from DPID with SD video. HD video may be different.

@connecteddd
I assume you’re referring to Catmull-Rom via AviSynth’s bicubic resizer. I tried it but couldn’t see a difference.

It’s the Catmull-Rom option in DaVinci Resolve’s scaling settings (16-bit 6K TIFF input → 16-bit 4K TIFF output).
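Resolve’s exact implementation isn’t described here, so this is only a generic sketch of what a Catmull-Rom downscale typically involves: when shrinking, the kernel is stretched by the inverse scale factor so each output pixel averages over more source pixels (suppressing aliasing), and the weights are normalized. It handles a single row; a 2D resize would run it over every row and then every column. The pixel-centre convention and edge clamping are my assumptions.

```python
import math
from typing import List

def catmull_rom(x: float) -> float:
    """Catmull-Rom cubic kernel (B=0, C=0.5), support |x| < 2."""
    x = abs(x)
    if x < 1.0:
        return 1.5 * x**3 - 2.5 * x**2 + 1.0
    if x < 2.0:
        return -0.5 * x**3 + 2.5 * x**2 - 4.0 * x + 2.0
    return 0.0

def resample_row(src: List[float], out_len: int) -> List[float]:
    """Resample one row of samples to out_len samples with Catmull-Rom."""
    scale = out_len / len(src)
    # When shrinking, widen the kernel by 1/scale so it acts as a low-pass filter.
    stretch = max(1.0, 1.0 / scale)
    radius = 2.0 * stretch
    out = []
    for i in range(out_len):
        # Centre of output pixel i, expressed in source-pixel coordinates.
        center = (i + 0.5) / scale - 0.5
        acc = wsum = 0.0
        for j in range(math.ceil(center - radius), math.floor(center + radius) + 1):
            wgt = catmull_rom((j - center) / stretch)
            acc += wgt * src[min(max(j, 0), len(src) - 1)]  # clamp at the edges
            wsum += wgt
        out.append(acc / wsum)  # normalise so flat areas keep their value
    return out
```

Downscaling a one-pixel on/off pattern (`[0, 1, 0, 1, …]`, 16 samples) to 8 samples gives flat 0.5 grey away from the edges instead of aliasing into solid black or white, which is the kind of behaviour you want from an external downscaler.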

The images are all from upscaling.


Downscaling problems are luxury problems. ;)

One problem with downscaling: if you shot with a large aperture (small f-number), out-of-focus areas may become sharp when downscaled.

I like to use bicubic for photos because it doesn’t distort the details and the result takes sharpening well (you can selectively choose which areas to sharpen and which to leave soft).

The photos were some quick examples from the web.

From my downscaling tests with Topaz, other image software, Scratch, and Resolve, I found that Catmull-Rom gives the best downscale when your native-resolution image is already very detailed.

Topaz is not good in that area; it works better when input and output have the same resolution, so I prefer to do the downscale externally.

Maybe Topaz processing starts with a downscale and then recovers detail and reverts compression artifacts.

Downscaling, like upscaling, is best chosen individually based on the footage and the desired output. There is no “best” scaler. So VEAI either has to include a very large set of input/output scalers or go for the best compromise (or a smaller set of options)…

Related to this topic is an issue that has been mentioned many times elsewhere:

The order in which prescaling/preprocessing, inference (the AI treatment) and output scaling are done…

There are many cases where the combination of the above matters and influences the outcome a lot. And in the end, it’s all a matter of personal taste.

Example: very blurry footage from a satellite TV recording that was originally shot on a CCTV camera at a TV station, then upscaled with whatever the person at the station had at hand, then squeezed to a 2:1 ratio for satellite distribution… (a VERY common case for “older TV shows” still broadcast today over “non-1080p satellite”).

In this case, the original resolution is far lower than what is broadcast: the “optical resolution” is lower than the number of pixels present. To compensate, the image is often sharpened to counteract the blurriness before it is encoded for broadcast.

Here, a processing queue like this one is often best:

  1. scale the image back down to the original optical resolution
  2. try to get rid of the oversharpening
  3. then run it through VEAI
  4. then correct for the 2:1 PAR

At the moment, step 1 is not possible in VEAI, and step 4 is internally applied before step 3. Oversharpening can be addressed with the Artemis Dehalo models in some cases; in others, external filtering is needed.

Let’s leave out the sharpening issue and not even get started on deinterlacing. That still leaves the lack of individual pre- and post-scaling/resizing inside VEAI. So the result is a picture with heavily horizontally stretched pixels and a FAR less optimal result than if we had control over the processing order…
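Here is a sketch that only tracks frame geometry through the two orderings, to show what the AI step actually receives as input. The numbers are hypothetical: a 960x1080 stored frame with a pixel aspect ratio of 2 (displayed as 1920x1080) whose real optical resolution is about half of that in each direction. Step 2 (removing oversharpening) doesn’t change geometry, so it is left out, and `ai_upscale` is merely a stand-in that resizes; it says nothing about what a real model does.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass(frozen=True)
class Frame:
    width: int    # stored pixel columns
    height: int   # stored pixel rows
    par: float    # pixel aspect ratio; 2.0 means each pixel is displayed twice as wide

def scale(f: Frame, sx: float, sy: float) -> Frame:
    # Resizing changes the pixel grid but keeps the displayed aspect ratio.
    return Frame(round(f.width * sx), round(f.height * sy), f.par * sy / sx)

def to_optical(f: Frame, factor: float) -> Frame:  # step 1: back to optical resolution
    return scale(f, factor, factor)

def ai_upscale(f: Frame, k: int) -> Frame:         # step 3: stand-in for the AI model
    return scale(f, k, k)

def fix_par(f: Frame) -> Frame:                     # step 4: stretch to square pixels
    return scale(f, f.par, 1.0)

Step = Tuple[str, Callable[[Frame], Frame]]

def run(f: Frame, steps: List[Step]) -> Tuple[Frame, Optional[Frame]]:
    """Apply steps in order; return the final frame and what the AI step received."""
    ai_input = None
    for name, step in steps:
        if name == "ai":
            ai_input = f
        f = step(f)
    return f, ai_input

src = Frame(960, 1080, 2.0)  # hypothetical squeezed broadcast frame
queue = [("optical", lambda f: to_optical(f, 0.5)),
         ("ai", lambda f: ai_upscale(f, 4)),
         ("par", fix_par)]
current = [("par", fix_par), ("ai", lambda f: ai_upscale(f, 2))]

print(run(src, queue))    # AI sees Frame(width=480, height=540, par=2.0)
print(run(src, current))  # AI sees Frame(width=1920, height=1080, par=1.0)
```

Both orders end at 3840x2160 with square pixels. In the first, the model works on the real 480x540 worth of detail; in the second, it gets a 1920x1080 frame carrying that same detail spread four pixels wide but only two pixels tall, which is the stretched-pixel situation described above.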

Long story short, this is only one example, and we could look at hundreds more… The more ways there are to tamper with the footage, the less “beginner friendly” the software becomes. It will always come down to some compromise.

Personally, I’d take another route in development: sketch out what the software is going to be capable of, which user group it will address and which “type of user experience” it is aiming for, THEN implement those features as planned on a roadmap.

At the moment, VEAI development takes another approach: build what’s possible, ask the community for wishes, and go from there depending on what can be done in a reasonable time, keeping release cycles short to “keep it interesting”…

IMHO, a good compromise would be to have two sections inside VEAI, prefiltering and postfiltering, and to throw a handful of options into each of them to deal with resizing, cropping, some basic colour/hue/brightness filters and maybe a denoise/etc… option…

This should keep it simple enough and offer a lot more individual processing, while still being “easy” to develop and implement…

The above-mentioned “perceptually based downscaling” could be one of the downscale options.

IMO, the downscale should be done before VEAI processing, not after.
This is particularly true for some fake full HD 1080p videos from the early days of HD that were shot with cameras/lenses that in reality only resolved 720p, with a simple upscale then performed to make them 1080p.

For this kind of video, a downscale to half (540p) is necessary, and if you want a restored 1080p video you will get far better results running a 200% upscale on that than a simple 100% pass.
Likewise, for 4K, a 400% upscale from 540p will give even better results than downscaling to 720p and applying 300%, which is really a 2880p output that then has to be downscaled again to 2160p.
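The arithmetic behind this can be sketched as follows. The assumption, taken from the post above rather than from any VEAI documentation, is that the models only run natively at fixed factors (here 100%, 200% and 400%) and that an in-between setting like 300% runs the next larger model and then resizes the result. `MODEL_SCALES` and `plan` are hypothetical names.

```python
from typing import Tuple

MODEL_SCALES = (1, 2, 4)  # hypothetical native model factors (100%, 200%, 400%)

def plan(src_h: int, target_h: int) -> Tuple[int, int, float]:
    """Pick the smallest native factor >= the needed scale (else the largest one).

    Returns (model factor, model output height, resize factor applied afterwards).
    A resize factor below 1.0 means an extra downscale after the AI step.
    """
    needed = target_h / src_h
    k = next((s for s in MODEL_SCALES if s >= needed), MODEL_SCALES[-1])
    out_h = src_h * k
    return k, out_h, target_h / out_h

print(plan(540, 1080))  # (2, 1080, 1.0): 540p pre-downscale, then 200%
print(plan(540, 2160))  # (4, 2160, 1.0): 540p pre-downscale, then 400%
print(plan(720, 2160))  # (4, 2880, 0.75): "300%" = 400% model, then 2880p -> 2160p
```

Under that assumption, pre-downscaling to 540p lets both targets land exactly on a native model factor, while the 720p route always ends with an extra 0.75x resize.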

And of course, the pre-downscale method also matters, as it can dramatically change the overall result.

So it would be good for VEAI to have built-in pre-downscaling with as many algorithms as possible, rather than just a no-choice Lanczos method.

When you have 6K RAW TIFFs from a cine camera, they already contain all the detail; in that case, the downscale needs to be performed after processing.

A drop-down menu of algorithms, with the option to apply the resize before or after processing, would be great for all users.
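A minimal sketch of that drop-down idea as a data model: a scaler choice plus a before/after-AI placement, turned into an ordered processing chain. Everything here (`Scaler`, `Placement`, `ResizeOption`, `build_chain`) is hypothetical and not part of VEAI; the scaler list just mirrors the methods named in this thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

class Scaler(Enum):
    LANCZOS = "lanczos"
    CATMULL_ROM = "catmull-rom"
    SPLINE64 = "spline64"
    DPID = "dpid"
    PERCEPTUAL = "perceptual"

class Placement(Enum):
    BEFORE_AI = "before"
    AFTER_AI = "after"

@dataclass(frozen=True)
class ResizeOption:
    scaler: Scaler        # drop-down 1: which algorithm
    height: int           # target height in lines
    placement: Placement  # drop-down 2: before or after the AI step

def build_chain(opt: ResizeOption, ai_percent: int) -> List[Tuple[str, str]]:
    """Return the processing steps in the order they would run."""
    resize = ("resize", f"{opt.scaler.value} to {opt.height}p")
    ai = ("ai", f"model at {ai_percent}%")
    return [resize, ai] if opt.placement is Placement.BEFORE_AI else [ai, resize]

# Fake-1080p source: shrink to 540p first, then upscale 200%.
print(build_chain(ResizeOption(Scaler.CATMULL_ROM, 540, Placement.BEFORE_AI), 200))
# 6K cine TIFFs: process at 100%, then downscale to 2160p.
print(build_chain(ResizeOption(Scaler.CATMULL_ROM, 2160, Placement.AFTER_AI), 100))
```

Keeping the placement as a separate setting from the algorithm covers both cases discussed in this thread (fake HD wants the resize first, detailed cine footage wants it last) without multiplying the number of menu entries.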