I’ve learned to include the two features I’m about to mention in my workflow outside of Gigapixel AI. The only problem is that I pretty much have to do them to get the best results out of Gigapixel AI, when they logically belong in the product itself.
These two things have the most dramatic effect on the quality of the end result - far more than the two main sliders we have to play with, “Suppress Noise” and “Remove Blur”.
(In fact, “Remove Blur” shouldn’t even be there in the first place. While it does have an impact, it’s barely noticeable and too small to bother with, and you are obviously striving for the cleanest, simplest UI possible, which I love. If I managed this product, that would be the very first thing to go. But I digress.)
Here are the two features:
- Pre-downsample (aka “pre-downscale”, used interchangeably here)
- UI: A basic slider, ranging from 0 to 100%.
- Result: Downscales the input image by that amount, before any other processing.
Justification: While it sounds completely counterintuitive, in my fairly exhaustive testing, downscaling images before processing (without sharpening - which is not the Photoshop default) has proven to have the single most dramatic effect on the output quality.
Ever-increasing megapixel counts are producing decreasing marginal returns on “useful information per pixel”. While the total amount of useful information still increases with higher megapixels, the “useful information density” decreases. (Arguably it’s even possible to pass some threshold where total useful information also decreases with increasing sensor density, due to the constraints of physics and photons. “We” seem to be bumping up against that already, to the point where the only way to continue increasing useful megapixels, is with AI assist built into the hardware.)
Gigapixel AI clearly has a harder time with this kind of “low information per pixel” content. The softer the input when considered at 1:1 zoom, the worse Gigapixel AI’s final result. In more extreme but still quite realistic scenarios, the product is completely unable to enhance the image, and worse yet, is more likely to hallucinate weird artifacts that make no sense in context.
The decreasing marginal returns on information density with increasing megapixels are due to things like:
- Atmospheric turbulence and haze
- Lens softness at various settings; no lens is immune - while great lenses can be significantly sharper than cheap ones, their limits are still explored at increasing megapixels - that’s just math and physics when comparing any lens to itself at ever-increasing megapixel count.
- Demosaic/antialias filtering (which still needs to be done even when high-end cameras don’t have an antialias filter)
- De-bayer filtering (or whatever the sensor RGB arrangement)
- Shutter-induced vibration
- Motion blur, whether from the subject, the camera, or both
- Increased noise and binary precision error (and subsequent filtering to control it) due to not enough photons hitting ever-shrinking pixel sites
Unfortunately there’s no hard and fast rule for how much to downsample first. Sometimes 75% might be best for high-quality input. For low-quality input - especially if you don’t have the source material and only have a high-compression, oversharpened jpg - 33% might, incredibly, be best. After dozens of attempts you can start to get a “feel” for a good starting point, but the best results often require multiple tries at different downsample ratios.
Obviously doing this manually is incredibly tedious. But according to my testing, it is still the single most significant bang-for-buck effort.
If this could be done in the program itself, where it makes the most sense and would significantly simplify the overall workflow, that would be a huge win for users.
Ideally that could be one of the “filters” that the AI engine has at its disposal and is trained on, but for now this would have the most dramatically noticeable difference on final results.
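To make the trial-and-error concrete, here is a minimal Python sketch of the arithmetic behind such a slider. It is only an illustration, not how Gigapixel AI works: the function names are made up, and it assumes the slider value is the size to keep (so 75 means the image is resized to 75% of its original width and height).

```python
def predownscale_size(width, height, percent):
    """Return (width, height) after resizing both sides to `percent` of the original.

    Assumption: `percent` is the size kept, so 100 means no downscaling.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    factor = percent / 100
    # Never let a side collapse to zero pixels.
    return max(1, round(width * factor)), max(1, round(height * factor))


def candidate_sizes(width, height, percents=(75, 50, 33)):
    """Map each candidate percentage to the resulting pixel dimensions."""
    return {p: predownscale_size(width, height, p) for p in percents}
```

For a 6000 x 4000 px input, `candidate_sizes(6000, 4000)` gives 4500 x 3000 at 75%, 3000 x 2000 at 50%, and 1980 x 1320 at 33%. Each candidate would then be run through the upscaler and the results compared.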
- Iterations
- UI: An integer text box.
- Result: Changes the number of iterations the program takes to get to the user’s desired final effective scaling (calculated based on the original input before downscaling). Whether that’s a specified final pixel dimension or a scaling factor, you can determine how big each iteration should be, as a scaling factor that’s the same for each iteration. (It’s not simple subtraction or division, but it’s not difficult to calculate given [original input px], [final output px], and [desired iteration count].)
With Gigapixel AI, a quality 2k image can usually be scaled up to 8k with impressive results. But the second most significant factor impacting the end result (after pre-downscaling) is, according to my testing, the number of iterations. In other words, don’t jump from a 2k image, to 8k all at once. Instead, do it in 2 or 3 iterations. (Obviously this currently requires multiple saves and re-running of Gigapixel AI.) The end result is usually vastly superior - in terms of “real” (not USM) sharpness, and less hallucinated color noise & odd color fringing that has to be post-processed away.
Again, this would ideally be just another “filter” the neural net has to work with and has been trained on. But barring that, giving the user direct control over it would be a godsend.
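The per-iteration factor mentioned above can be sketched as follows. Since every iteration multiplies the size by the same factor, that factor is the nth root of the total scaling. This is a hypothetical illustration in Python; the function names are mine, and `start_px` is assumed to be the long-edge size actually fed to the first iteration (that is, after any pre-downscaling).

```python
def per_iteration_factor(start_px, final_px, iterations):
    """Equal scaling factor per iteration: the nth root of the total scaling."""
    if start_px <= 0 or final_px <= 0:
        raise ValueError("pixel sizes must be positive")
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return (final_px / start_px) ** (1 / iterations)


def iteration_plan(start_px, final_px, iterations):
    """Return (factor, sizes), where sizes lists the long edge after each iteration."""
    factor = per_iteration_factor(start_px, final_px, iterations)
    # Intermediate sizes are rounded to whole pixels.
    sizes = [round(start_px * factor ** i) for i in range(1, iterations)]
    # The last step lands exactly on the requested final size.
    sizes.append(final_px)
    return factor, sizes
```

Going from 2048 px to 8192 px in 2 iterations gives a factor of 2.0 per step (2048 → 4096 → 8192). In 3 iterations the factor is about 1.587 (2048 → 3251 → 5161 → 8192).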
Some downsides to both of these might be:
- A slight increase in UI complexity (but only increasing the control count by 1 - assuming you get rid of the comparatively less useful “Remove Blur” slider). As a company, you seem to be trying to attract new users through a) UI simplicity, and b) a sophisticated marketing blitz. These features are more focused on rewarding slightly more serious users with significantly better output. But I get it, even with no adjustments, this product sells itself, so why make that harder? Answer: Because you want to reward your more serious users as well, and make it easy for the product to produce even more astonishing results. It is possible to do both - or at least, strike a much better balance than the current one.
- Increased odds of user confusion over what the input levers do.
- Increased odds of user confusion since downscaling is the opposite of what the product is intended for. (It could be called something else, such as simply “pre-scaling”, and scaled in reverse from 100 to 1%. Users would quickly figure out that it simply improves the preview results up to some arbitrary point that’s different for every input, then it starts getting worse the further you take it.)
- Increased processing time. Dramatically so, for “Iterations”. Yes, the time per photo would increase. But if users have to do this manually anyway - which I suspect most advanced users eventually would stumble on - then this provides a clear benefit to your users.
- Inevitably, new users might think the product is “too slow” if they randomly tweak the “Iterations” setting without really understanding what it’s doing. This and other problems could be addressed by having a simple “Advanced” toggle that hides (by default) or exposes “Suppress Noise”, “Pre-scaling”, and “Iterations”. “Iterations” could have a “!” icon by it, which explains that this setting can significantly increase processing time.