Two Gigapixel AI features to radically improve results

The two features I'm about to describe are steps I've learned to do in my workflow outside of Gigapixel AI. The problem is that I pretty much have to do them to get the best results out of Gigapixel AI, yet they logically belong in the product itself.

These two steps have the most dramatic effect on the quality of the end result - far beyond the two main sliders we currently have to play with, "Suppress Noise" and "Remove Blur".

(In fact, "Remove Blur" shouldn't even be there in the first place. It does have an effect, but it's barely noticeable and too small to bother with, and you are obviously striving for the cleanest, simplest UI possible, which I love. If I managed this product, that slider would be the very first thing to go. But I digress.)

Here are the two features:

  1. Pre-downsample (aka “pre-downscale”, used interchangeably here)
  2. Iterations

Pre-downsample

  • UI: A basic slider, ranging from 0 to 100%.
  • Result: Downscales the input image by that amount, before any other processing.

Justification: While it sounds completely unintuitive, in my fairly exhaustive testing, downscaling images before processing (without sharpening, which is not Photoshop's default resize behavior) has proven to have the single most dramatic effect on output quality.
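For clarity, here's a rough sketch of the manual pre-downscale step as I do it today (I actually do this in Photoshop; the Python/Pillow version below is just to pin down what I mean - the filenames and the 66% factor are placeholders, and the key point is resampling with no added sharpening):

```python
# Rough sketch of the manual pre-downscale step (filenames and factor are examples).
from PIL import Image

SCALE = 0.66  # downscale to 66% of the original dimensions before feeding Gigapixel AI

img = Image.open("original.tif")
w, h = img.size
# Lanczos resampling, with no extra sharpening pass afterwards.
downscaled = img.resize((round(w * SCALE), round(h * SCALE)), resample=Image.LANCZOS)
downscaled.save("original_pre66.tif")
```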

Ever-increasing megapixel counts are producing decreasing marginal returns on "useful information per pixel". While the total amount of useful information still increases with higher megapixels, the "useful information density" decreases. (Arguably it's even possible to pass a threshold where total useful information also decreases with increasing sensor density, due to the constraints of physics and photons. "We" seem to be bumping up against that already, to the point where the only way to keep increasing useful megapixels is with AI assistance built into the hardware.)

Gigapixel AI clearly has a harder time with this kind of “low information per pixel” content. The softer the input when considered at 1:1 zoom, the worse Gigapixel AI’s final result. In more extreme but still quite realistic scenarios, the product is completely unable to enhance the image, and worse yet, is more likely to hallucinate weird artifacts that make no sense in context.

The reasons we get decreasing marginal returns on information density with increasing megapixels include things like:

  • Atmospheric turbulence and haze
  • Lens softness at various settings; no lens is immune - while great lenses can be significantly sharper than cheap ones, their limits are still exposed at higher megapixel counts - that's just math and physics when comparing any lens against itself at ever-increasing resolution.
  • Demosaic/antialias filtering (which still needs to be done even when high-end cameras omit a physical AA filter)
  • De-Bayer filtering (or whatever the sensor's RGB arrangement calls for)
  • Shutter-induced vibration
  • Motion blur, whether from the subject and/or the camera
  • Increased noise and quantization (precision) error, and the subsequent filtering to control them, due to too few photons hitting ever-shrinking pixel sites

Unfortunately there's no hard and fast rule for how much to downsample first. Sometimes 75% might be best for high-quality input. For low-quality input - especially if you don't have the source material, only a highly compressed, oversharpened JPEG - 33% might, incredibly, be best. After dozens of attempts you start to get a "feel" for a good starting point, but the best results often require multiple tries at different downsample ratios.

Obviously doing this manually is incredibly tedious. But it is still, according to my testing, the single biggest bang for the buck.

If this could be done in the program itself, where it makes the most sense and would significantly simplify the overall workflow, it would be a huge win for users.

Ideally this would be one of the "filters" the AI engine has at its disposal and is trained on, but even for now, as a user-facing control, it would make the most dramatic and noticeable difference in final results.

Iterations

  • UI: An integer text box.
  • Result: Changes the number of iterations the program takes to reach the user's desired final effective scaling (calculated from the original input, before any pre-downscaling). Whether that target is a final pixel dimension or a scaling factor, you can derive how big each iteration should be as a single scaling factor that's the same for every step. (It's not simple subtraction or division, but it's not difficult to calculate given [original input px], [final output px], and [desired iteration count] - see the sketch below.)
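For example, the per-iteration factor is just the n-th root of the overall scaling factor. A quick sketch of that calculation (Python; the pixel counts are hypothetical):

```python
# Per-iteration scale factor: the n-th root of the overall scaling factor.
# Pixel counts here are hypothetical examples.
original_px = 2048    # longest side of the input (before any pre-downscaling)
final_px = 8192       # desired final longest side
iterations = 3

factor = (final_px / original_px) ** (1 / iterations)   # ~1.587x per step

size = original_px
for step in range(1, iterations + 1):
    size = round(size * factor)
    print(f"step {step}: scale by {factor:.3f}x -> {size}px")
# Rounding may leave the last step a pixel or two off; nudge it to land exactly on final_px.
```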

Justification:

With Gigapixel AI, a quality 2k image can usually be scaled up to 8k with impressive results. But the second most significant factor affecting the end result (after pre-downscaling) is, according to my testing, the number of iterations. In other words, don't jump from a 2k image to 8k all at once. Instead, do it in 2 or 3 iterations. (Obviously this currently requires multiple saves and re-runs of Gigapixel AI.) The end result is usually vastly superior - in terms of "real" (not USM) sharpness, and with less hallucinated color noise and odd color fringing that has to be post-processed away.

Again, this would ideally be just another “filter” the neural net has to work with and has been trained on. But barring that, giving the user direct control over it would be a godsend.

Conclusion

Some downsides to both of these might be:

  1. A slight increase in UI complexity (but only increasing the control count by 1 - assuming you get rid of the comparatively less useful "focus" slider). As a company, you seem to be trying to attract new users through a) UI simplicity and b) a sophisticated marketing blitz. These features are more about rewarding slightly more serious users with significantly better output. But I get it: even with no adjustments, this product sells itself, so why make that harder? Answer: because you want to reward your more serious users as well, and make it easy for the product to produce even more astonishing results. It is possible to do both - or at least to strike a much better balance than the current one.
  2. Increased odds of user confusion over what the input levers do.
  3. Increased odds of user confusion, since downscaling is the opposite of what the product is intended for. (It could be called something else, such as simply "pre-scaling", and run in reverse from 100% down to 1%. Users would quickly figure out that it improves the preview up to some point that's different for every input, after which results start getting worse the further you take it.)
  4. Increased processing time - dramatically so for "Iterations". Yes, the time per photo would increase. But if users have to do this manually anyway - and I suspect most advanced users would eventually stumble on it - then building it in provides a clear benefit to your users.
    • Granted, new users might think the product is "too slow" if they randomly tweak the "Iterations" setting without really understanding what it does. This and other problems could be addressed with a simple "Advanced" toggle that hides (by default) or exposes "Suppress Noise", "Pre-scaling", and "Iterations". "Iterations" could have a "!" icon next to it explaining that this setting can significantly increase processing time.

Very good analysis. And accurate.

A regular user or amateur photographer would rather have the function simplified than see more options added (because, generally, humankind is lazy), but the developer could consider your advice and add an extended menu, enabled on request, for specialists.

I use pre-downsampling in my own models (I'm also a neural network developer), but for my own needs (I'm also a graphic artist) - I simply combine drawing and programming. Pre-downsampling gives good results for blurred input, noisy images, and everything else you have written about.

I haven't used iterative, gradual upscaling yet, but I do other things like oversampled upscaling, transformations in the frequency domain (FFT) instead of on pixels, and more.

In general, the theory of neural networks is "more or less" mastered, but because neural networks are non-linear creations, even the best authors and scientists don't know exactly how they work. And sometimes no one can predict what the result of AI image processing will be - precisely because of that non-linearity.

Regards,
Lech Balcerzak

I usually find fault with the faces that Gigapixel produces when upscaling a picture where the faces are small. It distorts the eyes and mouth/teeth. However, I tried your suggestion of iterative upscaling on one picture, and the results of X2 + X2 versus X4 showed no difference.

I also tried the downscaling trick on the same picture. The longest side was 1434px, which I downsized in Affinity Photo to 1000px. I then ran it through Gigapixel AI again with an X4 upscale. While the results were not perfect, the facial features were substantially improved. This may be a great option to have in GP, or to build into Face Refinement.


Methods such as "pre-downsampling" or "iterative, partial enlargement" are good, but they do not significantly affect the quality of facial processing - rather, they affect the overall quality of the enlarged image.

The human face is a special structure, and the human eye is very sensitive to even slight distortions in a face. We even have a dedicated center in the brain that is responsible only for that. This is the result of evolution - our lives depended on it: recognizing emotions and, for example, one's place in the herd.

As an artist, I can also say that for the same reason the face is the most difficult structure to draw. That is why I do not recommend starting with faces. If you draw, you'll see that you can manage nice simple figures first, then landscapes; animals are more difficult, and the face is very difficult. In other words, when you practice drawing and painting for months, you eventually acquire the skills for drawing trees and flowers, but the face requires much more practice. Of course, even the face can be mastered in the end.


Face enlargements are not perfect - they are not even good - because of the universal training of the models.

Each AI model (I mean a trained neural network) has its own information capacity, along with several other factors that affect processing quality: the number of possible internal states, and a predilection for specific structures/objects in the image that depends on its training. There are several other factors, but I won't write about them for now.

The point is that a universally trained network cannot process everything it "sees" well. Like a human, it cannot be a master in all areas - although there are people who know a great deal (geniuses), the so-called "Renaissance people".

To get good results, it would be necessary to create an AI equipped with a group of separate neural networks, trained on different objects (abstractions): separately for faces, buildings, grass, flowers, etc.

And then an additional group of networks that would recognize objects, assign them to the appropriate group, and then apply masks to the images (or do it at the level of neurons - a more difficult but better solution).

Finally, you need to add one more network: a supervisor. The supervisor AI would put everything together into one picture. The processed enlargement would be of much better quality.
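In rough pseudocode terms, the idea looks something like this (a toy sketch only - every "model" here is a stub, not a real network, and certainly not how Gigapixel AI actually works):

```python
# Toy sketch of the idea above: specialist upscalers per object class, a recognizer
# that produces masks, and a supervisor that blends everything back together.
# All "models" here are stubs -- this is an illustration, not a real implementation.
import numpy as np

def recognize(image):
    """Stub recognizer: returns a soft mask in [0, 1] per object class."""
    h, w, _ = image.shape
    face_mask = np.zeros((h, w))      # pretend no faces were found
    other_mask = 1.0 - face_mask      # everything else
    return {"face": face_mask, "other": other_mask}

def specialist_upscale(class_name, image, scale):
    """Stub class-specific upscaler (nearest-neighbour repeat stands in for a network)."""
    return image.repeat(scale, axis=0).repeat(scale, axis=1)

def supervisor(outputs, masks, scale):
    """Stub supervisor: blends specialist outputs weighted by their upscaled masks."""
    blended = np.zeros_like(next(iter(outputs.values())))
    weight = np.zeros(blended.shape[:2])
    for name, out in outputs.items():
        m = masks[name].repeat(scale, axis=0).repeat(scale, axis=1)
        blended += out * m[..., None]
        weight += m
    return blended / np.maximum(weight, 1e-6)[..., None]

image = np.random.rand(64, 64, 3)                         # stand-in input image
masks = recognize(image)
outputs = {c: specialist_upscale(c, image, 4) for c in masks}
result = supervisor(outputs, masks, 4)                    # (256, 256, 3) blended output
```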

Anyway - this is how the biological nervous systems of humans and animals work. For example, the human eye does not send the brain three channels of R, G, B color, but a dozen or so different channels, including color, texture, intensity, edges, motion, direction of motion, and others. The brain just puts it all together. What we see is an illusion and interpretation, not a 1:1 mapping. In fact, we are watching a continuous hallucination rather than real images. That is why, for example, we are susceptible to optical illusions and manipulation.

I hope my explanations shed a little light on the subject.

Best regards,
Lech Balcerzak

Thanks for the discussion.

I should point out that although I've processed some images with faces, so far most of what I've done is landscapes and digital art.

So these suggestions weren't based so much on working with faces. It can be pretty creepy/Uncanny Valley (or funny) how badly faces get mangled, and I'm starting to get a feel for which photos these steps can improve and which they can't.

To generalize for faces: low-res, blurry, or noisy faces seem, unsurprisingly I suppose, more likely to get mangled. I haven't noticed a significant improvement when enabling the face button - occasionally it's slightly worse.

Gigapixel AI seems pretty good at hallucinating natural detail like foliage, clouds, rock, etc. It seems especially good with digital art, which already starts out pretty clean. That includes 2D, 3D, and even scans of painted artwork.

I've spent quite a bit of time playing around with pre-downscaling and iterations. For some images, I'll generate at least four "final" versions to compare, which may mean quite a few more intermediate versions. (At least until I get better at predicting which workflow a given image will look best with. And that much, at least, does seem predictable.)

For example, although there may be only two final comparison images, sometimes I generate four, such as the following (there's a rough sketch of this after the list):

  • Unaltered original processed directly to “8K” (in quotes because I use a width of 8*2^10 px - as an arbitrary target I’ve settled on for my needs).
  • Unaltered original, iteratively processed in steps of 2x, with a final <2x step to “8k”.
  • Original downsampled to 66% or 50% (depending on level of blur, noise, and USM haloing of original - the goal being a starting point that is reasonably sharp and low-noise for its resolution and with no visible sharpening artifacts). Then from there:
    • Directly upsampled to “8k”.
    • Iteratively upsampled to “8k” as described above.
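To make that concrete, here's a rough sketch of how I plan those runs (Python; the 8192 px target is just my own arbitrary choice, and the original width and pre-downscale factors are examples):

```python
# Sketch: list the starting widths and iterative targets I compare for one image.
# The original width and pre-downscale factors are examples; "8K" is my own target.
TARGET_W = 8 * 2**10   # 8192 px

def iterative_targets(start_w, target_w=TARGET_W, step=2.0):
    """Widths to hit on the way up: 2x hops, then a final <2x hop to the target."""
    widths, w = [], start_w
    while w * step < target_w:
        w = round(w * step)
        widths.append(w)
    widths.append(target_w)
    return widths

original_w = 3000                       # hypothetical original width
for scale in (1.0, 0.66, 0.50):         # unaltered, 66%, and 50% starting points
    start = round(original_w * scale)
    print(f"start {start}px -> direct to {TARGET_W}px, or iterative {iterative_targets(start)}")
```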

At first I found it completely unpredictable which would produce the best results, and all four seemed to have an equal chance. But like I said, I’m noticing some patterns in the results, for example:

  • If the original is blurry, noisy, and/or has unsharp-mask haloing (and I can't find or obtain a better source), then downsampling first is pretty much guaranteed to produce a better end result.
  • If there is a mix of sharp and blurry areas (even after pre-downscaling) that you'd like to be uniformly sharp, iterative processing tends to produce better results.
  • If, for a particular image, hallucinated detail that wasn’t there is less objectionable than blur and/or noise, then iterative processing may be the way to go.
  • If you want maximum fidelity (e.g. original shapes, minimal false detail, and maximum original fine-grain detail), then don't do iterative processing.
  • If a single-pass processing yields too much hallucinated chroma noise, iterative processing will almost certainly smooth it out. (At the possible expense of lost fine-detail, and additional color-fringing where colors transition abruptly.)

Occasionally the additional hallucinated (and very crisp) detail that results from iterative processing seems weird and objectionable when viewed at 1:1 zoom - but when the image is viewed as a whole, it actually looks really good and is a significant improvement over one-step processing to the same pixel dimensions.

If you ever tried Google's image-processing AI that was trained heavily on dog images (DeepDream - I don't know if it's still available), you may recall that feeding an image containing no dogs through it iteratively would hallucinate increasingly crisp detail at each step - free of sharpening artifacts - that looked more and more like a chaotic mess of mutated dog faces, until dog faces were all the image was and you couldn't tell what it was ever supposed to be.

Sometimes iterative processing is clearly horrible. Although I'm getting much better at guessing which workflow will work best ahead of time, overall it seems about 50% of images come out better with iterative processing. (Again, not counting photos with faces. I don't have enough data points to offer guidance there, other than that so far 1-step upscaling seems best - perhaps unsurprisingly - if you want to minimize facial distortions and the "Uncanny Valley" effect that can sometimes give you the willies, whether or not you pre-downscale first.)

There are some other guidelines slowly forming that are fuzzy and hard to articulate now, but which seem like they'll crystallize with more testing and experience.

If so, I'll update this thread - assuming it doesn't get locked.