Problem definition
Because of how the human visual system works, the optimal way to compare and contrast two images is to transition instantly from one to the other, replacing each pixel of the former with the pixel at the same position in the latter. The more direct this transition is, the easier it is for the brain to perceive the differences.
Currently, Gigapixel has no feature that offers a visually uninterrupted transition from the pixels of one rendition to another, which leaves the user with a subpar ability to compare and contrast images.
What features does Gigapixel already have related to this domain problem?
- The ability to click different “AI models” on the right-hand sidebar of the main UI, to trigger inference using the chosen model followed by a rendition of its output (prediction).
- A divider feature that allows the user to compare the chosen model output with the source image by dragging a vertical line across the image preview pane.
- A side-by-side tiled view with the source image on the left and the chosen model rendition on the right.
Why are the existing features not sufficient?
- #1 has two problems. First, it forces the user to move their eyes from the preview image over to the model selection pane in order to find the next model to preview. This eye movement corrupts the visual memory in the brain, causing it to lose most of the detail memorized about the previous image, the very image the comparison was meant to be made against. The user is then forced to move their eyes once more, back to the preview, and wait for it to render, further corrupting the memory of the prior image. Sometimes the preview finishes rendering before the eyes have had time to refocus on the image preview area, sometimes not (when inference takes longer). Either way, by the time the eyes have refocused, the user’s brain has already forgotten most of the detail in the previous image and is consequently unable to properly contrast it with the new rendition being viewed.
The core problem is having to move the eyes away from the image preview area; the secondary problem is the rendering delay while the model performs inference, which makes it impossible to perform the quick A/B alternation that is the foundational mechanism of visual A/B assessment and is built into other similar tools (like Video Comparer or MSU VQMT).
- #2 has two problems. First, it can only compare the currently selected model’s output to the source image, not to other model outputs, which makes the feature unusable for comparing different model outputs against each other. Second, it disrupts the visual field with the divider and the mouse cursor, partially corrupting the user’s memorized representation of both the source image and the target image (the model output).
- #3 also has two problems getting in the way of the objective. First, it shares #2’s limitation: it cannot be used to contrast different model outputs, only the currently selected model’s output against the source image. Second, it presents each rendition in a separate region of the screen, forcing the user to move their eyes, which, as described under #1, corrupts visual memory. The end result is that this form is the worst of the three for visual image comparison and contrasting.
What might a solution to the problem look like?
Simply enhancing the existing comparison option #1 with:
- Keyboard shortcuts, where pressing a given key triggers the same action as clicking a model in the “AI models” selection pane. This way the user never has to move their eyes away from the image preview region and consequently retains complete visual detail in memory. As a result, when the image pixels change to a given model’s output (or back to the source), the brain interprets the pixel changes as movement, which tells the user precisely where and what changes occur, and at what magnitude. This is exactly what a user wants to know when comparing and contrasting model outputs against each other or against the source image.
- Cache model output renditions in order to reduce latency when toggling and re-toggling different model outputs (by avoiding redundant re-computation of already calculated outputs).
This is an optional but welcome improvement that could be released as a separate enhancement, and it would make the preview experience significantly better. Being able to quickly tap the shortcuts for the different model outputs one after the other, or in any order, would make for a very natural, seamless, and optimal contrasting experience. At most 7 images would need caching (the number of models plus 1 for the source image), so the memory footprint would be modest, especially considering how much memory the Gigapixel app as a whole already consumes. A rough sketch of such a cache follows below.
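To make the idea concrete, here is a minimal sketch of what such a cache could look like. It assumes a hypothetical ImageBuffer type and hypothetical model identifiers; none of these names come from Gigapixel’s actual code or API, they only illustrate the mechanism:

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical container for a rendered preview; the app's real image
// type would be used instead.
struct ImageBuffer {
    int width = 0;
    int height = 0;
    std::vector<unsigned char> pixels;  // e.g. RGBA, row-major
};

// Keyed by rendition identifier ("source-image", "standard", ...), so
// toggling back to an already-rendered output never re-runs inference.
class RenditionCache {
public:
    // Returns the cached rendition for a model, if one exists.
    std::optional<ImageBuffer> get(const std::string& modelId) const {
        auto it = cache_.find(modelId);
        if (it == cache_.end()) return std::nullopt;
        return it->second;
    }

    // Stores a freshly computed rendition so subsequent toggles are instant.
    void put(const std::string& modelId, ImageBuffer rendition) {
        cache_[modelId] = std::move(rendition);
    }

    // Invalidate everything, e.g. when the source image or crop changes.
    void clear() { cache_.clear(); }

private:
    std::unordered_map<std::string, ImageBuffer> cache_;
};
```

With at most around 7 entries there is no need for an eviction policy; clearing the cache whenever the source image or crop changes is enough to keep it correct.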
Assuming the keyboard bindings were something simple and intuitive, like {key_0: source-image, key_1: standard, key_2: high-fidelity, ...}, the user workflow when doing A/B testing would simply be to press the corresponding keys one after another while keeping their eyes fixed on the image preview region.
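As a sketch of how such bindings could map onto preview actions (again purely illustrative: the model names and the showRendition() stand-in are assumptions, not the product’s real identifiers), consider:

```cpp
#include <initializer_list>
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    // '0' shows the untouched source; '1', '2', ... show each model's output.
    const std::unordered_map<char, std::string> keyToRendition = {
        {'0', "source-image"},
        {'1', "standard"},
        {'2', "high-fidelity"},
        // ... one entry per remaining model
    };

    // Stand-in for the app's preview update: in the real UI this would swap
    // the pixels in the preview pane in place, without any layout change.
    auto showRendition = [](const std::string& name) {
        std::cout << "Previewing: " << name << "\n";
    };

    // Simulated key presses: the rapid A/B alternation the proposal describes.
    for (char key : {'1', '2', '1', '0'}) {
        auto it = keyToRendition.find(key);
        if (it != keyToRendition.end()) {
            showRendition(it->second);
        }
    }
    return 0;
}
```

The essential property is that every rendition is shown in the same preview region, so switching renditions changes only pixels, never the user’s point of focus.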
Note: the keys must be easy to reach and press without taking one’s eyes off the app’s preview region. Any complex combination (ctrl+alt+\ …) would force the user to take their eyes off the preview region and look down at the keyboard to locate the keys, which would nullify the entire improvement by reintroducing the same core problem that the current option #1 already suffers from.
Non-controversial
The proposed solution seems non-controversial, given that it appears to be what the Gigapixel team members internally prefer and use themselves, as evidenced by it being the chosen way to clearly convey rendition differences in a video produced by the product team. As such, this feature request (idea) can be reformulated as “Please surface your own preferred way of comparing and contrasting images through the product’s user interface, so end-users can enjoy it too”.