For generating a good text prompt from a source image, I find that Google Gemini works best: the description is neither too long nor too short, just right, and it gives good results. If the image already has an initial resolution between 2K and 4K, I use the combination of Recover V2 and Low Resolution without scaling. Then, for the final scaling, I combine either Low Resolution v2 or High Fidelity with Recover V1 and V2, and assemble the best parts of each variant in Photoshop.
A small example of how Google Gemini describes this image:
Precise description of the photo:
A budgerigar is perched on the right fork of a Y-shaped branch. The bird is mostly greenish-yellow, with the typical black stripes on its head and back. Its head is tilted slightly upwards and turned to the right, giving the impression it is looking at something out of frame or is attentive to a sound. The cere above its beak appears brownish or pale pink, suggesting a female or a young male. Its tail feathers are long and dark, with yellow and blue accents near the body. It firmly grips the branch with its pink feet. The branch has rough bark, greyish and brown. The background is a blur of greenery (bokeh) behind a fine-meshed wire netting or net, also blurred. A cable or bar crosses horizontally near the top of the background.
Variant without the animal:
The image displays a Y-shaped tree branch with rough bark texture mixing grey and brown. The branch stands out against a blurred background composed of fine-meshed wire netting or net, behind which lies an indistinct mass of green vegetation. A dark horizontal line (cable or bar) crosses the upper part of the background.
Well, the photo had already been processed at its current resolution to sharpen the details. I am now scaling it to 8160x6120, which is the native resolution of my camera sensor.
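As a side note on what that final scaling step means at the pixel level: before any AI detail recovery, an upscale is just resampling, i.e. mapping each destination pixel back to a source pixel. A minimal nearest-neighbour sketch in plain Python (purely illustrative; the Gigapixel models discussed above obviously do far more than this):

```python
def upscale_nearest(pixels, src_w, src_h, dst_w, dst_h):
    """Nearest-neighbour upscale of a 2D grid of pixel values.

    pixels: list of src_h rows, each a list of src_w values.
    Returns a dst_h x dst_w grid where each destination pixel
    copies the nearest source pixel.
    """
    out = []
    for y in range(dst_h):
        sy = y * src_h // dst_h   # map destination row to source row
        row = []
        for x in range(dst_w):
            sx = x * src_w // dst_w   # map destination column to source column
            row.append(pixels[sy][sx])
        out.append(row)
    return out

# Toy example: a 2x2 image doubled to 4x4 (a real run would go
# from the working resolution up to 8160x6120).
small = [[1, 2],
         [3, 4]]
big = upscale_nearest(small, 2, 2, 4, 4)
```

Each source pixel simply becomes a 2x2 block in the output; AI upscalers replace this copy step with learned detail synthesis, which is why the model choice (Recover, Low Resolution, High Fidelity) matters so much.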
