That said, I am surprised Seedream 4.0 beat it in these tests.
Google is so weirdly non-integrated.
https://blog.google/technology/ai/nano-banana-google-product...
> Google is so weirdly non-integrated.
Where by "try gemini" non-integrated "have you tried gemini" you mean "gemini is here" they shove "use gemini" gemini into every single product they have?
OP here. While Seedream did have the edge in adherence, it also tends to introduce slight (but noticeable) color grading changes. It's not a huge deal for me, but it might be for other people depending on their goals, in which case Nano Banana would be the better choice.
These aren't cases where I'm trying to do something that skirts the edge of copyright, either (like "Ghiblifying" images, for example).
That said, when it does work, it is super impressive.
Copyright: Zero guardrails on anything related to third-party IP, which lets you do some funny things. (I'm including a picture/prompt of Super Mario, Mickey Mouse, and Bugs Bunny partying at a nightclub in the blog post)
Moderation: It has far fewer guardrails than any other Google AI product I've tried, and it is possible to prompt-engineer some images that most people would definitely consider NSFW, more so than actual NSFW image generators (a post-generation filter will catch most nudity, however). I have not had any rejections for more innocuous queries that could be misinterpreted as being NSFW.
Seedream 4.0 is somewhat slept on for being 4k at the same cost as nano-banana. It's not as great at perfect 1:1 edits, but its aesthetics are much better and it's significantly more reliable in production for me.
Models with LLM backbones / omni-modal models are not rare anymore; even Qwen Image Edit is out there as open weights.
I've been using Nano Banana quite a lot, and I know that it absolutely struggles with exterior architecture and landscaping. Getting it to add or remove things like curbs, walkways, and gutters, or asking it to match colors, is almost futile.
I think this was fairly predictable, but as engineering improvements keep happening and the prompt adherence rate tightens up we're enjoying a wild era of unleashed creativity.
E.g. Gemini 2.5 Flash is given extreme leeway in how much it edits the image and changes the style in "Girl with a Pearl Earring", only for OpenAI's gpt-image-1 to do a (comparatively) much better job yet still be declared a failure after 8 attempts, despite being given fewer attempts than Seedream 4 (which passed) and less than half the attempts of OmniGen2 (which still looks far further off in comparison).
Seedream 4 won on points, but Gemini seems more steerable and required less fighting on many of the tasks.
Still, to my eye, AI-generated images feel a bit off when worked into real-world photographs.
George's hair, for example, looks over the top, or brushed on.
The tree added to the photo of the person sleeping on the ground looks plastic, or too homogenized.
It's mostly because image model size and required compute for both training and inference have grown faster than self-hosted compute capability for hobbyists. Sure, you can run Flux Kontext locally, but if you have to use a heavily quantized model and wait forever for the generation to actually run, the economics are harder to justify. That's not counting the "you can generate images from ChatGPT for free" factor.
> George's hair, for example, looks over the top, or brushed on.
IMO, the judge was being too generous with the passes for that test. The only one that really passes is Gemini 2.5 Flash Image:
Flux Kontext: In addition to the hair looking too slick, it does not match the VHS-esque color grading of the image.
Qwen-Image-Edit: The hair is too slick and the sharpness/saturation of the face unnecessarily increases.
Seedream 4: Color grading of the entire image changes, which is the case with most of the Seedream 4 edits shown in this post, and why I don't like it.
The economics 1000% do not justify me owning a GPU to do this. I just happen to own one.
If you took a base model and trained it on a hundred Seinfeld frames, it would pick up the specific style - the color grading, grain, lighting - and add the hair far more naturally.
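For the curious, that workflow is basically a style LoRA: fine-tune a small adapter on the exported frames (diffusers ships example training scripts for this) and then load it at generation time. A rough sketch of the second half, assuming a hypothetical ./seinfeld-style-lora adapter and SDXL as the base model:

  import torch
  from diffusers import StableDiffusionXLPipeline

  # Base model; the hypothetical "./seinfeld-style-lora" adapter would have been
  # trained separately on the ~100 exported frames.
  pipe = StableDiffusionXLPipeline.from_pretrained(
      "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
  ).to("cuda")
  pipe.load_lora_weights("./seinfeld-style-lora")

  # The trigger phrase ("szfld style" here) is whatever the LoRA was trained with.
  image = pipe("george with a full head of hair, szfld style, 90s sitcom frame").images[0]
  image.save("george_hair.png")

The point being that the grain, color grading, and lighting come from the adapter rather than having to be spelled out in the prompt.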
If I were to make an image editing app, this would be the model I'd choose.
My use case: an image of a cartoon character holding an object and looking at it. I wanted to edit it so that the character no longer has the object in her hand and is now looking towards the camera.
Result, Nano Banana: On the first pass it only removed the object the character was holding; there was no change in her eyeline, and she was still looking down at her now-empty hand. A second prompt explicitly asked it to change the eyeline to look at the camera. Unsuccessful. A third attempt asked for the character to look towards the ceiling. Success, but an unusable edit, as I wanted the character to look at the camera.
Result, Reve: On the first attempt it gave me 4 options, and all 4 are usable. It not only removed the object and changed the character's eyeline to look at the camera, it also made posture changes so that the empty hands were appropriately positioned. And since the character is now in a different situation (sans the object that was holding her attention), Reve posed her in several different, very appropriate ways, which I hadn't thought to prompt for earlier (maybe because my focus was on the immediate need: object removal and the change in eyeline).
On a little more digging I found this writeup, which will make me sign up for their product.
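For reference, the Nano Banana attempts above boil down to single calls like this with the google-genai Python SDK (a rough sketch; the file names and prompt wording are mine, and the preview model id may have changed since):

  from io import BytesIO
  from PIL import Image
  from google import genai

  client = genai.Client()  # expects GEMINI_API_KEY in the environment

  source = Image.open("character.png")  # hypothetical source frame
  prompt = ("Remove the object the character is holding and have her look "
            "directly at the camera. Keep everything else unchanged.")

  response = client.models.generate_content(
      model="gemini-2.5-flash-image-preview",
      contents=[prompt, source],
  )

  # The response mixes text and image parts; save the first image returned.
  for part in response.candidates[0].content.parts:
      if part.inline_data is not None:
          Image.open(BytesIO(part.inline_data.data)).save("edited.png")
          break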
Even so, Gemini would lose by 1, but I found that I would often choose it as the winner (especially, say, for The Wave surfer). Would love to see an x/10 score instead of pass/fail.
Prompt: "Keeping the glass and the hand behind the glass the same, please change only the three brown candies in the glass into green, yellow, red, and orange candies. Make no other changes. Change the reflection to remove the brown candy too." Seed was 1070229954903864, but your setup is probably too different for that to help.
It seems like Gemini 2.5 Flash was the only model that successfully removed the reflections...it should get some points for that!
Some might critique the prompts and say this or that would have done better, but they were the kind of prompt your dad would type in not knowing how to push the right buttons.
I feel like the FAQ section isn't displayed prominently enough:
How are the prompts written?
  In addition to giving models several attempts to generate an image, we also write several variations of the prompt to ensure that models don't get stuck on certain keywords or phrases depending on their training data. For example, while hippity hop is a relatively common name for the ball riding toy, it is also known as a space hopper. We try to use both terms in the prompts to ensure that models are not biased towards one or the other.
  Prompts for Hunyuan were attempted in both Chinese and English with and without Image Optimization.
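In code terms, that harness is roughly the loop below (a sketch; generate_image, judge, and the variant wording are all hypothetical stand-ins, not the benchmark's actual code):

  import itertools

  # Hypothetical hooks: generate_image(prompt) calls whichever model is under test,
  # judge(image) decides pass/fail on the edit.
  PROMPT_VARIANTS = [
      "Add a red hippity hop next to the child on the lawn.",
      "Add a red space hopper next to the child on the lawn.",
  ]
  MAX_ATTEMPTS = 8

  def run_test(generate_image, judge):
      # Cycle through the wording variants so no single phrasing eats every attempt.
      for attempt, prompt in zip(range(MAX_ATTEMPTS), itertools.cycle(PROMPT_VARIANTS)):
          image = generate_image(prompt)
          if judge(image):
              return attempt + 1, image   # attempts used, passing image
      return None, None                   # declared a fail after MAX_ATTEMPTS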
Still useful comments, as the models mostly overlap
If you've already got a decent GPU (or were going to get one anyway), then cost isn't really a consideration; you can already do it. For everyone else, you can probably get by just using things like Google's AI Studio for free.
Clearly you’re looking at the task through the eyes of a hobbyist or a “project of the month”, so the workflow and pace may not be obvious, but API budgets spend fast. Just look at the benchmarks in this article to see how many tries some of these changes took - 47; there goes $3 in 3 minutes, or half that time if you’re quick on the keyboard.
And even then! Well, you’re limited, aren’t you? Limited to the Gemini model, or OpenAI, or whoever, and you see the limits of any one model in the article as well. Or you plonk down for a mediocre GPU with some slight VRAM headroom and choose from dozens of models, countless LoRAs, ControlNets, and other options, infinitely flexible for inpainting and outpainting. Ahead of that you’ll need to budget at least a dozen hours to learn local genai tools, ComfyUI or others. Then, for under a dollar in electricity, you can queue up a dozen ideas overnight and get 1,000 variations on each of them handed to you in the morning to quickly triage over coffee and email catchup.
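The overnight queue is nothing fancy either; a rough sketch, where render(prompt, seed) is a hypothetical stand-in for whatever local pipeline (ComfyUI’s API, diffusers, etc.) is actually driving the GPU:

  import pathlib
  import random

  IDEAS = [
      "brick courtyard with a new concrete walkway and curb",
      "same courtyard, flagstone walkway, warm evening light",
      # ...a dozen ideas queued up before bed
  ]
  VARIATIONS_PER_IDEA = 1000
  OUT = pathlib.Path("overnight")
  OUT.mkdir(exist_ok=True)

  def run_queue(render):
      for i, prompt in enumerate(IDEAS):
          for v in range(VARIATIONS_PER_IDEA):
              seed = random.randrange(2**32)    # fresh seed per variation
              image = render(prompt, seed)      # blocks while the GPU works
              image.save(OUT / f"idea{i:02d}_{v:04d}_{seed}.png")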
It’s not a one-size-fits-all market though, and most professionals are likely finding they want both: a low-cost, high-control, high-precision sandbox that isn’t as fast or scalable as the API, and the API for when fast and scalable is what you need.
Sure, but now you get a good gaming GPU that you can write off as a business expense.