Image Generation Compared

A multi-axis comparison of image generation models in early 2026 — quality, style, instruction following, faithfulness to references, multi-image support, pricing, and openness.

Research compiled March 2026 · Artificial Analysis Image Arena + provider docs + community reports

GPT Image 1.5 & Nano Banana 2 tied at top ELO
Midjourney V7 = aesthetic king, no API
Flux Schnell & SD = free self-hosting

ELO Leaderboard

Artificial Analysis Image Arena blind-preference rankings, March 2026. Higher = better.

# | Model | Organization | ELO | Architecture
1 | GPT Image 1.5 (high) | OpenAI | ~1268 | Autoregressive multimodal
2 | Nano Banana 2 (Gemini 3.1 Flash) | Google | ~1262 | Autoregressive multimodal
3 | Nano Banana Pro (Gemini 3 Pro) | Google | ~1221 | Autoregressive multimodal
4 | FLUX.2 [max] | Black Forest Labs | ~1207 | Flow-matching diffusion
5 | FLUX.2 [pro] | Black Forest Labs | ~1191 | Flow-matching diffusion
6 | Lucid Origin | Leonardo AI / Canva | ~1168 | Proprietary
~10 | Seedream 4.5 | ByteDance | ~1147 | Diffusion
- | SD 3.5 Large | Stability AI | ~1150–1180 | MMDiT diffusion

Architectural split: The top 3 spots are all autoregressive multimodal models (built on LLMs). Diffusion/flow-matching models dominate ranks 4+. Autoregressive models lead on instruction following and text rendering; diffusion models lead on controllability and open-source availability.
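To put the leaderboard gaps in perspective, here is a minimal sketch that converts an ELO difference into an expected head-to-head preference rate. It assumes the standard logistic Elo formula with a 400-point scale; the arena's exact rating method is not stated here, so the numbers are illustrative only. `expected_win_rate` is a hypothetical helper name.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of blind comparisons won by A under the standard Elo model.

    Assumption: the classic logistic formula with a 400-point scale applies.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Approximate ratings copied from the leaderboard above.
ratings = {
    "GPT Image 1.5 (high)": 1268,
    "Nano Banana 2": 1262,
    "Nano Banana Pro": 1221,
    "FLUX.2 [max]": 1207,
}

leader = ratings["GPT Image 1.5 (high)"]
for name, rating in ratings.items():
    share = expected_win_rate(leader, rating)
    print(f"GPT Image 1.5 (high) vs {name}: {share:.1%} expected preference")
```

Under that assumption, the 6-point gap between the top two models works out to roughly a 51/49 split, which is why they are best read as tied.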

Model Profiles

Key capabilities and tradeoffs for each major model.

GPT Image 1.5 · OpenAI
~1268 ELO
Best instruction following · Best text rendering · Moderate faithfulness
Autoregressive, native multimodal. Replaced DALL-E 3. Leverages LLM reasoning for exceptional prompt understanding. Multi-turn conversational editing. Up to 6 reference images. Known facial likeness drift.
API: $0.005–0.20/img · Open: No · Refs: 6
Nano Banana 2 · Google
~1262 ELO
High faithfulness · Best text rendering · Can ignore complex prompts
Gemini 3.1 Flash Image. Native multimodal with chain-of-thought before pixel generation. 95%+ character consistency. Faster than Pro. Can add unrequested creative details. Integrated into Google Search, Lens, Gemini.
API: Flash pricing · Open: No · Refs: 14 (Pro)
FLUX.2 · Black Forest Labs
~1207 ELO (max)
10 reference images · Partial open weights · Adobe Photoshop integration
Flow-matching diffusion transformer. Klein models (4B/9B) are Apache 2.0. Kontext models specialize in instruction-based editing with 0.92+ cosine similarity across 6 successive edits. Hex color matching for brand work.
API: $0.014–0.07/img · Open: Klein (Apache 2.0) · Refs: 10
Midjourney V7 · Midjourney
No ELO (not in arena)
Best aesthetics · No API · Fair text rendering
Widely considered the aesthetic leader, with the most art-directed output. Personalization on by default. --cref and --sref for character/style reference. Video generation (5–21s). Subscription-only via Discord/web.
Plans: $10–120/mo · Open: No · Refs: 1/mode
Stable Diffusion 3.5 · Stability AI
~1165 ELO (est.)
Fully open weights · Largest ecosystem · Trails on quality
MMDiT architecture. Largest community ecosystem: LoRAs, ControlNets, IP-Adapter on CivitAI. Runs on consumer GPUs (8GB+). Free for orgs under $1M. IP-Adapter gives tunable faithfulness/instruction tradeoff.
API: $0.003–0.035/img · Open: Yes (Community License) · Refs: Via IP-Adapter
Ideogram 3.0 · Ideogram
Strong typography
~95% text accuracy · 4.3B style presets
Industry-leading text/typography rendering. Excellent for graphic design, advertising, posters. Up to 3 style reference images. Random style feature. No character reference.
API: $0.03–0.10/img · Open: No · Refs: 3 (style only)
Recraft V4 · Recraft
Unique SVG output
Native SVG vectors · Hex color control · Design-focused
Only major AI model producing native SVG vector graphics. Excellent text placement and typography. Brand style creation from 1–5 reference images. Exact RGB/hex color specification. "Design taste" focus.
API: $0.04 raster / $0.08 SVG · Open: No · Refs: 5 (style)
Leonardo AI · Canva
~1168 ELO
Good all-rounder · Free tier
Multiple specialized models: Phoenix, Lucid Origin, Lucid Realism. Combinable Character + Content + Style reference modes. Up to 6 references with Omni Models. Acquired by Canva. 150 free daily credits.
API: $0.018–0.087/img · Open: No · Refs: 6

The Core Tradeoff: Faithfulness vs. Instruction Following

When you give a model a reference image + text instructions, two goals compete. The more it follows instructions, the more it drifts from the reference. The more it preserves the reference, the less it can transform.

Spectrum: Preserves reference → Balanced → Follows instructions
SD + LoRA: tunable weight
Nano Banana Pro: 95%+ consistency
Nano Banana 2: faster, more creative
FLUX Kontext: 0.92+ across 6 edits
Midjourney V7: beautifies everything
GPT Image 1.5: 0.929 Reason-Edit
Ideogram / Recraft: typography-first

Why this tradeoff exists

Autoregressive Models
GPT Image, Nano Banana. Generate tokens sequentially. Text prompts directly condition each step → great instruction following. But "remembering" pixel-level reference details while transforming is architecturally harder.
Diffusion Models
Flux, SD, Midjourney. Generate through iterative denoising. Naturally preserve spatial coherence → good faithfulness. But the global denoising process can cause "unintended spurious edits, bleeding into regions that should be unchanged."
Flow Matching
Flux Kontext. Maps noise to images through learned transport paths. Processes references and instructions jointly in latent space → best balance by design. Key differentiator: multi-turn robustness.

Model-by-model detail

Model | Instruction Following | Faithfulness | Key Evidence | Best For
GPT Image 1.5 | Excellent | Moderate | 0.929 Reason-Edit (prev best: 0.572). Known facial likeness drift. | Creative exploration, text-heavy images, ideation
Nano Banana Pro | Moderate-Good | High | 95%+ character consistency. ~50% failure rate on certain style transfers. | Final renders, character consistency, production assets
Nano Banana 2 | Moderate-Good | Good-High | Faster than Pro but "takes creative freedom too far, adding unrequested details." | Speed + quality when faithfulness isn't critical
FLUX Kontext | Good | Very High | AuraFace cosine similarity >0.92 across 6 edits (competitors drop to ~0.80). | Iterative editing, multi-turn consistency
Midjourney V7 | Good | Moderate | Tends to "beautify" at cost of reference accuracy. --cw tunable 0–100. | Artistic/creative work, aesthetic quality priority
SD + IP-Adapter | Tunable | Tunable | Weight parameter explicitly controls tradeoff. LoRA+IP-Adapter = 80–90% consistency. | Maximum control, technical users

Community wisdom: Use different models for different stages: GPT Image for ideation, Nano Banana or Flux Kontext for final renders, and LoRA+IP-Adapter for maximum identity control.
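The multi-edit faithfulness figures quoted above (cosine similarity staying above 0.92 across 6 edits) come from comparing an identity embedding of each edited image with the original. A minimal sketch of that bookkeeping follows, assuming the embeddings are already available as plain float vectors. The embedding model itself (AuraFace in the cited figure) is not called here, and `similarity_drift`, `holds_identity`, and the sample vectors are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarity_drift(reference: Sequence[float],
                     edits: Sequence[Sequence[float]]) -> List[float]:
    """Similarity of each successive edit to the original reference embedding."""
    return [cosine_similarity(reference, edit) for edit in edits]


def holds_identity(reference: Sequence[float],
                   edits: Sequence[Sequence[float]],
                   threshold: float = 0.92) -> bool:
    """True if every edit stays at or above the similarity threshold."""
    return all(score >= threshold for score in similarity_drift(reference, edits))


# Made-up 3-d embeddings; real face embeddings have hundreds of dimensions.
reference = [1.0, 0.0, 0.0]
edits = [[0.99, 0.05, 0.0], [0.95, 0.2, 0.1], [0.7, 0.6, 0.3]]

for step, score in enumerate(similarity_drift(reference, edits), start=1):
    print(f"edit {step}: similarity {score:.3f}")
print("identity held:", holds_identity(reference, edits))
```

Always comparing against the original, rather than against the previous edit, is what exposes cumulative drift over a multi-turn session.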

Reference Image Capabilities

How many reference images can each model accept, and what kind of consistency do they maintain?

Model | Max Refs | Character Ref | Style Ref | Multi-Character | Tunability
Nano Banana Pro | 14 | Native (5 chars) | Native | Yes (5) | Limited
FLUX.2 | 10 | Native | Native | Yes | Via params
GPT Image 1.5 | 6 | Native | Via prompt | Moderate | Via prompt
Leonardo AI | 6 | Dedicated mode | Dedicated mode | Yes | Low/Mid/High
Recraft V4 | 5 | No | Style ID | No | Brand colors (hex)
Ideogram 3.0 | 3 | No | Style refs | No | Limited
Midjourney V7 | 1/mode | --cref | --sref | No | --cw 0–100
SD 3.5 | via tools | IP-Adapter/LoRA | IP-Adapter Style | Via workflows | Full control

IP-Adapter's superpower: A weight parameter (0.0–1.0) lets you explicitly tune the faithfulness–instruction tradeoff. Weight=0.0 = full prompt control. Weight=1.0 = full reference control. No other platform gives this granularity. Recommended pipeline: LoRA (identity) + IP-Adapter (pose) + weighted prompts (features) = 80–90% consistency.
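The weight parameter described above can be illustrated with a toy version of decoupled cross-attention, the mechanism from the IP-Adapter paper in which an image-conditioned attention branch is added to the text-conditioned branch with a scale. This is a conceptual sketch in plain Python for a single query vector, not the library implementation; the function names and sample vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def attention(query: Vector, keys: Sequence[Vector], values: Sequence[Vector]) -> List[float]:
    """Single-query scaled dot-product attention over plain Python lists."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum((e / total) * v[i] for e, v in zip(exps, values)) for i in range(dim)]


def decoupled_cross_attention(query: Vector,
                              text_keys: Sequence[Vector], text_values: Sequence[Vector],
                              image_keys: Sequence[Vector], image_values: Sequence[Vector],
                              weight: float) -> List[float]:
    """Text branch plus the reference-image branch scaled by `weight`."""
    text_out = attention(query, text_keys, text_values)
    image_out = attention(query, image_keys, image_values)
    return [t + weight * i for t, i in zip(text_out, image_out)]


# Tiny made-up features: two text tokens and one reference-image token.
query = [1.0, 0.0]
text_keys, text_values = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
image_keys, image_values = [[0.5, 0.5]], [[2.0, 2.0]]

for weight in (0.0, 0.5, 1.0):
    out = decoupled_cross_attention(query, text_keys, text_values,
                                    image_keys, image_values, weight)
    print(weight, [round(x, 3) for x in out])
```

At weight 0.0 the output is exactly the text-only result. In this additive form the text branch stays active at every weight, so 1.0 is best read as the reference applied at full strength alongside the prompt.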

Pricing Comparison

Cost per image at standard ~1024×1024 resolution, sorted cheapest to most expensive, with self-hosted and subscription options listed last.

Provider | Model | $/Image | Quality | Open? | API?
fal.ai | SDXL | $0.003 | Good | Yes | Yes
fal.ai | Flux Schnell | $0.003 | Good+ | Apache 2.0 | Yes
OpenAI | GPT Image 1 Mini (low) | $0.005 | Basic | No | Yes
fal.ai | Flux 2 Dev | $0.008 | Very Good | Non-comm | Yes
BFL | FLUX.2 [klein] 9B | $0.015 | Good+ | Apache 2.0 | Yes
Google | Imagen 4 Fast | $0.02 | Very Good | No | Yes
BFL | FLUX.2 [pro] | $0.03 | Very Good | No | Yes
Ideogram | 3.0 (fal.ai) | $0.03 | Very Good | No | Yes
OpenAI | GPT Image 1.5 (medium) | $0.04 | Very Good | No | Yes
Recraft | V4 (raster) | $0.04 | Very Good | No | Yes
BFL | FLUX Kontext [pro] | $0.04 | Very Good | No | Yes
Google | Imagen 4 Ultra | $0.06 | Excellent | No | Yes
BFL | FLUX.2 [max] | $0.07 | Excellent | No | Yes
Recraft | V4 (vector SVG) | $0.08 | Very Good | No | Yes
Nano Banana | Pro (1080p) | $0.139 | Excellent | No | Yes
OpenAI | GPT Image 1.5 (high) | $0.20 | Top-tier | No | Yes
Nano Banana | Pro (4K) | $0.24 | Excellent | No | Yes
Self-hosted | Flux Schnell / SD 3.5 / SDXL | $0 | Good–Good+ | Yes | N/A
Midjourney | V7 (subscription) | ~$0.05–0.15 | Top-tier | No | No API

Volume Cost Projections

100,000 images/month
SDXL (fal.ai): ~$300
Flux 2 Dev (fal.ai): ~$800
Imagen 4 Fast: ~$2,000
GPT Image 1.5 (med): ~$4,000
GPT Image 1 (high): ~$16,700
1,000 images/month (indie)
Self-hosted (Flux/SD): $0 + electricity
SDXL (fal.ai): ~$3
Flux 2 Dev (fal.ai): ~$8
GPT Image 1.5 (med): ~$40
Midjourney Standard: $30/mo
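The projections above are straight multiplication of per-image price by monthly volume. A minimal sketch for rerunning them at other volumes follows, using per-image prices from the pricing table; `monthly_cost` and `break_even_images` are hypothetical helper names, and the break-even comparison assumes a flat subscription fee with no usage caps.

```python
# Per-image API prices in USD, taken from the pricing table above.
PRICE_PER_IMAGE = {
    "SDXL (fal.ai)": 0.003,
    "Flux 2 Dev (fal.ai)": 0.008,
    "Imagen 4 Fast": 0.02,
    "GPT Image 1.5 (medium)": 0.04,
}


def monthly_cost(price_per_image: float, images_per_month: int) -> float:
    """Projected monthly spend for pay-per-image pricing."""
    return price_per_image * images_per_month


def break_even_images(subscription_per_month: float, price_per_image: float) -> float:
    """Monthly volume at which a flat subscription costs the same as per-image pricing."""
    return subscription_per_month / price_per_image


for volume in (1_000, 100_000):
    print(f"{volume:,} images/month")
    for model, price in PRICE_PER_IMAGE.items():
        print(f"  {model}: ~${monthly_cost(price, volume):,.0f}")

# Midjourney Standard ($30/mo) against GPT Image 1.5 (medium) at $0.04/image.
volume = break_even_images(30.0, PRICE_PER_IMAGE["GPT Image 1.5 (medium)"])
print(f"Break-even volume: about {volume:.0f} images/month")
```

Below the break-even volume, per-image pricing is the cheaper of the two; above it, the flat plan wins, provided the plan's own generation limits are not hit.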

Free Tiers

Service | Free Offering
SD 3.5 / SDXL / Flux Schnell | Fully free to self-host
Google Cloud | $300 GCP credits (new accounts)
OpenAI API | $5 credits for new accounts
Replicate | ~$5 credits
fal.ai | ~$5–10 credits (expire 90 days)
Leonardo AI | 150 daily credits (with peak wait times)
Recraft | 30–50 daily credits (images are public)
Ideogram | 10 slow credits/week
Midjourney | No free tier

Full Feature Matrix

Side-by-side comparison across all major axes.

Feature | GPT Image 1.5 | FLUX.2 | Midjourney V7 | SD 3.5 | Nano Banana Pro | Ideogram 3.0 | Recraft V4 | Leonardo
Max Reference Images | 6 | 10 | 1/mode | via IP-Adapter | 14 | 3 | 5 | 6
Character Consistency | Native | Native | --cref | IP-Adapter/LoRA | Native (5) | No | No | Yes
Text Rendering | Excellent | Very Good | Fair | Fair | Excellent | Excellent | Excellent | Good
Instruction Following | Excellent | Very Good | Good | Moderate | Mod-Good | Good | Good | Good
Faithfulness to Refs | Moderate | Good–Very Good | Moderate | Tunable | High | N/A | N/A | Good
Vector/SVG Output | No | No | No | No | No | No | Yes | No
Official API | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes
Open Weights | No | Partial | No | Yes | No | No | No | No
Runs Locally | No | Klein, Schnell | No | Yes | No | No | No | No
Multi-turn Editing | Yes | Via Kontext | Draft Mode | Via ComfyUI | Yes | No | No | Edit w/ AI
Video Generation | No | In dev | 5–21s | Separate | No | No | No | Motion

Recommendations by Use Case

What to pick depending on what you need.

"I need the best overall quality"
GPT Image 1.5 (high) or Nano Banana 2
Top ELO (~1262–1268). Best instruction following and text rendering.
"I need beautiful artistic output"
Midjourney V7
Unmatched aesthetics and "art direction." No API though — subscription only.
"I need character consistency across scenes"
Nano Banana Pro or FLUX Kontext Pro
14 refs / 5 chars (NB Pro). 0.92+ similarity across 6 edits (Kontext).
"I need text/typography in images"
Ideogram 3.0 or Recraft V4
~95% text accuracy. $0.03–0.04/image. Recraft also does SVG vectors.
"I need the cheapest API option"
Flux Schnell or SDXL on fal.ai
$0.003/image. Good quality at dirt-cheap pricing. 100K images = $300/mo.
"I need the best quality/price ratio"
Google Imagen 4 Fast or FLUX.2 [pro]
Strong quality at $0.02–0.03/image. Best value in the premium tier.
"I need to run it locally / self-host"
Flux Schnell (Apache 2.0) or SD 3.5
Free after hardware. SD has the largest LoRA/ControlNet ecosystem on CivitAI.
"I need vector/SVG output"
Recraft V4
Only major model with native SVG generation. $0.08/vector. Logos, icons, design elements.
"I need max control over faithfulness"
SD 3.5 + IP-Adapter + LoRA
Explicit weight slider (0.0–1.0) for the faithfulness–instruction tradeoff. Most technical.
"I need brand consistency + exact colors"
FLUX.2 or Recraft V4
Hex code color matching. 10 reference images (Flux). Style ID from brand images (Recraft).