Image Generation Models Compared

ELO Leaderboard

Artificial Analysis Image Arena blind-preference rankings, March 2026. Higher = better.

#	Model	Organization	ELO	Architecture
1	GPT Image 1.5 (high)	OpenAI	~1268	Autoregressive multimodal
2	Nano Banana 2 (Gemini 3.1 Flash)	Google	~1262	Autoregressive multimodal
3	Nano Banana Pro (Gemini 3 Pro)	Google	~1221	Autoregressive multimodal
4	FLUX.2 [max]	Black Forest Labs	~1207	Flow-matching diffusion
5	FLUX.2 [pro]	Black Forest Labs	~1191	Flow-matching diffusion
6	Lucid Origin	Leonardo AI / Canva	~1168	Proprietary
~10	Seedream 4.5	ByteDance	~1147	Diffusion
—	SD 3.5 Large	Stability AI	~1150–1180 (est.)	MMDiT diffusion

Architectural split: The top 3 spots are all autoregressive multimodal models (built on LLMs). Diffusion/flow-matching models dominate ranks 4+. Autoregressive models lead on instruction following and text rendering; diffusion models lead on controllability and open-source availability.
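To read the rating gaps in the leaderboard, it helps to convert them into an expected head-to-head preference rate. The sketch below assumes the standard Elo logistic curve (base 10, 400-point scale); the arena's exact fitting method may differ, so treat the percentages as approximate.

```python
def expected_preference(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo logistic curve
    (base 10, 400-point scale). Assumed model, not confirmed for this arena."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Approximate ratings taken from the leaderboard above.
leaderboard = {
    "GPT Image 1.5 (high)": 1268,
    "Nano Banana 2": 1262,
    "FLUX.2 [max]": 1207,
    "Seedream 4.5": 1147,
}

top = leaderboard["GPT Image 1.5 (high)"]
for name, rating in leaderboard.items():
    if rating != top:
        # Expected share of blind votes GPT Image 1.5 would win against this model.
        print(f"GPT Image 1.5 vs {name}: {expected_preference(top, rating):.1%}")
```

Under this assumption a 61-point gap (GPT Image 1.5 vs FLUX.2 [max]) is roughly a 59% preference rate, and the 6-point gap to Nano Banana 2 is close to a coin flip, so adjacent ranks are not strongly separated.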

Model Profiles

Key capabilities and tradeoffs for each major model.

GPT Image 1.5 OpenAI

~1268 ELO

Best instruction following · Best text rendering · Moderate faithfulness

Autoregressive, native multimodal. Replaced DALL-E 3. Leverages LLM reasoning for exceptional prompt understanding. Multi-turn conversational editing. Up to 6 reference images. Known facial likeness drift.

API: $0.005–0.20/img Open: No Refs: 6

Nano Banana 2 Google

~1262 ELO

High faithfulness · Best text rendering · Can ignore complex prompts

Gemini 3.1 Flash Image. Native multimodal with chain-of-thought before pixel generation. 95%+ character consistency. Faster than Pro. Can add unrequested creative details. Integrated into Google Search, Lens, Gemini.

API: Flash pricing Open: No Refs: 14 (Pro)

FLUX.2 Black Forest Labs

~1207 ELO ([max])

10 reference images · Partial open weights · Adobe Photoshop integration

Flow-matching diffusion transformer. Klein models (4B/9B) are Apache 2.0. Kontext models specialize in instruction-based editing, keeping face-embedding (AuraFace) cosine similarity at 0.92+ across 6 successive edits. Hex color matching for brand work.

API: $0.014–0.07/img Open: Klein (Apache 2.0) Refs: 10
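The "0.92+ across 6 successive edits" figure compares a face embedding of each edited image against the original. A minimal sketch of that bookkeeping, assuming the embeddings have already been extracted by a face-recognition model (the source cites AuraFace). The vectors below are synthetic placeholders rather than measured data, and the `drift_report` helper is hypothetical.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def drift_report(reference: np.ndarray, edits: list[np.ndarray], threshold: float = 0.92):
    """Hypothetical helper: similarity of each successive edit to the ORIGINAL reference,
    plus whether every edit stays at or above the threshold."""
    scores = [cosine_similarity(reference, e) for e in edits]
    return scores, all(s >= threshold for s in scores)


rng = np.random.default_rng(0)
reference = rng.normal(size=512)  # placeholder for the source image's face embedding

# Synthetic chain of 6 edits: each edit perturbs the previous one, so identity
# drift accumulates the way it can in multi-turn editing.
edits, current = [], reference
for _ in range(6):
    current = current + rng.normal(scale=0.1, size=512)
    edits.append(current)

scores, stable = drift_report(reference, edits)
print([round(s, 3) for s in scores], "stable" if stable else "drifted")
```

Scoring every edit against the original (not against the previous edit) is what makes the metric sensitive to cumulative drift.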

Midjourney V7 Midjourney

No ELO (not in arena)

Best aesthetics · No API · Fair text rendering

Widely considered the aesthetic leader, with the most art-directed output. Personalization on by default. --cref and --sref for character/style reference. Video generation (5–21s). Subscription-only via Discord/web.

Plans: $10–120/mo Open: No Refs: 1/mode

Stable Diffusion 3.5 Stability AI

~1165 ELO (est.)

Fully open weights · Largest ecosystem · Trails on quality

MMDiT (Multimodal Diffusion Transformer) architecture. Largest community ecosystem: LoRAs, ControlNets, IP-Adapter on CivitAI. Runs on consumer GPUs (8GB+). Free for organizations under $1M in annual revenue. IP-Adapter gives a tunable faithfulness/instruction tradeoff.

API: $0.003–0.035/img Open: Yes (Community License) Refs: Via IP-Adapter

Ideogram 3.0 Ideogram

Strong typography

~95% text accuracy · 4.3B style presets

Industry-leading text/typography rendering. Excellent for graphic design, advertising, posters. Up to 3 style reference images. Random style feature. No character reference.

API: $0.03–0.10/img Open: No Refs: 3 (style only)

Recraft V4 Recraft

Unique SVG output

Native SVG vectors · Hex color control · Design-focused

Only major AI model producing native SVG vector graphics. Excellent text placement and typography. Brand style creation from 1–5 reference images. Exact RGB/hex color specification. "Design taste" focus.

API: $0.04 raster / $0.08 SVG Open: No Refs: 5 (style)

Leonardo AI Canva

~1168 ELO (Lucid Origin)

Good all-rounder · Free tier

Multiple specialized models: Phoenix, Lucid Origin, Lucid Realism. Combinable Character + Content + Style reference modes. Up to 6 references with Omni Models. Acquired by Canva. 150 free daily credits.

API: $0.018–0.087/img Open: No Refs: 6

The Core Tradeoff: Faithfulness vs. Instruction Following

When you give a model a reference image + text instructions, two goals compete. The more it follows instructions, the more it drifts from the reference. The more it preserves the reference, the less it can transform.

Figure: spectrum running from "Preserves reference" through "Balanced" to "Follows instructions". Models plotted on it:

Model	Note
SD + LoRA	Tunable weight
Nano Banana Pro	95%+ consistency
Nano Banana 2	Faster, more creative
FLUX Kontext	0.92+ across 6 edits
Midjourney V7	Beautifies everything
GPT Image 1.5	0.929 Reason-Edit
Ideogram / Recraft	Typography-first

Why this tradeoff exists

Autoregressive Models

GPT Image, Nano Banana. Generate tokens sequentially. Text prompts directly condition each step → great instruction following. But "remembering" pixel-level reference details while transforming is architecturally harder.

Diffusion Models

Flux, SD, Midjourney. Generate through iterative denoising. Naturally preserve spatial coherence → good faithfulness. But the global denoising process can cause "unintended spurious edits, bleeding into regions that should be unchanged."

Flow Matching

Flux Kontext. Maps noise to images through learned transport paths. Processes references and instructions jointly in latent space → best balance by design. Key differentiator: multi-turn robustness.
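To make "learned transport paths" concrete: in rectified-flow-style flow matching (the formulation BFL describes for FLUX; the details here are a simplified assumption), each image x0 and noise sample are joined by a straight line x_t = (1 - t)·x0 + t·noise, a network learns the velocity along those lines, and sampling integrates that velocity from pure noise (t = 1) back to an image (t = 0). The sketch replaces the learned network with an exact "oracle" velocity for a single target, so it shows only the integration step, not training or conditioning on prompts and references.

```python
import numpy as np


def oracle_velocity(x_t: np.ndarray, t: float, x0: np.ndarray) -> np.ndarray:
    """Exact velocity of the straight path between x0 (t=0) and the noise sample (t=1).
    A real model predicts this from (x_t, t, prompt, references) without knowing x0."""
    return (x_t - x0) / t


def euler_sample(noise: np.ndarray, x0: np.ndarray, steps: int = 8) -> np.ndarray:
    """Integrate dx/dt = v(x, t) backwards from t=1 (noise) to t=0 (image) with Euler steps."""
    x = noise.copy()
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):  # t never reaches 0 inside the loop
        dt = t - t_next                         # positive step size
        x = x - dt * oracle_velocity(x, t, x0)  # move toward the data end of the path
    return x


rng = np.random.default_rng(0)
target = rng.uniform(size=(4, 4))  # stand-in for an image or latent
noise = rng.normal(size=(4, 4))
result = euler_sample(noise, target)
print(np.allclose(result, target))
```

Because the toy paths are perfectly straight, even a few Euler steps land on the target; with a learned, imperfect velocity field, step count and path curvature start to matter.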

Model-by-model detail

Model	Instruction Following	Faithfulness	Key Evidence	Best For
GPT Image 1.5	Excellent	Moderate	0.929 Reason-Edit (prev best: 0.572). Known facial likeness drift.	Creative exploration, text-heavy images, ideation
Nano Banana Pro	Moderate-Good	High	95%+ character consistency. ~50% failure rate on certain style transfers.	Final renders, character consistency, production assets
Nano Banana 2	Moderate-Good	Good-High	Faster than Pro but "takes creative freedom too far, adding unrequested details."	Speed + quality when faithfulness isn't critical
FLUX Kontext	Good	Very High	AuraFace cosine similarity >0.92 across 6 edits (competitors drop to ~0.80).	Iterative editing, multi-turn consistency
Midjourney V7	Good	Moderate	Tends to "beautify" at cost of reference accuracy. --cw tunable 0–100.	Artistic/creative work, aesthetic quality priority
SD + IP-Adapter	Tunable	Tunable	Weight parameter explicitly controls tradeoff. LoRA+IP-Adapter = 80–90% consistency.	Maximum control, technical users

Community wisdom: Use different models for different stages — GPT Image for ideation → Nano Banana or Flux Kontext for final renders → LoRA+IP-Adapter for maximum identity control.

Reference Image Capabilities

How many reference images can each model accept, and what kind of consistency do they maintain?

Model	Max Refs	Character Ref	Style Ref	Multi-Character	Tunability
Nano Banana Pro	14	Native (5 chars)	Native	Yes (5)	Limited
FLUX.2	10	Native	Native	Yes	Via params
GPT Image 1.5	6	Native	Via prompt	Moderate	Via prompt
Leonardo AI	6	Dedicated mode	Dedicated mode	Yes	Low/Mid/High
Recraft V4	5	No	Style ID	No	Brand colors (hex)
Ideogram 3.0	3	No	Style refs	No	Limited
Midjourney V7	1/mode	--cref	--sref	No	--cw 0–100
SD 3.5	via tools	IP-Adapter/LoRA	IP-Adapter Style	Via workflows	Full control

IP-Adapter's superpower: A weight parameter (0.0–1.0) lets you explicitly tune the faithfulness–instruction tradeoff. Weight 0.0 = prompt only (the reference has no effect). Weight 1.0 = strongest reference influence, which in practice lets the reference dominate. Few hosted platforms expose a comparable continuous control; Midjourney's --cw (0–100) is the closest analogue. Recommended pipeline: LoRA (identity) + IP-Adapter (pose) + weighted prompts (features) = 80–90% consistency.
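The weight works because IP-Adapter adds a separate image cross-attention branch next to the text one and scales its output (the "decoupled cross-attention" described in the IP-Adapter paper). A minimal NumPy sketch of that combination, with random toy matrices standing in for real latent queries, prompt features, and reference-image features. In practice the scale is set through tooling (for example diffusers' `set_ip_adapter_scale`) rather than computed by hand.

```python
import numpy as np


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def decoupled_cross_attention(q, text_k, text_v, image_k, image_v, ip_scale: float):
    """Text branch always contributes; the reference-image branch is added with weight ip_scale."""
    return attention(q, text_k, text_v) + ip_scale * attention(q, image_k, image_v)


rng = np.random.default_rng(0)
d = 8
q = rng.normal(size=(16, d))                                          # image-latent queries (toy)
text_k, text_v = rng.normal(size=(77, d)), rng.normal(size=(77, d))   # prompt tokens (toy)
image_k, image_v = rng.normal(size=(4, d)), rng.normal(size=(4, d))   # reference tokens (toy)

text_only = attention(q, text_k, text_v)
for scale in (0.0, 0.5, 1.0):
    out = decoupled_cross_attention(q, text_k, text_v, image_k, image_v, scale)
    # Distance from the prompt-only result grows linearly with the scale.
    print(scale, round(float(np.linalg.norm(out - text_only)), 3))
```

In this formulation a scale of 1.0 adds the reference branch at full strength alongside the prompt rather than switching the prompt off, which is why high weights let the reference dominate without discarding the text entirely.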

Pricing Comparison

Cost per image at standard ~1024×1024 resolution, sorted cheapest to most expensive; self-hosted and subscription-only options are listed last.

Provider	Model	$/Image	Quality	Open?	API?
fal.ai	SDXL	$0.003	Good	Yes	Yes
fal.ai	Flux Schnell	$0.003	Good+	Apache 2.0	Yes
OpenAI	GPT Image 1 Mini (low)	$0.005	Basic	No	Yes
fal.ai	Flux 2 Dev	$0.008	Very Good	Non-commercial	Yes
BFL	FLUX.2 [klein] 9B	$0.015	Good+	Apache 2.0	Yes
Google	Imagen 4 Fast	$0.02	Very Good	No	Yes
BFL	FLUX.2 [pro]	$0.03	Very Good	No	Yes
Ideogram	3.0 (fal.ai)	$0.03	Very Good	No	Yes
OpenAI	GPT Image 1.5 (medium)	$0.04	Very Good	No	Yes
Recraft	V4 (raster)	$0.04	Very Good	No	Yes
BFL	FLUX Kontext [pro]	$0.04	Very Good	No	Yes
Google	Imagen 4 Ultra	$0.06	Excellent	No	Yes
BFL	FLUX.2 [max]	$0.07	Excellent	No	Yes
Recraft	V4 (vector SVG)	$0.08	Very Good	No	Yes
Google	Nano Banana Pro (1080p)	$0.139	Excellent	No	Yes
OpenAI	GPT Image 1.5 (high)	$0.20	Top-tier	No	Yes
Google	Nano Banana Pro (4K)	$0.24	Excellent	No	Yes
Self-hosted	Flux Schnell / SD 3.5 / SDXL	$0	Good–Good+	Yes	N/A
Midjourney	V7 (subscription)	~$0.05–0.15	Top-tier	No	No API

Volume Cost Projections

100,000 images/month

SDXL (fal.ai)	~$300
Flux 2 Dev (fal.ai)	~$800
Imagen 4 Fast	~$2,000
GPT Image 1.5 (med)	~$4,000
GPT Image 1 (high)	~$16,700

1,000 images/month (indie)

Self-hosted (Flux/SD)	$0 + electricity
SDXL (fal.ai)	~$3
Flux 2 Dev (fal.ai)	~$8
GPT Image 1.5 (med)	~$40
Midjourney Standard	$30/mo
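The projections above are per-image price × monthly volume. A small sketch that reproduces them from the per-image prices in the pricing table; the GPT Image 1 (high) price of ~$0.167/image is inferred from the ~$16,700 figure rather than listed in the table, and the `monthly_cost` helper is illustrative.

```python
# Per-image prices (USD) at ~1024x1024, taken from the pricing table above.
PRICE_PER_IMAGE = {
    "SDXL (fal.ai)": 0.003,
    "Flux 2 Dev (fal.ai)": 0.008,
    "Imagen 4 Fast": 0.02,
    "GPT Image 1.5 (med)": 0.04,
    "GPT Image 1 (high)": 0.167,  # inferred from the ~$16,700 / 100k projection
}


def monthly_cost(price_per_image: float, images_per_month: int) -> float:
    """Linear API cost; ignores free credits, volume discounts, retries and failed generations."""
    return price_per_image * images_per_month


for volume in (1_000, 100_000):
    print(f"{volume:,} images/month")
    for model, price in PRICE_PER_IMAGE.items():
        print(f"  {model:<22} ${monthly_cost(price, volume):,.0f}")
```

Subscription (Midjourney) and self-hosted options do not fit this linear model: their cost is a fixed monthly fee or is dominated by hardware and electricity.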

Free Tiers

Service	Free Offering
SD 3.5 / SDXL / Flux Schnell	Fully free to self-host
Google Cloud	$300 GCP credits (new accounts)
OpenAI API	$5 credits for new accounts
Replicate	~$5 credits
fal.ai	~$5–10 credits (expire 90 days)
Leonardo AI	150 daily credits (with peak wait times)
Recraft	30–50 daily credits (images are public)
Ideogram	10 slow credits/week
Midjourney	No free tier

Full Feature Matrix

Side-by-side comparison across all major axes.

Feature	GPT Image 1.5	FLUX.2	Midjourney V7	SD 3.5	Nano Banana Pro	Ideogram 3.0	Recraft V4	Leonardo
Max Reference Images	6	10	1/mode	via IP-Adapter	14	3	5	6
Character Consistency	Native	Native	--cref	IP-Adapter/LoRA	Native (5)	No	No	Yes
Text Rendering	Excellent	Very Good	Fair	Fair	Excellent	Excellent	Excellent	Good
Instruction Following	Excellent	Very Good	Good	Moderate	Moderate–Good	Good	Good	Good
Faithfulness to Refs	Moderate	Good–Very Good	Moderate	Tunable	High	N/A	N/A	Good
Vector/SVG Output	No	No	No	No	No	No	Yes	No
Official API	Yes	Yes	No	Yes	Yes	Yes	Yes	Yes
Open Weights	No	Partial	No	Yes	No	No	No	No
Runs Locally	No	Klein, Schnell	No	Yes	No	No	No	No
Multi-turn Editing	Yes	Via Kontext	Draft Mode	Via ComfyUI	Yes	No	No	Edit w/ AI
Video Generation	No	In dev	5–21s	Separate	No	No	No	Motion

Recommendations by Use Case