A multi-axis comparison of image generation models in early 2026 — quality, style, instruction following, faithfulness to references, multi-image support, pricing, and openness.
Research compiled March 2026 · Artificial Analysis Image Arena + provider docs + community reports
Artificial Analysis Image Arena blind-preference rankings, March 2026. Higher = better.
| # | Model | Organization | ELO | Architecture |
|---|---|---|---|---|
| 1 | GPT Image 1.5 (high) | OpenAI | ~1268 | Autoregressive multimodal |
| 2 | Nano Banana 2 (Gemini 3.1 Flash) | Google | ~1262 | Autoregressive multimodal |
| 3 | Nano Banana Pro (Gemini 3 Pro) | Google | ~1221 | Autoregressive multimodal |
| 4 | FLUX.2 [max] | Black Forest Labs | ~1207 | Flow-matching diffusion |
| 5 | FLUX.2 [pro] | Black Forest Labs | ~1191 | Flow-matching diffusion |
| 6 | Lucid Origin | Leonardo AI / Canva | ~1168 | Proprietary |
| ~10 | Seedream 4.5 | ByteDance | ~1147 | Diffusion |
| — | SD 3.5 Large | Stability AI | ~1150‑1180 | MMDiT diffusion |
Architectural split: The top 3 spots are all autoregressive multimodal models (built on LLMs). Diffusion/flow-matching models dominate ranks 4+. Autoregressive models lead on instruction following and text rendering; diffusion models lead on controllability and open-source availability.
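One way to read the ELO gaps in the ranking table is through the standard Elo expected-score formula. The sketch below assumes the arena uses the conventional 400-point logistic scale, which the source does not confirm; the function name `expected_win_rate` is illustrative only.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the standard Elo model.

    Assumption: the conventional 400-point logistic scale applies to these ratings.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Approximate ratings copied from the ranking table.
ratings = {
    "GPT Image 1.5 (high)": 1268,
    "Nano Banana 2": 1262,
    "FLUX.2 [max]": 1207,
    "Seedream 4.5": 1147,
}

leader = "GPT Image 1.5 (high)"
for name, rating in ratings.items():
    if name != leader:
        p = expected_win_rate(ratings[leader], rating)
        print(f"{leader} vs {name}: {p:.1%} expected preference")
```

Under that assumption, the 6-point gap between ranks 1 and 2 is close to a coin flip (about 51%), while the 121-point gap to Seedream 4.5 corresponds to roughly two wins in three.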
Key capabilities and tradeoffs for each major model.
When you give a model a reference image + text instructions, two goals compete. The more it follows instructions, the more it drifts from the reference. The more it preserves the reference, the less it can transform.
| Model | Instruction Following | Faithfulness | Key Evidence | Best For |
|---|---|---|---|---|
| GPT Image 1.5 | Excellent | Moderate | 0.929 Reason-Edit (prev best: 0.572). Known facial likeness drift. | Creative exploration, text-heavy images, ideation |
| Nano Banana Pro | Moderate-Good | High | 95%+ character consistency. ~50% failure rate on certain style transfers. | Final renders, character consistency, production assets |
| Nano Banana 2 | Moderate-Good | Good-High | Faster than Pro but "takes creative freedom too far, adding unrequested details." | Speed + quality when faithfulness isn't critical |
| FLUX Kontext | Good | Very High | AuraFace cosine similarity >0.92 across 6 edits (competitors drop to ~0.80). | Iterative editing, multi-turn consistency |
| Midjourney V7 | Good | Moderate | Tends to "beautify" at cost of reference accuracy. --cw tunable 0–100. | Artistic/creative work, aesthetic quality priority |
| SD + IP-Adapter | Tunable | Tunable | Weight parameter explicitly controls tradeoff. LoRA+IP-Adapter = 80–90% consistency. | Maximum control, technical users |
Community wisdom: Use different models for different stages — GPT Image for ideation → Nano Banana or Flux Kontext for final renders → LoRA+IP-Adapter for maximum identity control.
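The FLUX Kontext evidence in the table above is stated as a cosine similarity between face embeddings. A minimal sketch of how such a drift check could be computed is shown here; the embedding vectors are made-up placeholders rather than real AuraFace outputs, and `identity_drift` is a hypothetical helper. The 0.92 threshold is the figure quoted in the table.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def identity_drift(reference, edits, threshold=0.92):
    """Score each edit against the reference and flag whether it meets the threshold."""
    report = []
    for step, embedding in enumerate(edits, start=1):
        score = cosine_similarity(reference, embedding)
        report.append((step, score, score >= threshold))
    return report


# Placeholder embeddings (real face embeddings have hundreds of dimensions).
reference = [0.9, 0.1, 0.4]
edits = [
    [0.88, 0.12, 0.41],  # small change: stays above the threshold
    [0.5, 0.6, 0.4],     # larger change: falls below the threshold
]
for step, score, ok in identity_drift(reference, edits):
    print(f"edit {step}: similarity={score:.3f} {'ok' if ok else 'drifted'}")
```

Comparing every edit against the original reference, rather than against the previous edit, is what exposes cumulative drift across a multi-turn session.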
How many reference images can each model accept, and what kind of consistency do they maintain?
| Model | Max Refs | Character Ref | Style Ref | Multi-Character | Tunability |
|---|---|---|---|---|---|
| Nano Banana Pro | 14 | Native (5 chars) | Native | Yes (5) | Limited |
| FLUX.2 | 10 | Native | Native | Yes | Via params |
| GPT Image 1.5 | 6 | Native | Via prompt | Moderate | Via prompt |
| Leonardo AI | 6 | Dedicated mode | Dedicated mode | Yes | Low/Mid/High |
| Recraft V4 | 5 | No | Style ID | No | Brand colors (hex) |
| Ideogram 3.0 | 3 | No | Style refs | No | Limited |
| Midjourney V7 | 1/mode | --cref | --sref | No | --cw 0–100 |
| SD 3.5 | via tools | IP-Adapter/LoRA | IP-Adapter Style | Via workflows | Full control |
IP-Adapter's superpower: A weight parameter (0.0–1.0) lets you explicitly tune the faithfulness–instruction tradeoff. At weight 0.0 the prompt has full control; as the weight rises toward 1.0 the reference increasingly dominates. No other platform gives this granularity. Recommended pipeline: LoRA (identity) + IP-Adapter (pose) + weighted prompts (features), for 80–90% consistency.
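The weight corresponds to a scale on the image branch of IP-Adapter's decoupled cross-attention, where prompt tokens and reference-image tokens are attended to separately and the two results are summed. The toy sketch below uses plain Python lists and made-up vectors to show the effect of the scale; it is not the library's implementation, and all function names are illustrative.

```python
import math


def attention(query, keys, values):
    """Single-query scaled dot-product attention over plain Python lists."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    out_dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(out_dim)]


def decoupled_cross_attention(query, text_kv, image_kv, ip_scale):
    """Text-branch attention plus ip_scale times image-branch attention."""
    text_out = attention(query, *text_kv)
    image_out = attention(query, *image_kv)
    return [t + ip_scale * i for t, i in zip(text_out, image_out)]


# Made-up (keys, values) pairs standing in for prompt and reference features.
query = [1.0, 0.0]
text_kv = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
image_kv = ([[0.0, 1.0]], [[0.5, 0.5]])

for scale in (0.0, 0.5, 1.0):
    print(scale, decoupled_cross_attention(query, text_kv, image_kv, scale))
```

In this formulation the prompt branch is still present at scale 1.0; the reference is added at full strength rather than replacing the prompt, which is why intermediate values behave like a blend.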
Cost per image at standard ~1024×1024 resolution, sorted cheapest to most expensive.
| Provider | Model | $/Image | Quality | Open? | API? |
|---|---|---|---|---|---|
| fal.ai | SDXL | $0.003 | Good | Yes | Yes |
| fal.ai | Flux Schnell | $0.003 | Good+ | Apache 2.0 | Yes |
| OpenAI | GPT Image 1 Mini (low) | $0.005 | Basic | No | Yes |
| fal.ai | Flux 2 Dev | $0.008 | Very Good | Non-comm | Yes |
| BFL | FLUX.2 [klein] 9B | $0.015 | Good+ | Apache 2.0 | Yes |
| Google | Imagen 4 Fast | $0.02 | Very Good | No | Yes |
| BFL | FLUX.2 [pro] | $0.03 | Very Good | No | Yes |
| Ideogram | 3.0 (fal.ai) | $0.03 | Very Good | No | Yes |
| OpenAI | GPT Image 1.5 (medium) | $0.04 | Very Good | No | Yes |
| Recraft | V4 (raster) | $0.04 | Very Good | No | Yes |
| BFL | FLUX Kontext [pro] | $0.04 | Very Good | No | Yes |
| Google | Imagen 4 Ultra | $0.06 | Excellent | No | Yes |
| BFL | FLUX.2 [max] | $0.07 | Excellent | No | Yes |
| Recraft | V4 (vector SVG) | $0.08 | Very Good | No | Yes |
| Google | Nano Banana Pro (1080p) | $0.139 | Excellent | No | Yes |
| OpenAI | GPT Image 1.5 (high) | $0.20 | Top-tier | No | Yes |
| Google | Nano Banana Pro (4K) | $0.24 | Excellent | No | Yes |
| Self-hosted | Flux Schnell / SD 3.5 / SDXL | $0 | Good–Good+ | Yes | N/A |
| Midjourney | V7 (subscription) | ~$0.05–0.15 | Top-tier | No | No API |
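Approximate cost at 100,000 images, derived from the per-image prices:
| Option | Approx. cost |
|---|---|
| SDXL (fal.ai) | ~$300 |
| Flux 2 Dev (fal.ai) | ~$800 |
| Imagen 4 Fast | ~$2,000 |
| GPT Image 1.5 (med) | ~$4,000 |
| GPT Image 1 (high) | ~$16,700 |
| Self-hosted (Flux/SD) | $0 + electricity |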
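Approximate cost at 1,000 images, derived from the per-image prices:
| Option | Approx. cost |
|---|---|
| SDXL (fal.ai) | ~$3 |
| Flux 2 Dev (fal.ai) | ~$8 |
| GPT Image 1.5 (med) | ~$40 |
| Midjourney Standard | $30/mo |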
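The volume figures above are consistent with multiplying the per-image prices in the pricing table by 100,000 and 1,000 images. The sketch below reproduces that arithmetic with `Decimal` to avoid floating-point rounding; `cost_at_volume` and `break_even_images` are illustrative helpers, and the break-even comparison ignores any usage limits a subscription plan may impose.

```python
from decimal import Decimal

# Per-image prices (USD) copied from the pricing table.
PRICE_PER_IMAGE = {
    "SDXL (fal.ai)": Decimal("0.003"),
    "Flux 2 Dev (fal.ai)": Decimal("0.008"),
    "Imagen 4 Fast": Decimal("0.02"),
    "GPT Image 1.5 (medium)": Decimal("0.04"),
}


def cost_at_volume(price_per_image: Decimal, images: int) -> Decimal:
    """Total API cost in USD for a given number of images."""
    return price_per_image * images


def break_even_images(flat_fee: Decimal, price_per_image: Decimal) -> int:
    """Largest image count at which per-image billing costs no more than the flat fee."""
    return int(flat_fee / price_per_image)


for volume in (1_000, 100_000):
    print(f"--- {volume:,} images ---")
    for model, price in PRICE_PER_IMAGE.items():
        print(f"{model}: ${cost_at_volume(price, volume):,.0f}")

# A $30/month flat fee versus GPT Image 1.5 (medium) at $0.04 per image.
print(break_even_images(Decimal("30"), Decimal("0.04")))
```

With these prices, a $30 flat fee matches GPT Image 1.5 (medium) at 750 images; below that count, per-image billing is the cheaper of the two.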
| Service | Free Offering |
|---|---|
| SD 3.5 / SDXL / Flux Schnell | Fully free to self-host |
| Google Cloud | $300 GCP credits (new accounts) |
| OpenAI API | $5 credits for new accounts |
| Replicate | ~$5 credits |
| fal.ai | ~$5–10 credits (expire 90 days) |
| Leonardo AI | 150 daily credits (with peak wait times) |
| Recraft | 30–50 daily credits (images are public) |
| Ideogram | 10 slow credits/week |
| Midjourney | No free tier |
Side-by-side comparison across all major axes.
| Feature | GPT Image 1.5 | FLUX.2 | Midjourney V7 | SD 3.5 | Nano Banana Pro | Ideogram 3.0 | Recraft V4 | Leonardo |
|---|---|---|---|---|---|---|---|---|
| Max Reference Images | 6 | 10 | 1/mode | via IP-Adapter | 14 | 3 | 5 | 6 |
| Character Consistency | Native | Native | --cref | IP-Adapter/LoRA | Native (5) | No | No | Yes |
| Text Rendering | Excellent | Very Good | Fair | Fair | Excellent | Excellent | Excellent | Good |
| Instruction Following | Excellent | Very Good | Good | Moderate | Moderate-Good | Good | Good | Good |
| Faithfulness to Refs | Moderate | Good to Very Good | Moderate | Tunable | High | N/A | N/A | Good |
| Vector/SVG Output | No | No | No | No | No | No | Yes | No |
| Official API | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes |
| Open Weights | No | Partial | No | Yes | No | No | No | No |
| Runs Locally | No | Klein, Schnell | No | Yes | No | No | No | No |
| Multi-turn Editing | Yes | Via Kontext | Draft Mode | Via ComfyUI | Yes | No | No | Edit w/ AI |
| Video Generation | No | In dev | 5–21s | Separate | No | No | No | Motion |
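The matrix above can also be treated as data and filtered by hard requirements. The sketch below encodes four of its rows (max reference images, official API, open weights, vector output); the `ModelProfile` type and `shortlist` function are hypothetical names, "Partial" open weights is counted as open, and models without a fixed reference limit in the matrix are excluded whenever a minimum is requested.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelProfile:
    name: str
    max_refs: Optional[int]  # None where the matrix gives no fixed number
    official_api: bool
    open_weights: str        # "yes", "partial", or "no", as in the matrix
    vector_output: bool


# Values transcribed from the feature matrix.
MODELS = [
    ModelProfile("GPT Image 1.5", 6, True, "no", False),
    ModelProfile("FLUX.2", 10, True, "partial", False),
    ModelProfile("Midjourney V7", 1, False, "no", False),
    ModelProfile("SD 3.5", None, True, "yes", False),
    ModelProfile("Nano Banana Pro", 14, True, "no", False),
    ModelProfile("Ideogram 3.0", 3, True, "no", False),
    ModelProfile("Recraft V4", 5, True, "no", True),
    ModelProfile("Leonardo", 6, True, "no", False),
]


def shortlist(models, min_refs=0, need_api=False, need_open=False, need_vector=False):
    """Return the names of models that satisfy every requested requirement."""
    picks = []
    for m in models:
        # Assumption: an unknown limit cannot satisfy an explicit minimum.
        if min_refs > 0 and (m.max_refs is None or m.max_refs < min_refs):
            continue
        if need_api and not m.official_api:
            continue
        # Assumption: "partial" open weights counts as open.
        if need_open and m.open_weights == "no":
            continue
        if need_vector and not m.vector_output:
            continue
        picks.append(m.name)
    return picks


print(shortlist(MODELS, min_refs=10, need_api=True))
print(shortlist(MODELS, need_vector=True))
print(shortlist(MODELS, need_open=True))
```

Hard filters like these only narrow the field; the qualitative rows (text rendering, instruction following, faithfulness) still need a judgment call per project.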
What to pick depending on what you need.