the AI bench
VERIFIED MAY 2026

USE CASE · IMAGE

Local image generation.

HiDream-O1-Image (May 8 2026, 8B MIT, pixel-space architecture) is the new #1 open-weight model on the Artificial Analysis T2I Arena (Elo 1184), beating FLUX.2 dev (32B) and Qwen-Image-2512 (20B) with 2.5–4× fewer parameters. Choose by license (FLUX.2 dev is non-commercial) and by what you generate (text rendering, photorealism, speed).


Verdict — Strong open-weight options at every tier above 8GB

HiDream-O1-Image (8B MIT) at full precision is the top open-weight pick. Z-Image-Turbo (6B, Apache 2.0) is the community daily driver, with sub-second generation on an H800. FLUX.2 dev offers the best absolute quality if a non-commercial license works for you.


What's the answer at each tier

Frontier (64+ GB)

HiDream-I1 Full FP16 (17B MIT) + FLUX.2 dev FP16 (32B non-commercial) + Qwen-Image-2512 (20B Apache 2.0, SOTA CN+EN text rendering). This is the local quality ceiling; FLUX.2 dev's FP16 weights alone need 64+ GB resident.

  1. FLUX.2 dev FP16 (~64 GB, no quant compromise) — The full 32B model at native FP16, so no Q4/FP8 artifacts. Frontier-only because the FP16 weights need 64+ GB resident.
  2. HiDream-I1 Full (17B, FP16, MIT) — April 2025 release; community testing shows it outperforms SDXL, DALL-E 3, and FLUX.1 on key benchmarks. MIT-licensed — clean for commercial redistribution. Frontier hardware unlocks the full FP16 weights.
  3. Wan 2.2 (A14B variants) — Text-to-video rather than image; the community-standard local video generator in 2026. A14B variants need 16–24 GB minimum; a 96+ GB frontier machine has headroom for 1080p, longer clips, and batched runs. Pairs with ComfyUI workflows.
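
The memory figures in these tiers follow from simple arithmetic: parameter count times bytes per parameter. The sketch below is a rough estimator, not a measurement. The `overhead_gb` default is an assumed placeholder, and the estimate ignores text encoders, VAEs, and activation memory, which vary by model.

```python
# Bytes per parameter for common weight precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, memory_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """True if weights plus an ASSUMED fixed overhead fit in memory_gb."""
    return weight_gb(params_billion, precision) + overhead_gb <= memory_gb


if __name__ == "__main__":
    # Parameter counts are the ones quoted in this guide.
    for name, size_b, precision in [
        ("FLUX.2 dev", 32, "fp16"),
        ("HiDream-I1 Full", 17, "fp16"),
        ("HiDream-O1-Image", 8, "fp8"),
        ("Z-Image-Turbo", 6, "int4"),
    ]:
        print(f"{name} {precision}: ~{weight_gb(size_b, precision):.0f} GB of weights")
```

Under these assumptions a 32B model at FP16 comes to about 64 GB of weights before any overhead, which is why it lands in the frontier tier, while an 8B model at FP8 comes to about 8 GB.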

Top (32+ GB)

HiDream-O1-Image (8B MIT) at full precision sits cleanly at the top here too. Z-Image-Turbo is the Apache 2.0 community daily driver (sub-second on H800). FLUX.2 dev delivers the best absolute quality if a non-commercial license works.

  1. Z-Image-Turbo (Apache 2.0) — Community daily driver for realism; 6B, 8-step inference, Apache 2.0 — commercial OK.
  2. FLUX.2 dev (non-commercial) — Quality ceiling at 32B; FLUX Non-Commercial license — commercial deployment needs paid BFL license.
  3. Qwen-Image-2512 (20B, Apache 2.0) — Latest released Qwen-Image (Dec 31 2025); claims #1 open-source on AI Arena. Best-in-class text rendering (CN + EN); pair with Edit-2511 for editing workflows.

High (20–24 GB)

HiDream-O1-Image at FP8 leads this tier: it is #1 open-weight on the AA T2I Arena per the May 2026 ranking, runs in ~10 GB VRAM, and is MIT-licensed for commercial use. Qwen-Image-2512 is the pick for text rendering, with FLUX.2 klein 9B (non-commercial) third.

  1. HiDream-O1-Image (8B, MIT) — May 8, 2026 release. Pixel-space (no VAE, no separate text encoder); debuted in the top 10 on the Artificial Analysis T2I Arena. One MIT-licensed 8B model handles T2I, editing, and subject-driven personalization at up to 2,048².
  2. Qwen-Image-2512 (20B, Apache 2.0) — Dec 31 2025 release; latest runnable Qwen-Image. Best-in-class CN + EN text rendering at this tier; Apache 2.0 — rare at this quality.
  3. FLUX.2 klein 9B (non-commercial) — Distilled FLUX.2 quality at 9B; FLUX Non-Commercial license — free for personal, paid for commercial.

Mid (12–16 GB)

HiDream-O1-Image-Dev-2604 (distilled, 28-step) is the freshest pick. FLUX.2 klein 4B (Apache 2.0, commercial OK) remains the right Apache-clean default. HiDream-I1 Fast offers higher quality at the same size class.

  1. FLUX.2 klein 4B (Apache 2.0) — BFL's first fully Apache-2.0 model; 4B distilled for fast inference on mid-tier GPUs; commercial OK.
  2. SANA-1.6B (non-commercial) — NVIDIA 4K-capable 1.6B; extremely fast on mid-tier GPUs; weights are NVIDIA NSCL v2 (non-commercial).
  3. Z-Image-Turbo (FP8, fits 8GB) — FP8 quant runs comfortably on an 8GB card with Apache 2.0 commercial license.

Low (6–12 GB / CPU)

SANA-0.6B (NVIDIA NSCL v2 non-commercial) is fast on 6-8GB VRAM; Z-Image-Turbo at int4 fits the same window. SD 3.5 Medium is the SAI-licensed fallback.

  1. SANA-0.6B (non-commercial) — 0.6B params; <1s per 1024² on a 16GB laptop GPU; weights are NVIDIA NSCL v2 (non-commercial).
  2. Z-Image-Turbo (int4, 6GB) — Community int4 quant fits 6GB VRAM; Apache 2.0 — commercial-OK at this tier is rare.
  3. SD 3.5 Medium — 2.5B; ~10GB VRAM; reliable; SAI Community License (free under $1M revenue).
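
As a rough illustration of how the tiers above map to a machine, the sketch below picks the highest tier whose lower bound fits the available memory. The thresholds come from the tier headings. Treating the gaps between tiers (16–20 GB, 24–32 GB) as belonging to the tier below, and putting exactly 12 GB in Mid, are assumptions made for this example. `tier_for` is a hypothetical helper, not part of any tool named here.

```python
# (minimum GB, tier name, first-listed pick), ordered from largest to smallest.
TIERS = [
    (64, "Frontier", "FLUX.2 dev FP16"),
    (32, "Top", "Z-Image-Turbo"),
    (20, "High", "HiDream-O1-Image"),
    (12, "Mid", "FLUX.2 klein 4B"),
    (6, "Low", "SANA-0.6B"),
]


def tier_for(memory_gb: float):
    """Return (tier name, first-listed pick), or None below the lowest tier."""
    for minimum, name, pick in TIERS:
        if memory_gb >= minimum:
            return name, pick
    return None


if __name__ == "__main__":
    for gb in (8, 16, 24, 48, 96):
        print(gb, "GB ->", tier_for(gb))
```

The first-listed pick is only a starting point: several of them are non-commercial, so check the license before settling on one.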

How to actually run it

ComfyUI is the standard runner; every major checkpoint has community node workflows. Use Forge for SD-family models, and the Diffusers Python library directly for batch generation pipelines. Mac users: native MLX inference is well supported for the FLUX and SD families; HiDream-O1 performance on Mac has not yet been community-benchmarked.
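
For the batch-pipeline route, a minimal Diffusers sketch might look like the following. It assumes the checkpoint ships a Diffusers-compatible pipeline and that a CUDA GPU is available; neither is confirmed here for any specific model. `model_id` is a placeholder you supply, `plan_jobs` and `run_batch` are hypothetical helpers, and the default of 8 steps simply mirrors the 8-step figure quoted for Z-Image-Turbo.

```python
from pathlib import Path


def plan_jobs(prompts, seeds):
    """Cross every prompt with every seed: (prompt, seed, filename) tuples."""
    jobs = []
    for p_idx, prompt in enumerate(prompts):
        for seed in seeds:
            jobs.append((prompt, seed, f"p{p_idx:03d}_s{seed}.png"))
    return jobs


def run_batch(model_id, prompts, seeds, out_dir="out", steps=8):
    """Generate one image per (prompt, seed) pair and save it to out_dir."""
    # Imported lazily so plan_jobs works without torch/diffusers installed.
    import torch
    from diffusers import DiffusionPipeline

    # ASSUMPTION: the checkpoint loads via the generic pipeline in bfloat16.
    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for prompt, seed, filename in plan_jobs(prompts, seeds):
        # A fixed seed per job makes each output reproducible.
        generator = torch.Generator(device="cuda").manual_seed(seed)
        image = pipe(prompt=prompt, num_inference_steps=steps,
                     generator=generator).images[0]
        image.save(out / filename)
```

Encoding the prompt index and seed in the filename makes it easy to regenerate a single image later with the same settings.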


Watchouts

  • License landscape is fractured. FLUX.2 dev is non-commercial; FLUX.2 klein 4B is Apache 2.0 but the 9B variant is non-commercial; HiDream-I1 and HiDream-O1 are MIT (commercial clean); the Qwen-Image series is Apache 2.0; SANA is NVIDIA NSCL v2 non-commercial; SD 3.5 is SAI Community License. Read each before commercial deployment.
  • HiDream-O1 (May 2026) ships with a new pixel-space architecture (no VAE, no separate text encoders) — existing ComfyUI custom-node graphs for FLUX/SDXL won't port cleanly. Plan for 1-2 weeks of node ecosystem catch-up.
  • Qwen-Image 2.0 (the unified 7B successor) was announced Feb 10 2026 but weights are NOT yet open-sourced. Stick with Qwen-Image-2512 (Dec 2025, 20B Apache 2.0) until 2.0 weights ship.
  • Top 5 of AA T2I Arena is all CLOSED models (GPT Image 2 Elo 1338, Nano Banana 2/Pro, MAI-Image-2). Top open-weight peaks at Elo 1184. For absolute quality first-try, cloud still leads.
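
One way to keep the license split straight is a small lookup table. The sketch below encodes only the licenses stated in this guide. The three-way flag (`True`, `False`, `None` for "conditional") is an illustrative convention, and none of this replaces reading the actual license text.

```python
# name -> (license, commercial use allowed?)
# None means "conditional": read the terms before relying on it.
LICENSES = {
    "FLUX.2 dev": ("FLUX Non-Commercial", False),
    "FLUX.2 klein 4B": ("Apache 2.0", True),
    "FLUX.2 klein 9B": ("FLUX Non-Commercial", False),
    "HiDream-I1": ("MIT", True),
    "HiDream-O1-Image": ("MIT", True),
    "Qwen-Image-2512": ("Apache 2.0", True),
    "Z-Image-Turbo": ("Apache 2.0", True),
    "SANA": ("NVIDIA NSCL v2", False),
    "SD 3.5 Medium": ("SAI Community License", None),  # free under $1M revenue
}


def commercial_ok(table=LICENSES):
    """Names of models whose license allows commercial use unconditionally."""
    return sorted(name for name, (_, ok) in table.items() if ok is True)


if __name__ == "__main__":
    print(commercial_ok())
```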

When cloud still wins

You need GPT Image 2 / Nano Banana / Midjourney-class consistency on the first try, or you don't want to manage a ComfyUI workflow. Local image gen is genuinely good now but the iteration loop is faster on cloud for one-off creative work. For volume + privacy + specific style fine-tuning, local wins.
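
To put the arena gap in perspective, the sketch below converts an Elo difference into an expected head-to-head win rate. It assumes the standard Elo logistic curve with a 400-point scale; the arena's exact rating method is not stated here, so treat the result as a rough reading rather than the arena's own figure.

```python
def expected_win_rate(elo_a: float, elo_b: float) -> float:
    """Expected share of head-to-head votes won by A (standard Elo formula)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


if __name__ == "__main__":
    # Ratings quoted in this guide: GPT Image 2 (1338) vs top open-weight (1184).
    rate = expected_win_rate(1338, 1184)
    print(f"Closed model expected to win about {rate:.0%} of comparisons")
```

Under that assumption a 154-point gap means the closed model wins roughly seven in ten pairwise comparisons, which is a clear lead but not a shutout.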


Notes flagged for next refresh

Wan 2.2 (A14B + variants) is text-to-VIDEO, not image — flagged for a separate /use-cases/video/ page when built. SANA-WM (NVIDIA, May 16 2026) is also video — 2.6B world-model, single-RTX-5090 60-sec 720p. HiDream-O1-Image-Dev-2604 is the freshest distilled image checkpoint (~3 days old at time of writing).