the AI bench
VERIFIED MAY 2026
All models

MODEL · MICROSOFT · 3.8B (MMDIT 48-BLOCK + FLUX.2 SEMANTIC VAE + MULTI-LAYER GPT-OSS TEXT FEATURES)

Microsoft Lens (3.8B, MIT)

Microsoft's first foundational text-to-image model. Three-step ladder: `Lens-Base` (50-step supervised), `Lens` (20-step RL-tuned default), `Lens-Turbo` (4-step distilled). Architecture is novel: an MMDiT trunk paired with FLUX.2's semantic VAE and multi-layer features from a frozen GPT-OSS text model — Microsoft's public framing claims competitive quality at "substantially less training compute than larger T2I models." Cleanest MIT-licensed T2I at this param count.

License: MIT · Context: Up to 1,440 × 1,440 native; aspect ratios 1:2 to 2:1 · Released: May 19, 2026 (paper arxiv 2605.21573; HF model card finalized May 18)

The decision in five lines

The call
Buy — for image
Best for
image
Runs on
23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out
When you need the absolute T2I quality ceiling — FLUX.2 dev and HiDream-O1-Image still edge it on complex compositions.
Evidence
Estimated · last verified May 2026

3.8B (MMDiT 48-block + FLUX.2 semantic VAE + multi-layer GPT-OSS text features)
PARAMETERS
IMAGE GEN
TYPE
Up
CONTEXT
~8 GB BF16 on A100/V100 fallback; MXFP4 native on Hopper+ for ~half
VRAM AT Q4

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

IMAGE · HIGH
Microsoft Lens-Turbo (3.8B, MIT)May 19 2026 release. Three-step ladder (Base / RL-tuned / Turbo) with novel MMDiT + FLUX.2 VAE + GPT-OSS text features. 4-step distilled Turbo runs faster than Z-Image-Turbo at the same 3.8B class. MIT — cleanest license at this param count.

The call

Microsoft's first foundational text-to-image model. Three-step ladder: `Lens-Base` (50-step supervised), `Lens` (20-step RL-tuned default), `Lens-Turbo` (4-step distilled). Architecture is novel: an MMDiT trunk paired with FLUX.2's semantic VAE and multi-layer features from a frozen GPT-OSS text model — Microsoft's public framing claims competitive quality at "substantially less training compute than larger T2I models." Cleanest MIT-licensed T2I at this param count.

When not to use: When you need the absolute T2I quality ceiling — FLUX.2 dev and HiDream-O1-Image still edge it on complex compositions. Also: existing ComfyUI graphs targeting FLUX/SDXL won't port directly; the pipeline lives in Microsoft's `lens` Python package, which is still maturing.

Runner notes

`microsoft/Lens` is the RL-tuned default (20 steps, CFG 5.0). `microsoft/Lens-Turbo` is 4-step distilled with CFG 1.0 — the fast-path Z-Image-Turbo competitor. `microsoft/Lens-Base` is the supervised pre-distillation checkpoint, useful for downstream RL or fine-tuning. CPU offload (`enable_model_cpu_offload`) + `--disable_mxfp4 --offload` flags carry it down to A100/V100 budgets.

License
MIT
Released
May 19, 2026 (paper arxiv 2605.21573; HF model card finalized May 18)
Maker
Microsoft

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.

Next step

Find-by-model — see what hardware runs this