the AI bench
VERIFIED JULY 2026

MODEL · GOOGLE DEEPMIND · 26B TOTAL / ~4B ACTIVE (8 OF 128 EXPERTS)

DiffusionGemma 26B-A4B

DiffusionGemma is Google DeepMind's discrete-diffusion take on Gemma 4. Instead of generating token by token autoregressively, it denoises blocks of tokens ("canvases") in parallel to raise tokens/sec, and it is built on the 26B-A4B MoE Gemma 4 foundation. It accepts multimodal input (text, image, and video), offers a thinking mode, and uses an encoder-decoder design: a cached autoregressive (AR) prompt encoder paired with a bidirectional diffusion decoder. It is licensed under Apache 2.0 and engineered for low-latency local inference on a single accelerator. This is a genuinely different generation architecture, not a version bump.

License: Apache 2.0 · Context: Inherits Gemma 4 · Released: June 9, 2026

The decision in five lines

The call: Consider (runnable locally, family reference)
Best for: Local evaluation and family reference
Runs on: 16 hardware picks fit (cheapest: Minisforum UM890 Pro · $463)
Watch out: Not a proven, tooling-mature default. Diffusion LMs are new, and runner support is thinner than for standard autoregressive Gemma 4.
Evidence: Estimated · last verified July 2026

Parameters: 26B total
Type: Discrete-diffusion text LM
Context: Inherits Gemma 4
VRAM at Q4: ~15 GB (26B MoE at Q4)
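
The ~15 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate, not a measurement: it assumes an effective 4.5 bits per weight for a Q4-style quant (an assumption, since real quant formats vary) and counts weights only, ignoring KV cache and activations. The function and constant names are illustrative.

```python
def quantized_weight_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 26e9     # 26B total parameters (from the model card)
ACTIVE_PARAMS = 4e9     # ~4B active per token (from the model card)
ASSUMED_Q4_BITS = 4.5   # ASSUMPTION: effective bits/weight for a Q4-style quant

# Memory is driven by ALL weights, because any expert may be routed to.
weights_gb = quantized_weight_gb(TOTAL_PARAMS, ASSUMED_Q4_BITS)

# Per-token compute is driven only by the active share of the weights.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"Estimated Q4 weights: {weights_gb:.1f} GB")
print(f"Active share per token: {active_fraction:.0%}")
```

Under these assumptions the weights alone land just under 15 GB, so leave headroom beyond that for context and runtime overhead.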

Where we recommend this

This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.

The call

When not to use: skip it when you want a proven, tooling-mature default. Diffusion LMs are new, and runner support is thinner than for standard autoregressive Gemma 4. For most local work, the dense or MoE Gemma 4 models or Qwen 3.6 remain the safe picks; DiffusionGemma is the one to try when single-accelerator decode speed is the priority.

Runner notes

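Follow Google's DiffusionGemma docs and GitHub repository for the sampling loop; block diffusion isn't a drop-in llama.cpp path yet. The MoE design keeps per-token compute at roughly 4B active parameters (8 of 128 experts) despite 26B total, though the full 26B weights still set the memory requirement (~15 GB at Q4). Apache 2.0 is clean for commercial use, same as the rest of Gemma 4.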
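
To make the idea of a block-diffusion sampling loop concrete, here is a toy sketch in plain Python. It is not DiffusionGemma's actual loop or API: the denoiser is a hypothetical stand-in that returns a guess and a confidence for every canvas position, and the unmasking schedule (commit the most confident positions each step) is one common choice assumed for illustration. It also does not model the cached AR prompt encoder; the prompt is simply passed along as context.

```python
from typing import Callable, List, Tuple

MASK = -1  # sentinel for a canvas position that has not been denoised yet

# Hypothetical denoiser: sees the context and the current canvas, and returns
# a (token, confidence) guess for EVERY canvas position in a single call.
Denoiser = Callable[[List[int], List[int]], List[Tuple[int, float]]]


def toy_denoiser(context: List[int], canvas: List[int]) -> List[Tuple[int, float]]:
    """Stand-in for a real model: counts upward from the last context token.

    Confidence falls off along the canvas, so earlier positions commit first.
    """
    base = context[-1] if context else 0
    n = len(canvas)
    return [(base + i + 1, 1.0 - i / n) for i in range(n)]


def denoise_block(context: List[int], block_size: int, steps: int,
                  denoiser: Denoiser) -> List[int]:
    """Fill one canvas in at most `steps` parallel refinement passes."""
    canvas = [MASK] * block_size
    per_step = -(-block_size // steps)  # ceiling division
    for _ in range(steps):
        masked = [i for i, tok in enumerate(canvas) if tok == MASK]
        if not masked:
            break
        guesses = denoiser(context, canvas)  # one call covers all positions
        # Commit the most confident still-masked positions this step.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:per_step]:
            canvas[i] = guesses[i][0]
    return canvas


def generate(prompt: List[int], num_blocks: int, block_size: int, steps: int,
             denoiser: Denoiser = toy_denoiser) -> List[int]:
    """Generate block by block; each finished block joins the context."""
    context = list(prompt)
    for _ in range(num_blocks):
        context.extend(denoise_block(context, block_size, steps, denoiser))
    return context[len(prompt):]


tokens = generate(prompt=[10], num_blocks=2, block_size=4, steps=2)
print(tokens)
```

In this toy setup, two 4-token canvases with 2 steps each need 4 denoiser calls for 8 tokens, where token-by-token decoding would need 8. That gap is the intuition behind the tokens/sec claim; real speedups depend on the model, the schedule, and the runner.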

License: Apache 2.0 · Released: June 9, 2026 · Maker: Google DeepMind

Hardware that fits

Every hardware pick whose memory fits this model at the recommended quant, sorted cheapest-first so the first entry is the best-value fit. 16 picks currently fit; the cheapest is the Minisforum UM890 Pro at $463.
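
The fit rule described above (keep what has enough memory, then sort by price) can be sketched as follows. The catalogue entries are hypothetical placeholders, not real products or prices, and the 15 GB threshold assumes the Q4 estimate quoted on this page; the site's actual fit logic may account for more than raw memory.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HardwarePick:
    name: str
    memory_gb: float   # memory usable for model weights
    price_usd: float


def picks_that_fit(picks: List[HardwarePick], required_gb: float) -> List[HardwarePick]:
    """Keep picks with enough memory, cheapest first."""
    fits = [p for p in picks if p.memory_gb >= required_gb]
    return sorted(fits, key=lambda p: p.price_usd)


# HYPOTHETICAL catalogue for illustration only.
CATALOGUE = [
    HardwarePick("example-gpu-8gb", 8, 300),
    HardwarePick("example-mini-pc-32gb", 32, 500),
    HardwarePick("example-gpu-24gb", 24, 900),
    HardwarePick("example-gpu-16gb", 16, 450),
]

REQUIRED_GB = 15  # ASSUMPTION: the ~15 GB Q4 estimate from this page

fitting = picks_that_fit(CATALOGUE, REQUIRED_GB)
for pick in fitting:
    print(f"{pick.name}: {pick.memory_gb} GB, ${pick.price_usd}")
```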
