DiffusionGemma 26B-A4B

Google DeepMind's discrete-diffusion take on Gemma 4 — instead of token-by-token autoregression, it denoises blocks of tokens ("canvases") in parallel for higher tokens/sec, on the 26B-A4B MoE Gemma 4 foundation. Multimodal input (text + image + video), a thinking mode, and an encoder-decoder design (cached AR prompt encoder + bidirectional diffusion decoder). Apache 2.0, engineered for low-latency single-accelerator local inference. A genuinely different generation architecture, not a version bump.

License: Apache 2.0 · Context: Inherits Gemma 4 · Released: June 9, 2026

The decision in five lines

The call: Consider — runnable locally, family reference
Best for: Local evaluation and family reference
Runs on: 16 hardware picks fit (cheapest: Minisforum UM890 Pro · $463)
Watch out: Not the pick when you want a proven, tooling-mature default. Diffusion LMs are new, and runner support is thinner than for standard autoregressive Gemma 4.
Evidence: Estimated · last verified July 2026

Parameters: 26B total
Type: Discrete-diffusion text LM
Context: Inherits Gemma 4
VRAM at Q4: ~15 GB (26B MoE)
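
The ~15 GB figure can be sanity-checked with back-of-envelope arithmetic. The sketch below is an estimate, not a measurement: the 4.5 bits per weight is an assumed effective size for a 4-bit quant once scales are included, and the function name is hypothetical. It counts weights only, so KV cache and activations come on top.

```python
def quantized_weight_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return total_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 26e9   # 26B total, from the spec lines above
ACTIVE_PARAMS = 4e9   # roughly 4B active per token (the "A4B" in the name)
Q4_BITS = 4.5         # ASSUMPTION: effective bits/weight for a 4-bit quant incl. scales

# Every expert has to be resident, so memory is sized by the total count.
weights_gb = quantized_weight_gb(TOTAL_PARAMS, Q4_BITS)
# Only the routed experts are read for a given token.
active_gb = quantized_weight_gb(ACTIVE_PARAMS, Q4_BITS)

print(f"weights resident in memory: ~{weights_gb:.1f} GB")
print(f"weights touched per token:  ~{active_gb:.1f} GB")
```

Under these assumptions the resident weights come to about 14.6 GB, in line with the ~15 GB listed above. The MoE design lowers per-token work, not the amount of memory the hardware needs.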

Where we recommend this

This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.

The call

When not to use: When you want a proven, tooling-mature default — diffusion LMs are new and runner support is thinner than standard autoregressive Gemma 4. For most local work the dense/MoE Gemma 4 or Qwen 3.6 remain the safe picks; DiffusionGemma is the one to try when single-accelerator decode speed is the priority.

Runner notes

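Use Google's DiffusionGemma docs and GitHub repository for the sampling loop; block diffusion isn't a drop-in llama.cpp path yet. The MoE design activates roughly 4B parameters per token out of 26B total, so per-token compute is closer to a 4B model, but all 26B weights still have to fit in memory (~15 GB at Q4). Apache 2.0 is clean for commercial use, same as the rest of Gemma 4.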
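
To give a feel for what "denoising a canvas in parallel" means, here is a toy sketch. It is not Google's sampling loop: the denoiser is a hypothetical stand-in that already knows the answer and assigns random confidences, and the confidence-ranked unmasking schedule is a generic illustration of masked discrete diffusion that the real decoder may not follow. The only point is that several positions are committed per step instead of one.

```python
import random

MASK = "_"  # placeholder for a position that has not been denoised yet

def toy_denoiser(canvas, target, rng):
    """Hypothetical stand-in for the model: one call scores every masked slot.

    Returns (confidence, position, proposed_token) tuples. A real decoder
    would predict the tokens; this toy simply reads them from `target`.
    """
    return [(rng.random(), i, target[i])
            for i, tok in enumerate(canvas) if tok == MASK]

def denoise_block(target, steps, seed=0):
    """Fill a fully masked canvas in `steps` passes, several tokens per pass."""
    rng = random.Random(seed)
    canvas = [MASK] * len(target)
    per_step = -(-len(target) // steps)  # ceiling division
    history = []
    for _ in range(steps):
        proposals = toy_denoiser(canvas, target, rng)
        if not proposals:
            break
        # Commit only the most confident proposals; the rest stay masked
        # and are re-scored on the next pass.
        for _conf, i, tok in sorted(proposals, reverse=True)[:per_step]:
            canvas[i] = tok
        history.append("".join(canvas))
    return history

for step, state in enumerate(denoise_block("parallel", steps=4), start=1):
    print(step, state)
```

The eight-character canvas is finished in four passes, where token-by-token decoding would need eight. That ratio is the source of the speed claim, with the caveat that each pass of a real diffusion decoder costs more than this toy suggests.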

License: Apache 2.0
Released: June 9, 2026
Maker: Google DeepMind
Model card: huggingface.co/google/diffusiongemma-26B-A4B-it

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend, sorted cheapest-first: the top row is your best-value fit.
