MODEL · GOOGLE DEEPMIND · 26B TOTAL / ~4B ACTIVE (8 OF 128 EXPERTS)
DiffusionGemma 26B-A4B
Google DeepMind's discrete-diffusion take on Gemma 4. Instead of generating token by token autoregressively, it denoises blocks of tokens ("canvases") in parallel to raise tokens/sec, and it is built on the 26B-A4B MoE Gemma 4 foundation. It accepts multimodal input (text + image + video), has a thinking mode, and uses an encoder-decoder design (cached AR prompt encoder + bidirectional diffusion decoder). Apache 2.0, engineered for low-latency single-accelerator local inference. A genuinely different generation architecture, not a version bump.
License: Apache 2.0 · Context: inherited from Gemma 4 · Released: June 9, 2026
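A back-of-the-envelope sketch of why parallel denoising can raise tokens/sec: count decoder forward passes. The block size (32) and passes per block (8) below are hypothetical, not DiffusionGemma's published settings. The comparison assumes one denoising pass over a small canvas costs roughly as much as one autoregressive step, a fair approximation when local decode is memory-bandwidth-bound (the weights are read once per pass either way). It counts decode passes only.

```python
import math


def ar_decode_passes(n_tokens: int) -> int:
    """Autoregressive decoding: one decoder forward pass per generated token."""
    return n_tokens


def block_diffusion_passes(n_tokens: int, block_size: int, steps_per_block: int) -> int:
    """Block diffusion: canvases of `block_size` tokens, generated left to right,
    each refined in `steps_per_block` passes that update all positions at once."""
    n_blocks = math.ceil(n_tokens / block_size)
    return n_blocks * steps_per_block


n = 256
print(ar_decode_passes(n))               # 256
print(block_diffusion_passes(n, 32, 8))  # 64  (8 canvases x 8 passes)
```

Fewer passes per canvas means fewer forward passes but usually lower output quality, so the real speedup depends on the step schedule the sampler uses.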
The decision in five lines
- The call: Consider. Runnable locally; a family reference.
- Best for: Local evaluation and family reference.
- Runs on: 16 hardware picks fit (cheapest: Minisforum UM890 Pro · $463).
- Watch out: Skip it when you want a proven, tooling-mature default. Diffusion LMs are new, and runner support is thinner than for standard autoregressive Gemma 4.
- Evidence: Estimated.
- Parameters: 26B total
- Type: Discrete-diffusion text LM
- Context: Inherited from Gemma 4
- VRAM at Q4: ~15 GB (26B MoE at Q4)
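The ~15 GB figure and the ~4B-active figure measure different things. Here is a minimal sketch of the arithmetic, assuming a Q4-family quant averages about 4.5 bits per weight once quantization scales are counted. That average is an assumption; exact values vary by quant format. Every one of the 26B parameters must be resident, while only the ~4B active ones are read for each token. KV cache, activations, and runtime overhead come on top of the weights and are not modeled here.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 26e9   # all 128 experts plus shared layers stay resident
ACTIVE_PARAMS = 4e9   # ~8 of 128 experts plus shared layers run per token
Q4_BITS = 4.5         # assumed average bits/weight for a Q4-family quant

resident = weight_gb(TOTAL_PARAMS, Q4_BITS)
per_token = weight_gb(ACTIVE_PARAMS, Q4_BITS)
print(f"resident weights: {resident:.3f} GB")         # resident weights: 14.625 GB
print(f"weights read per token: {per_token:.2f} GB")  # weights read per token: 2.25 GB
```

That split is why the memory requirement tracks the 26B total, while decode speed on bandwidth-bound hardware tracks closer to the ~4B active count.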
Where we recommend this
This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.
The call
When not to use: when you want a proven, tooling-mature default. Diffusion LMs are new, and runner support is thinner than for standard autoregressive Gemma 4. For most local work, the standard (dense or MoE) Gemma 4 models or Qwen 3.6 remain the safe picks; DiffusionGemma is the one to try when single-accelerator decode speed is the priority.
Runner notes
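For the sampling loop, follow Google's DiffusionGemma docs and GitHub repo; block diffusion isn't a drop-in llama.cpp path yet. MoE keeps per-token compute at ~4B active parameters, but it does not shrink the memory footprint: all 26B parameters must stay resident, which is where the ~15 GB Q4 figure comes from. Apache 2.0 makes it clean for commercial use, same as the rest of Gemma 4.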
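To make "sampling loop" concrete, here is a toy sketch of block-wise masked-diffusion decoding. Each canvas starts fully masked, every pass predicts all masked positions in parallel, and the most confident predictions are committed while the rest wait for the next pass. The `Denoiser` signature, `toy_denoiser`, and the confidence-based commit schedule are hypothetical stand-ins of the kind used in the masked-diffusion literature, not DiffusionGemma's actual sampler or API; the real loop lives in Google's docs.

```python
import random
from typing import Callable, List, Optional, Tuple

MASK = None  # a canvas position that has not been decoded yet

# Hypothetical signature: (prefix, canvas) -> one (token, confidence) per canvas slot.
Denoiser = Callable[[List[int], List[Optional[int]]], List[Tuple[int, float]]]


def decode_block(prefix: List[int], block_size: int, steps: int,
                 denoise: Denoiser) -> List[int]:
    """Fill one fully masked canvas in at most `steps` parallel passes."""
    canvas: List[Optional[int]] = [MASK] * block_size
    for step in range(steps):
        masked = [i for i, tok in enumerate(canvas) if tok is MASK]
        if not masked:
            break
        preds = denoise(prefix, canvas)  # predicts every position in one pass
        # Commit enough tokens to finish on schedule: ceil(remaining / steps_left).
        k = -(-len(masked) // (steps - step))
        most_confident = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in most_confident:
            canvas[i] = preds[i][0]
    # The final pass commits everything still masked, so every slot is filled.
    return [tok for tok in canvas if tok is not MASK]


def generate(prompt: List[int], n_blocks: int, block_size: int, steps: int,
             denoise: Denoiser) -> List[int]:
    """Generate canvases left to right; each block conditions on everything before it."""
    out = list(prompt)
    for _ in range(n_blocks):
        out += decode_block(out, block_size, steps, denoise)
    return out[len(prompt):]


def toy_denoiser(prefix: List[int], canvas: List[Optional[int]]) -> List[Tuple[int, float]]:
    """Stand-in for the bidirectional diffusion decoder: random but reproducible."""
    rng = random.Random(len(prefix) * 1000 + sum(tok is MASK for tok in canvas))
    return [(rng.randrange(100), rng.random()) for _ in canvas]


tokens = generate(prompt=[1, 2, 3], n_blocks=2, block_size=8, steps=4, denoise=toy_denoiser)
print(len(tokens))  # 16
```

Raising `steps` trades speed for quality: more passes per canvas, fewer tokens committed per pass.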
Hardware that fits
Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit.
- Minisforum UM890 Pro · Perfect · 1.4× · 32 GB DDR5 (shared) · $463–$580 all-in
- AMD Radeon RX 7900 XTX · Perfect · 1.4× · 24 GB · $810 used / ~$1,340 new
- NVIDIA RTX 3090 (used, single) · Perfect · 1.4× · 24 GB · $950–$1,200
- MacBook Air M5 24 GB · Requires tweak · 1.3× · 24 GB unified · $1,499–$1,899
- Mac Mini M4 Pro 24 GB · Requires tweak · 1.3× · 24 GB unified · $1,599
- Dual RTX 3090 (used) · Perfect · 2.9× · 48 GB · $1,800–$2,500 all-in
- Framework Desktop (Ryzen AI Max+ 395) · Perfect · 5.2× · 128 GB unified · $1,999–$2,851
- NVIDIA RTX 4090 · Perfect · 1.4× · 24 GB · $2,200–$2,800
- M5 Pro MacBook Pro 48 GB · Perfect · 1.9× · 48 GB unified · $2,999–$3,599
- NVIDIA RTX 5090 · Perfect · 1.9× · 32 GB · $3,500–$4,300
- NVIDIA RTX A6000 (48 GB, used) · Perfect · 2.9× · 48 GB ECC · $3,500–$4,500
- Mac Studio M4 Max 64 GB · Perfect · 2.6× · 64 GB unified · $3,799
- NVIDIA DGX Spark · Perfect · 5.2× · 128 GB unified · $4,699
- M5 Max MacBook Pro 64 GB · Perfect · 2.6× · 64 GB unified · ~$5,199 (est.; June 25, 2026 price increase)
- Mac Studio M3 Ultra 96 GB · Perfect · 3.9× · 96 GB unified · $5,299
- Dual RTX 5090 · Perfect · 3.9× · 64 GB (2×32) · $8,500–$10,500