Gemma 4 (31B dense + 26B A4B MoE + 12B multimodal)

Google's April 2026 refresh — Arena top 5 in its first week, 256K context native, vision + audio multimodal, and the move to Apache 2.0 from the custom Gemma Terms. On June 3, 2026 Google added Gemma 4 12B: a ~12B dense, encoder-free unified multimodal variant (text + image + audio + video in) that runs locally on 16 GB of VRAM or unified memory while nearing the 26B MoE on benchmarks. The 12B is the laptop-tier multimodal pick; the 31B dense is the current Apache-2.0 "best dense under 70B".

License: Apache 2.0 (moved off Gemma Terms) · Context: 256K · Released: April 2, 2026 (31B/26B); June 3, 2026 (12B)

The decision in five lines

The call: Buy — for chat
Best for: chat · docs
Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out: Tight VRAM budgets under 8 GB — the 12B variant still wants ~8 GB at Q4 before context.
Evidence: Measured · last verified July 2026

Parameters: 31B dense
Type: Dense + MoE
Context: 256K
VRAM at Q4: ~18 GB (31B dense) / ~15 GB (26B MoE) / ~8 GB (12B dense)

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

CHAT · TOP

Gemma 4 31B Dense: Google April 2, 2026 release; Arena top 5, 256K context, native vision + audio; Apache 2.0.

CHAT · HIGH

Gemma 4 26B MoE (3.8B active): Open Arena top 10 at 3.8B active compute; calm and fast.

DOCS · TOP

Gemma 4 31B (256K context): 256K context with vision + audio; calmer long-context behaviour than the 35B-A3B MoE on dense retrieval prompts.

DOCS · HIGH

Gemma 4 31B (256K context): 31B dense with 256K context; Apache 2.0 license; Arena top 5.

The call

When not to use: Tight VRAM budgets under 8 GB — the 12B variant still wants ~8 GB at Q4 before context. For 6 GB and under, Qwen 3.5 4B fits better.
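
The VRAM guidance above can be sketched as a small lookup. This is a rough illustration only: it uses the Q4 figures quoted on this page, and both the `pick_variant` helper and its 2 GB `context_headroom_gb` default are hypothetical (an allowance for KV cache and runtime overhead, not a measured value).

```python
# Q4 weight footprints quoted on this page, in GB, before any context (KV cache).
Q4_WEIGHTS_GB = {
    "gemma4:31b": 18.0,  # 31B dense
    "gemma4:26b": 15.0,  # 26B MoE
    "gemma4:12b": 8.0,   # 12B dense
}


def pick_variant(vram_gb, context_headroom_gb=2.0):
    """Return the Gemma 4 tag with the largest Q4 footprint that fits, or None.

    context_headroom_gb is an assumed allowance for KV cache and runtime
    overhead; tune it for your own context length and runner.
    """
    fitting = [
        (size, tag)
        for tag, size in Q4_WEIGHTS_GB.items()
        if size + context_headroom_gb <= vram_gb
    ]
    if not fitting:
        return None  # below the 12B's footprint: consider a smaller model instead
    return max(fitting)[1]


for budget in (6, 12, 16, 24):
    print(budget, "GB ->", pick_variant(budget))
```

"Largest footprint that fits" is only a stand-in for "best"; the 26B MoE and the 31B dense trade speed against quality differently, so treat the result as a starting point.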

Runner notes

Ollama tags are `gemma4:31b`, `gemma4:26b`, and `gemma4:12b`. Ollama may lag on the audio modality path, so build llama.cpp from its latest source (HEAD) for full multimodal support. The 12B is encoder-free (vision and audio inputs flow straight into the backbone) and ships MTP (multi-token prediction) drafters for lower decode latency. On the 26B MoE, routing overhead can hurt vLLM concurrency relative to dense equivalents under heavy batching. In June 2026 Google also shipped Gemma 4 QAT (quantization-aware-trained checkpoints for lower-VRAM deployments) and DiffusionGemma (a diffusion-based text model built on Gemma 4); both are niche variants in the same family.
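
For a text-only smoke test of one of the tags above, a minimal sketch against Ollama's local REST endpoint might look like the following. It assumes an Ollama server is running on the default port 11434 and that the tag has already been pulled; the `build_request` and `generate` helper names are illustrative, not part of any library.

```python
import json
import urllib.request

# Ollama's default local generate endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(tag, prompt):
    """Build a non-streaming generate request for an Ollama model tag."""
    payload = {"model": tag, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(tag, prompt, timeout=300):
    """Send the request and return the generated text.

    Requires a running Ollama server with the tag available locally.
    """
    with urllib.request.urlopen(build_request(tag, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call (left commented out because it needs a live server):
# print(generate("gemma4:12b", "Summarize the Apache 2.0 license in one sentence."))
```

This covers text prompts only; given the note above about the audio path, it makes no attempt to exercise multimodal input.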

License: Apache 2.0 (moved off Gemma Terms)
Released: April 2, 2026 (31B/26B); June 3, 2026 (12B)
Maker: Google
Model card: huggingface.co/google/gemma-4-31B-it

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.
