Gemma 4 12B — frontier-ish multimodal that actually runs on a 16 GB laptop, under Apache 2.0

Google shipped Gemma 4 12B on June 3 — a ~12B dense, encoder-free unified multimodal model (text + image + audio + video in, text out) under Apache 2.0. Google's pitch is the deployment envelope: it runs locally on 16 GB of VRAM or unified memory while landing benchmark numbers Google says approach its 26B-A4B MoE, at under half the total memory footprint. It slots into the existing Gemma 4 family (31B dense + 26B MoE, April) as the laptop-friendly multimodal pick.

Verdict: Apache 2.0 encoder-free multimodal that fits a 16 GB laptop — the new local-multimodal sweet spot

The take

The facts, verified against Google's announcement and the Hugging Face model cards on launch day: `google/gemma-4-12B` and `google/gemma-4-12B-it` went public June 3, 2026 under a straight Apache 2.0 license — the same permissive terms the wider Gemma 4 line moved to in April. It is a ~12B dense model, not a MoE. The architecture headline is "encoder-free": vision and audio inputs flow directly into the LLM backbone rather than through separate modality encoders, so a single ~12B weight set handles text, image, audio (up to ~30s), and video-as-frames in, text out.

Why it matters for local: the whole point of the 12B is the hardware envelope. Google frames it as "laptop ready — small enough to run locally with just 16 GB of VRAM or unified memory," and claims benchmark performance "nearing our larger 26B MoE model on standard benchmarks, but at less than half the total memory footprint." At Q4 that puts it comfortably inside a single RTX 5060 Ti 16 GB, a 16 GB Mac, or an RTX 4090 with room for context — and it ships with Multi-Token-Prediction drafters to cut decode latency. We have not independently benchmarked the "nears 26B" claim; treat it as a vendor claim pending community numbers, the same posture we take on every launch.
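To put the 16 GB claim in context, here is a rough back-of-the-envelope sketch of weight memory at different precisions. It is not Google's accounting: the 4.5 bits-per-weight figure for a "Q4" quant is our assumption (real quant formats vary), the function names are ours, and the estimate covers weights only, ignoring KV cache, activations, and runtime overhead.

```python
# Rough weight-memory estimate for a dense model at a given precision.
# Assumptions (not from Google's announcement): ~12e9 parameters exactly,
# an effective 4.5 bits per weight for a "Q4" quant, and a 16 GiB budget.
# KV cache, activations, and runtime overhead are NOT modeled.

GIB = 1024 ** 3


def weight_footprint_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def headroom_gib(budget_gib: float, params_billions: float, bits_per_weight: float) -> float:
    """Memory left over for context and overhead after loading the weights."""
    return budget_gib - weight_footprint_gib(params_billions, bits_per_weight)


if __name__ == "__main__":
    precisions = [
        ("16-bit", 16.0),
        ("8-bit", 8.0),
        ("~4-bit (assumed 4.5 bpw)", 4.5),
    ]
    for label, bits in precisions:
        used = weight_footprint_gib(12, bits)
        left = headroom_gib(16, 12, bits)
        print(f"{label:>26}: weights ~{used:.1f} GiB, headroom ~{left:.1f} GiB")
```

Under these assumptions the 4-bit weights come to about 6.3 GiB, which is consistent with the "room for context" framing, while the 16-bit weights (around 22 GiB) would not fit a 16 GB device at all. Actual usage depends on the quant format and context length, so treat this as a sanity check rather than a measurement.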

Where it sits against the competition: at the 16 GB tier our standing chat/docs picks are Qwen 3.5 9B and the 26B/31B Gemma 4 variants (which want ~15–18 GB at Q4 and leave little headroom). Gemma 4 12B is the first genuinely multimodal Apache-2.0 model that fits the 16 GB tier with native audio + video understanding and room to actually use it. Qwen 3.5 9B is text-only, and MiniCPM-V-4.6 is a 1B vision-language model with no audio input. For a 16 GB laptop or mini-PC that needs one local model spanning text, images, and audio, this is now the pick to try first.

Where it fits in our taxonomy: it extends the existing /models/gemma-4/ family entry rather than getting its own planner slot this sweep — the 12B is the laptop-tier multimodal variant of a model we already track, and we want community Q4 VRAM + speed numbers before rotating it into the planner picks. If you are on 16 GB and want multimodal-on-device under a clean commercial license, pull `gemma-4:12b` and report back. Nothing about the hosted frontier (Claude, GPT, Gemini) changes here — this is a pure local-side win.

Where this fits

Models: Gemma 4 (31B dense + 26B A4B MoE + 12B multimodal) · Qwen 3.5 9B · MiniCPM-V-4.6 (1B vision-language) · Qwen 3.6-27B

Hardware: RTX 5060 Ti 16 GB · Mac Mini M4 16 GB · NVIDIA RTX 4090
