North Mini Code — Cohere ships an Apache-2.0 30B-A3B coder for agents and the terminal

Cohere and Cohere Labs released North Mini Code on June 5, 2026: a 30B-total / 3B-active sparse-MoE model tuned for code generation, agentic software engineering, and terminal tasks, under a clean Apache 2.0 license with a 256K context. At ~17 GB in a Q4 GGUF it runs on a single 24 GB GPU, which makes it a genuine alternative to Qwen3-Coder-30B-A3B at that tier, so we're adding it to the planner's local coding picks.

Verdict: A fresh Apache-2.0 30B-A3B agentic coder that fits a 24 GB rig — the cleanest new local coding option since Qwen3-Coder

The take

The facts, verified against the Hugging Face model card (`CohereLabs/North-Mini-Code-1.0`, created 2026-06-05, Apache 2.0) and Cohere's blog: it's a 30B-total / 3B-active decoder-only sparse MoE (128 experts, 8 active per token) with a 256K-token context window and a 64K-token maximum output, "optimized for code generation, agentic software engineering, and terminal tasks." Official weights ship as BF16 and FP8 (plus W4A16 / NVFP4); community GGUF quants (unsloth, bartowski) and an MLX 4-bit build are already up, so the 24 GB-tier path is real on day one. Cohere benchmarks it on SWE-Bench Verified/Pro and Terminal-Bench v2; these are vendor numbers, not yet independently reproduced.
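As a rough sanity check on the 24 GB claim, here is a minimal back-of-the-envelope sketch in Python. It assumes a Q4 GGUF averages about 4.5 bits per weight (an assumption; the real figure depends on the quant mix) and counts weights only, ignoring KV cache and runtime overhead. The function name and the bits-per-weight table are illustrative, not taken from Cohere's materials.

```python
def weight_footprint_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 30e9  # from the article: 30B total parameters
GPU_VRAM_GB = 24     # the tier discussed in the article

# Bits per weight for each format. The Q4 value is an assumed average,
# not a published number for this model.
ASSUMED_BPW = {
    "BF16": 16.0,
    "FP8": 8.0,
    "Q4 GGUF (assumed ~4.5 bpw)": 4.5,
}

for name, bpw in ASSUMED_BPW.items():
    size = weight_footprint_gb(TOTAL_PARAMS, bpw)
    verdict = "fits" if size < GPU_VRAM_GB else "does not fit"
    print(f"{name}: ~{size:.1f} GB of weights, {verdict} in {GPU_VRAM_GB} GB")
```

Under that assumption the Q4 estimate comes out near the ~17 GB quoted above, leaving a few gigabytes for context. Note that the footprint tracks the 30B total, since every expert has to be loaded; the 3B-active figure mainly reduces compute per token.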

Why it matters: the local coding shortlist has been heavily Qwen-led (Qwen3-Coder-30B-A3B, Qwen 3.5 35B-A3B). North Mini Code is the same 3B-active / 24 GB-tier shape from a Western lab, under the same permissive Apache 2.0 license, with a 256K context: exactly the kind of clean, runnable alternative the planner exists to surface. You can try it before downloading in OpenCode or Cohere Labs' Hugging Face Space.

Our call: added to the local coding picks at the 24 GB tier as an alternative to Qwen3-Coder-30B-A3B, not a displacement — Qwen3-Coder stays the proven lead until North Mini Code's agentic-coding numbers are reproduced outside Cohere's own harness. If you're on a 24 GB+ rig and want a fresh Apache-2.0 coder to evaluate, pull a Q4 GGUF and benchmark it against your own repo. Strong first impression; verify before you commit your workflow to it.
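One way to run that side-by-side evaluation is a small pass-rate harness. The sketch below is a hypothetical outline, not a reference benchmark: `Task`, `pass_rate`, and the two stub models are invented for illustration. In practice each `generate` callable would wrap whatever local runtime serves your GGUF, and each `check` would run your repo's own tests against the model's output.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def pass_rate(generate: Callable[[str], str], tasks: List[Task]) -> float:
    """Fraction of tasks whose check accepts the model's output."""
    if not tasks:
        raise ValueError("need at least one task")
    passed = sum(1 for task in tasks if task.check(generate(task.prompt)))
    return passed / len(tasks)


# Stand-ins for real models (hypothetical). Replace these with calls
# into your local inference setup.
def stub_model_a(prompt: str) -> str:
    return "def add(a, b):\n    return a + b"


def stub_model_b(prompt: str) -> str:
    return "TODO"


TASKS = [
    Task("add", "Write add(a, b).", lambda out: "return a + b" in out),
    Task("non-empty", "Write any function.", lambda out: out.strip() != ""),
]

for label, model in [("model A", stub_model_a), ("model B", stub_model_b)]:
    print(f"{label}: {pass_rate(model, TASKS):.0%} of tasks passed")
```

Keeping the task list and checks identical across both models is the point: the comparison only means something if each model sees the same prompts from your own codebase.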

Where this fits

Models: North Mini Code (30B-A3B) · Qwen3-Coder-30B-A3B · Command A+ (218B-A25B) · GLM-5.1

Hardware: NVIDIA RTX 5090 · NVIDIA RTX 4090 · NVIDIA RTX 3090 (used, single)
