Command A+ — Cohere ships the Apache 2.0 frontier MoE the open-weight stack was missing

Cohere released Command A+ on May 20 — a 218B-param / 25B-active sparse MoE under Apache 2.0, with hybrid sliding-window + global attention, 128K input / 64K generation, native multimodal (text + image), and 48-language coverage. Available in BF16, FP8, and W4A4 quants on day one. The W4A4 build fits 2× H100 80 GB; the FP8 fits 1× MI300X. The first time you can self-host a 218B-class frontier MoE without a hosted-only or research-license caveat.

Verdict: First Apache 2.0 frontier MoE that fits 2× H100 — the permissive-license answer DeepSeek V4-Pro and Kimi K2.6 cannot match

The take

The license is the story. The frontier-class MoE shelf in May 2026 — DeepSeek V4-Pro (MIT but 1.6T params, effectively hosted-only), Kimi K2.6 (Modified MIT, 1T params, hosted), MiniMax M2.7 (non-commercial with attribution), GLM-5.1 (MIT, 744B params, hosted-only realistic) — has either been too big to serve, too restricted to deploy, or both. Command A+ sits at 218B/25B, which is hardware-feasible on 2× H100 / Mac M3 Ultra 128 GB / DGX Spark, and ships under straight Apache 2.0. That combination did not exist in open weights before today.

Architecture: sparse MoE with 128 experts, 8 active per token + 1 shared. Hybrid attention — sliding window for local context efficiency, global attention layers for cross-document reasoning. This is the key reason the KV-cache budget at 128K is meaningfully smaller than a dense 70B at the same context, which is what makes it serveable on 2× H100. Native multimodal: text + image inputs in the same weight set. 48-language coverage with an emphasis on enterprise document workflows (Cohere's traditional strength).

Where it fits in our planner: we've added Command A+ as the lead pick at agents.frontier and docs.frontier this week. It displaces DeepSeek V4-Pro as the recommended Apache-licensed option for buyers of 2× H100 / Mac Studio M3 Ultra 128 GB / DGX Spark. Existing picks (Qwen 3.5 122B-A10B, Llama 3.3 70B, gpt-oss-120b) stay — Command A+ is additive, not displacing, because its hybrid attention is genuinely different from Qwen's dense-block MoE.

What it doesn't do: this isn't a coding specialist (Qwen3-Coder-30B-A3B still beats it on SWE-bench at much smaller size). And the 25B active footprint means token throughput is slower than Qwen 3.6-35B-A3B (3B active) — for high-volume agent loops where active params matter more than capability, stay with the lighter MoEs. Command A+ is for when you need actual frontier-class reasoning with a serveable license.

Where this fits

Models: Command A+ (218B-A25B) · DeepSeek V4-Pro · Qwen 3.5 122B-A10B · gpt-oss-120b · Llama 3.3 70B Instruct

Hardware: NVIDIA DGX Spark · Mac Studio M3 Ultra 96 GB · Mac Studio M4 Max 64 GB · Framework Desktop (Ryzen AI Max+ 395)

Sources

Next step

Try this in the planner→