MODEL · COHERE · 218B TOTAL / 25B ACTIVE (128 EXPERTS, 8 ACTIVE + 1 SHARED PER TOKEN)
Command A+ (218B-A25B)
Cohere's frontier-class MoE: 218B params with 25B active per token, hybrid sliding-window + global attention, native vision + 48-language coverage. The first Apache-2.0 frontier MoE you can actually serve on 2× H100 — same hardware class as DeepSeek V4-Pro and Kimi K2.6 but with a permissive license neither of those carries.
License: Apache 2.0 · Context: 128K input / 64K max generation · Released: May 20, 2026
The decision in five lines
- The call: Skip for local (for docs)
- Best for: docs · agents
- Runs on: 3 hardware picks fit (cheapest: Framework Desktop (Ryzen AI Max+ 395) · $1,999)
- Watch out: Anything under ~96 GB effective. Also: tasks where the 25B active footprint is the wrong throughput shape; for fast daily agent loops, Qwen 3.6-35B-A3B (3B active) is materially cheaper to serve.
- Evidence: Estimated
- Parameters: 218B total
- Type: Sparse MoE
- Context: 128K
- VRAM at Q4: ~110 GB (W4A4 build), fits 2× H100 80 GB; weights alone are ~436 GB at BF16 and ~218 GB at FP8
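The footprint figures can be sanity-checked with a weights-only estimate: parameter count times bits per weight, divided by eight. The sketch below is a rough check, not a measurement. It ignores the KV cache, activations, quantization scales, and any layers a quantized build keeps at higher precision, so real builds land somewhat above these numbers.

```python
# Weights-only memory estimate: parameters x bits per weight / 8.
# Excludes KV cache, activations, and quantization metadata.
TOTAL_PARAMS = 218e9  # total parameter count quoted on this page

BITS_PER_WEIGHT = {"bf16": 16, "fp8": 8, "w4": 4}


def weight_gb(params: float, bits: int) -> float:
    """Decimal gigabytes needed to hold the weights alone."""
    return params * bits / 8 / 1e9


estimates = {name: weight_gb(TOTAL_PARAMS, bits)
             for name, bits in BITS_PER_WEIGHT.items()}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.0f} GB")
```

Only the 4-bit estimate comes in under the 160 GB that two 80 GB cards provide, which is why the 2× H100 claim is tied to the W4A4 build.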
Where we recommend this
Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.
The call
When not to use: anything with under ~96 GB of effective memory. Also skip it for tasks where the 25B active footprint is the wrong throughput shape: for fast daily agent loops, Qwen 3.6-35B-A3B (3B active) is materially cheaper to serve.
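To put a rough number on the throughput-shape point, a common rule of thumb is that a forward pass costs about 2 FLOPs per active parameter per generated token. The sketch below applies that approximation to the two active-parameter counts quoted above. It ignores attention cost, memory bandwidth, and batching effects, so treat the ratio as an order-of-magnitude guide rather than a benchmark.

```python
# Rule-of-thumb decode cost: ~2 FLOPs per active parameter per token.
# This is an approximation; attention and memory-bandwidth effects are ignored.
def decode_flops_per_token(active_params: float) -> float:
    return 2.0 * active_params


command_a_plus = decode_flops_per_token(25e9)  # 25B active per token
qwen_a3b = decode_flops_per_token(3e9)         # 3B active per token

ratio = command_a_plus / qwen_a3b
print(f"Command A+ needs ~{ratio:.1f}x the per-token compute")
```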
Runner notes
`CohereLabs/command-a-plus-05-2026-bf16` is the base checkpoint; `-fp8` and `-w4a4` variants exist for memory-constrained serving. vLLM, SGLang, and transformers all support it from day one. Because only the global-attention layers cache the full context while the sliding-window layers cache a fixed window, the hybrid attention pattern fits longer contexts than a dense 70B-class model at the same KV-cache budget.
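The KV-cache saving from hybrid attention can be illustrated with a small estimate. Every architecture value in this sketch (layer count, global-to-sliding ratio, window size, KV heads, head size) is a hypothetical placeholder, not Command A+'s published configuration. It also compares the hybrid layout against the same stack with all-global attention rather than against an actual dense 70B model.

```python
from dataclasses import dataclass


@dataclass
class AttnConfig:
    # Hypothetical placeholders, NOT the model's real configuration.
    n_layers: int = 64
    global_every: int = 4    # assume 1 global layer per 4; the rest slide
    window: int = 4096       # assumed sliding-window size in tokens
    n_kv_heads: int = 8
    head_dim: int = 128
    bytes_per_elem: int = 2  # 16-bit KV cache


def kv_bytes_per_slot(cfg: AttnConfig) -> int:
    """Bytes one layer stores per cached token (keys plus values)."""
    return 2 * cfg.n_kv_heads * cfg.head_dim * cfg.bytes_per_elem


def kv_cache_bytes(cfg: AttnConfig, context_len: int, hybrid: bool = True) -> int:
    """Per-sequence KV cache size for a hybrid or all-global layer stack."""
    per_slot = kv_bytes_per_slot(cfg)
    n_global = cfg.n_layers // cfg.global_every if hybrid else cfg.n_layers
    n_sliding = cfg.n_layers - n_global
    # Global layers cache every token; sliding layers cap at the window.
    global_bytes = n_global * context_len * per_slot
    sliding_bytes = n_sliding * min(context_len, cfg.window) * per_slot
    return global_bytes + sliding_bytes


cfg = AttnConfig()
ctx = 128 * 1024
full = kv_cache_bytes(cfg, ctx, hybrid=False)
mixed = kv_cache_bytes(cfg, ctx, hybrid=True)
print(f"all-global: {full / 2**30:.2f} GiB, hybrid: {mixed / 2**30:.2f} GiB")
```

Under these assumed values the hybrid layout needs roughly a quarter of the all-global cache at 128K tokens. The real ratio depends on the actual window size and layer mix.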
Hardware that fits
Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.