the AI bench
VERIFIED MAY 2026
All models

MODEL · COHERE · 218B TOTAL / 25B ACTIVE (128 EXPERTS, 8 ACTIVE + 1 SHARED PER TOKEN)

Command A+ (218B-A25B)

Cohere's frontier-class MoE: 218B params with 25B active per token, hybrid sliding-window + global attention, native vision + 48-language coverage. The first Apache-2.0 frontier MoE you can actually serve on 2× H100 — same hardware class as DeepSeek V4-Pro and Kimi K2.6 but with a permissive license neither of those carries.

License: Apache 2.0 · Context: 128K input / 64K max generation · Released: May 20, 2026

The decision in five lines

The call
Skip for local — for docs
Best for
docs · agents
Runs on
3 hardware picks fit (cheapest: Framework Desktop (Ryzen AI Max+ 395) · $1,999)
Watch out
Also: tasks where the 25B active footprint is the wrong throughput shape — for fast-daily agent loops Qwen 3.6-35B-A3B (3B active) is materially cheaper to serve.
Evidence
Estimated · last verified May 2026

218B total
PARAMETERS
SPARSE MOE
TYPE
128K
CONTEXT
~110 GB (W4A4 build) — fits 2× H100 80 GB; ~218 GB BF16 / ~109 GB FP8
VRAM AT Q4

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

DOCS ·
Command A+ (218B-A25B, Apache 2.0)May 20 2026 release — frontier Apache-2.0 MoE with hybrid sliding-window + global attention (cheaper KV at long context than dense 70B-class) + native vision for diagrams/tables. 128K input / 64K generation; 25B active. The first frontier Apache MoE you can self-host on 2× H100 / Mac M3 Ultra without a hosted-only caveat.
AGENTS ·
Command A+ (218B-A25B, Apache 2.0)May 20 2026 release — Cohere frontier MoE built explicitly for agentic + RAG. Native tool use, 128K context, 48-language coverage, native vision input. The first Apache 2.0 218B-class MoE — direct DeepSeek V4-Pro competitor with a permissive license neither V4-Pro nor Kimi K2.6 carry.

The call

Cohere's frontier-class MoE: 218B params with 25B active per token, hybrid sliding-window + global attention, native vision + 48-language coverage. The first Apache-2.0 frontier MoE you can actually serve on 2× H100 — same hardware class as DeepSeek V4-Pro and Kimi K2.6 but with a permissive license neither of those carries.

When not to use: Anything under ~96 GB effective. Also: tasks where the 25B active footprint is the wrong throughput shape — for fast-daily agent loops Qwen 3.6-35B-A3B (3B active) is materially cheaper to serve.

Runner notes

`CohereLabs/command-a-plus-05-2026-bf16` is the base; `-fp8` and `-w4a4` variants for memory-constrained serving. vLLM + SGLang + transformers all support it day-one. Hybrid attention pattern (sliding window + global) means context-length efficiency is better than dense 70B-class at the same KV-cache budget.

License
Apache 2.0
Released
May 20, 2026
Maker
Cohere

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.

Next step

Find-by-model — see what hardware runs this