MODEL · MOONSHOT AI · 1T TOTAL / 32B ACTIVE (384 EXPERTS; 8 ROUTED + 1 SHARED PER TOKEN)
Kimi K2.6
Moonshot's 1T MoE agentic coder — keeps the K2 architecture (61 layers, 64 attention heads, MLA) and extends context to 256K. Tops SWE-Bench Pro at 58.6 (vs GPT-5.4 xhigh 57.7, Opus 4.6 max 53.4) and lands #4 on the Artificial Analysis Intelligence Index. The real differentiator is Agent Swarm — 300 sub-agents over 4,000 coordinated steps, a different shape of capability than single-step quality.
License: Modified MIT (commercial OK; attribution required only above 100M MAU or $20M MRR) · Context: 256K (across all variants) · Released: April 20, 2026
The decision in five lines
- The call
- Hosted only
- Best for
- coding · agents
- Runs on
- Hosted or workstation-class only · ~500 GB (not consumer-local; hosted or multi-GPU only)
- Watch out
- At ~500 GB Q4 this is a hosted-API or workstation-cluster model, not a Mac Studio or 5090 pick.
- Evidence
- Estimated
- 1T total
- PARAMETERS
- MOE
- TYPE
- 256K
- CONTEXT
- ~500 GB (not consumer-local; hosted or multi-GPU only)
- VRAM AT Q4
Where we recommend this
Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.
The call
Moonshot's 1T MoE agentic coder — keeps the K2 architecture (61 layers, 64 attention heads, MLA) and extends context to 256K. Tops SWE-Bench Pro at 58.6 (vs GPT-5.4 xhigh 57.7, Opus 4.6 max 53.4) and lands #4 on the Artificial Analysis Intelligence Index. The real differentiator is Agent Swarm — 300 sub-agents over 4,000 coordinated steps, a different shape of capability than single-step quality.
When not to use: Anything that fits on a single consumer card. At ~500 GB Q4 this is a hosted-API or workstation-cluster model, not a Mac Studio or 5090 pick. Also skip if you need plain chat — K2.6 is tuned for long-horizon agent loops, the chat experience trails Qwen 3.6-27B at half the active-param budget.
Runner notes
Hosted via Moonshot API, OpenRouter, or self-deploy on multi-H100/H200 clusters. Unsloth dynamic GGUFs likely 1–2 weeks out; until then, the practical path is the API. The Modified MIT clause means attribution kicks in only at 100M MAU / $20M MRR, so most builders treat it as effectively MIT.
Next step
Find-by-model — see what hardware runs this→