Kimi K2.6 — open-weights 1T MoE, but the headline is agent orchestration not raw size

1T parameters total, 32B activated per token, Modified MIT (commercial OK below 100M MAU / $20M monthly revenue). Tops SWE-Bench Pro at 58.6 (vs GPT-5.4 xhigh 57.7, Opus 4.6 max 53.4) and lands #4 on the Artificial Analysis Intelligence Index. The real differentiator is Agent Swarm scaling to 300 sub-agents over 4,000 coordinated steps: a different shape of capability than "bigger model, better single-step."

Verdict: The 300-sub-agent orchestration is the real differentiator

The take

Moonshot AI shipped Kimi K2.6 on April 20, with the open weights landing on Hugging Face the same day. The architecture is a 1T-parameter MoE: 61 layers, 64 attention heads, MLA (multi-head latent attention), and 384 experts, of which 8 routed plus 1 shared are active per token. All variants support a 256K context.
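To make the "8 routed + 1 shared" figure concrete, here is a minimal sketch of top-k expert selection for a single token. It is an illustration only: the softmax-over-selected-logits gating and the random logits are assumptions, not Moonshot's actual router, and `route_token` is a hypothetical helper.

```python
import math
import random

NUM_ROUTED_EXPERTS = 384   # routed experts per MoE layer (from the spec above)
TOP_K = 8                  # routed experts selected per token
NUM_SHARED_EXPERTS = 1     # shared expert that runs for every token


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and normalize their weights.

    Assumption: a plain softmax over the selected logits; the real gating
    function may differ.
    """
    # Indices of the k largest router logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the selected experts.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]  # stand-in router output
selected = route_token(logits)

experts_run = len(selected) + NUM_SHARED_EXPERTS
print(f"experts evaluated for this token: {experts_run} "
      f"of {NUM_ROUTED_EXPERTS + NUM_SHARED_EXPERTS}")
print(f"activated parameter fraction (32B / 1T): {32e9 / 1e12:.1%}")
```

Only 9 of 385 expert blocks run for any given token, which is how a 1T-parameter model ends up with roughly 32B active parameters per token.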

The benchmarks: on SWE-Bench Pro, K2.6 scores 58.6, ahead of GPT-5.4 xhigh (57.7), Gemini 3.1 Pro thinking (54.2), and Opus 4.6 max (53.4) on that specific coding-agent eval. On the Artificial Analysis Intelligence Index, K2.6 ranks #4 among frontier models, in the 53–54 band, behind the proprietary models from Anthropic, Google, and OpenAI at 57.

But the benchmarks are the boring part. The interesting part is the Agent Swarm architecture: up to 300 sub-agents working across 4,000 coordinated steps. For long-horizon agentic coding workflows where the bottleneck is coordination across many steps rather than per-step quality, that's a categorically different capability than the usual 'frontier means smarter single-step' story.
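The coordination shape described above can be sketched as a coordinator that fans work out to many sub-agents while enforcing one global step budget. This is a toy under stated assumptions, not Moonshot's Agent Swarm implementation: `StepBudget`, `sub_agent`, and `run_swarm` are hypothetical names, and the `asyncio.sleep(0)` call stands in for a real model or tool call.

```python
import asyncio

MAX_SUB_AGENTS = 300     # swarm width cited above
STEP_BUDGET = 4_000      # total coordinated steps cited above


class StepBudget:
    """Shared counter so the whole swarm stops at a global step limit."""

    def __init__(self, limit):
        self.remaining = limit

    def take(self):
        # No lock needed: the event loop is single-threaded and this
        # method contains no await, so it cannot be interleaved.
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


async def sub_agent(agent_id, subtasks, budget):
    """Hypothetical worker: spends one budgeted step per subtask."""
    done = []
    for task in subtasks:
        if not budget.take():
            break                   # global budget exhausted
        await asyncio.sleep(0)      # placeholder for a model/tool call
        done.append(f"agent{agent_id}:{task}")
    return done


async def run_swarm(subtasks, width=MAX_SUB_AGENTS, step_limit=STEP_BUDGET):
    budget = StepBudget(step_limit)
    # Round-robin split of the task list across sub-agents.
    shards = [subtasks[i::width] for i in range(width)]
    results = await asyncio.gather(
        *(sub_agent(i, shard, budget) for i, shard in enumerate(shards) if shard)
    )
    return [item for shard in results for item in shard], budget.remaining


completed, left = asyncio.run(run_swarm([f"t{n}" for n in range(5_000)]))
print(len(completed), "steps completed;", left, "budget left")
```

The point of the sketch is that the hard part is bookkeeping across agents (who gets which work, when the swarm as a whole must stop), which is independent of how good any single step is.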

License: Modified MIT. Commercial use is fully open for products below 100 million monthly active users and $20 million in monthly revenue. Exceed either threshold and you have to display 'Kimi K2' on the user interface. For nearly all builders this is effectively MIT: a clean commercial open-weight release.
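The threshold logic can be written down as a one-line check. This sketch assumes the attribution requirement triggers when either threshold is exceeded; the license text itself is authoritative, and `must_display_kimi_k2` is a hypothetical helper name.

```python
MAU_THRESHOLD = 100_000_000          # monthly active users
REVENUE_THRESHOLD_USD = 20_000_000   # monthly revenue in USD


def must_display_kimi_k2(monthly_active_users, monthly_revenue_usd):
    """Return True if the UI attribution clause would apply.

    Assumption: exceeding either threshold is enough to trigger it.
    """
    return (monthly_active_users > MAU_THRESHOLD
            or monthly_revenue_usd > REVENUE_THRESHOLD_USD)


print(must_display_kimi_k2(2_000_000, 500_000))       # small product
print(must_display_kimi_k2(150_000_000, 5_000_000))   # over the MAU threshold
```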

Local reality: at 1T total parameters, K2.6 at Q4 is ~500 GB of weights (roughly half a byte per parameter). That means a multi-H100/H200 cluster or a hosted API. Self-hosting on workstation rigs is technically possible with extreme quantization, but that's not where the model wants to live. The honest deployment paths are Moonshot's own API, OpenRouter, or self-deploying on serious GPU infrastructure.
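The ~500 GB figure is simple arithmetic, sketched below as a back-of-the-envelope calculation. It counts weights only and assumes exactly 4 bits per weight; real Q4 formats carry extra scale data, and KV cache and activations add more on top. The 80 GB and 141 GB figures are the H100 and H200 memory sizes; the helper names are hypothetical.

```python
import math

TOTAL_PARAMS = 1e12  # 1T total parameters


def weight_footprint_gb(params, bits_per_weight):
    """Weights-only size in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def gpus_needed(footprint_gb, gpu_memory_gb):
    """Minimum GPU count to hold the weights alone, ignoring all overhead."""
    return math.ceil(footprint_gb / gpu_memory_gb)


q4_gb = weight_footprint_gb(TOTAL_PARAMS, 4)
print(f"Q4 weights: {q4_gb:.0f} GB")
print(f"H100 80 GB cards (weights only): {gpus_needed(q4_gb, 80)}")
print(f"H200 141 GB cards (weights only): {gpus_needed(q4_gb, 141)}")
```

Under these assumptions the weights alone need at least 7 H100s or 4 H200s, before any memory is set aside for a 256K context.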

Where K2.6 fits: agents.top as 'frontier hosted' alongside GLM-5.1. For local-first work, Qwen 3.6-35B-A3B remains the realistic agentic coding pick on cards with 24 GB or more of VRAM. K2.6 is the model you reach for when you specifically need 300-agent-swarm orchestration and have hosted infrastructure. It is not a single-card daily driver.

Where this fits

Models: Kimi K2.6 · GLM-5.1 · Qwen 3.6-35B-A3B

Hardware: NVIDIA DGX Spark · Dual RTX 5090 · NVIDIA RTX A6000 (48 GB, used)
