Microsoft MAI at Build 2026 — seven first-party models, all hosted-only, none you can download

At Build 2026 on June 2, Mustafa Suleyman unveiled Microsoft AI's MAI family: seven first-party models spanning reasoning (MAI-Thinking-1), coding (MAI-Code-1, MAI-Code-1-Flash), image (MAI-Image-2.5 and its Flash variant), transcription (MAI-Transcribe-1.5), and voice (MAI-Voice-2). Microsoft says they were trained from scratch, with no distillation. The broader story is Microsoft reducing its reliance on OpenAI, but for a local-AI site the headline is the asterisk: every MAI model is hosted on Microsoft Foundry / Azure, closed-weight, and API-only. None are on Hugging Face, none are downloadable, and none change a single local pick.

Verdict: Microsoft ships its first full first-party model family — all hosted, closed-weight, API-only; zero local picks change

The take

The facts, verified against Microsoft's own pages on launch day: the MAI models live at microsoft.ai/models and are distributed through Microsoft Foundry, with several also live on OpenRouter, Fireworks AI, and Baseten. Microsoft's catalog page lists access as API / Playground / Copilot only — there is no "download weights" path, and the microsoft Hugging Face org has no Build-2026 MAI repos (the only open-weight MAI artifact there is the unrelated April-2025 `MAI-DS-R1`, a DeepSeek-R1 post-train). So unlike Qwen, Gemma, DeepSeek, or Granite, you cannot run MAI on your own hardware. For this site's purposes they are cloud comparators, not local models — the same bucket as Claude, GPT-5.x, and Gemini.

MAI-Thinking-1 is the flagship. Per Suleyman's keynote it is a Mixture-of-Experts reasoning model with ~35B active parameters and a 256K context window, scoring 97% on AIME 2025; Microsoft says independent Surge raters preferred it over Claude Sonnet 4.6 in side-by-side quality, and that it matches Claude Opus 4.6 on SWE-Bench Pro at lower cost. It is in private preview on Foundry and public per-token pricing is not yet finalized. Strong claims, none independently benchmarked here — treat as vendor numbers pending community validation.

MAI-Code-1-Flash is the one our readers will actually meet, because it is rolling out inside GitHub Copilot and VS Code today. It is a small MoE (reported ~137B total / ~5B active) tuned for coding workflows. Microsoft cites 71.6 vs Claude Haiku 4.5's 66.6 on SWE-Bench Verified and 51.2 vs 35.2 on SWE-Bench Pro, while using up to ~60% fewer tokens. GitHub's pricing page currently lists it at around $0.75 per 1M input tokens and $4.50 per 1M output tokens (the model card notes pricing is still being finalized). The rest of the family: MAI-Image-2.5 (+ Flash) debuted at #2/#3 on the Arena image leaderboards and is live in PowerPoint and OneDrive; MAI-Transcribe-1.5 leads FLEURS across 43 languages (one hour of audio in under 15s); MAI-Voice-2 (+ Flash) is expressive TTS with emotion control in 15+ languages.
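To put the listed MAI-Code-1-Flash rates in concrete terms, here is a minimal sketch of the per-token arithmetic. The two rates are the ones quoted above and are still flagged as unfinalized; the function name and the example workload (2M input tokens, 500K output tokens) are hypothetical and only illustrate the calculation.

```python
# Rates as quoted in the article (USD per 1M tokens). The model card flags
# them as not yet finalized, so treat them as placeholders.
INPUT_USD_PER_MILLION = 0.75
OUTPUT_USD_PER_MILLION = 4.50


def estimate_cost_usd(input_tokens, output_tokens,
                      input_rate=INPUT_USD_PER_MILLION,
                      output_rate=OUTPUT_USD_PER_MILLION):
    """Return the estimated charge in USD for one workload."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = (input_tokens / 1_000_000) * input_rate
    output_cost = (output_tokens / 1_000_000) * output_rate
    return input_cost + output_cost


# Hypothetical workload: 2M input tokens and 500K output tokens.
example = estimate_cost_usd(2_000_000, 500_000)
print(f"${example:.2f}")  # 1.50 + 2.25 = 3.75
```

Output tokens cost six times as much as input tokens at these rates, so the claimed token savings matter most on the output side of a workload.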

Our call: no models.js or planner change, because there is nothing to run locally. We are also holding the cost calculator for now: MAI-Thinking-1 pricing is unannounced and MAI-Code-1-Flash's is flagged "being finalized," and we don't publish unstable pricing. We'll add MAI to the calculator in the next sweep once the numbers settle. The genuinely interesting wrinkle for the local-vs-cloud framing is MAI-Code-1-Flash: a cheap, fast, Copilot-embedded coding model that, by Microsoft's numbers, outscores Haiku 4.5 while using fewer tokens is exactly the kind of cloud option that raises the bar a local coding rig has to clear. But that is a cloud-side shift; the best local coding picks (Qwen3-Coder-30B-A3B, Qwen 3.6-35B-A3B, GLM-5.1) are exactly where they were a week ago.
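One rough way to picture "the bar a local coding rig has to clear" is a break-even token volume: the monthly usage at which hosted spend equals the cost of running your own hardware. The sketch below is an illustration, not our calculator. The per-token rates are the unfinalized figures GitHub currently lists ($0.75 and $4.50 per 1M tokens), and the hardware price, amortization period, power cost, and 80/20 input/output split are placeholder assumptions. It also ignores quality, speed, and token-efficiency differences between models.

```python
def blended_usd_per_million(input_share, input_rate=0.75, output_rate=4.50):
    """Average price per 1M tokens when `input_share` of them are input tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_rate + (1.0 - input_share) * output_rate


def break_even_tokens_per_month(hardware_usd, amortization_months,
                                monthly_power_usd, usd_per_million):
    """Monthly token volume at which hosted spend equals local running cost."""
    if amortization_months <= 0 or usd_per_million <= 0:
        raise ValueError("amortization_months and usd_per_million must be positive")
    monthly_local_usd = hardware_usd / amortization_months + monthly_power_usd
    return monthly_local_usd / usd_per_million * 1_000_000


# Placeholder inputs, not measurements: a 3,000 USD rig amortized over
# 36 months, 15 USD/month of power, and an 80/20 input/output token split.
rate = blended_usd_per_million(0.8)
tokens = break_even_tokens_per_month(3000, 36, 15, rate)
print(f"blended rate: ${rate:.2f} per 1M tokens")
print(f"break-even: {tokens / 1e6:.1f}M tokens per month")
```

With these placeholders the break-even lands at roughly 66 million tokens per month. Below that volume the hosted model is cheaper on raw spend; above it the local rig starts to pay for itself, assuming it can deliver comparable results.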

Where this fits

Models: Qwen3-Coder-30B-A3B · Qwen 3.6-35B-A3B · GLM-5.1 · Kimi K2.6

Hardware: NVIDIA RTX 4090 · NVIDIA DGX Spark · Mac Studio M3 Ultra 96 GB
