the AI bench
VERIFIED APRIL 2026

FAST TAKE · 2026-04-29 · MISTRAL MEDIUM 3.5

Mistral Medium 3.5 — one 128B dense model replaces three specialist Mistrals

Mistral retired Magistral (reasoning) and Devstral 2 (coding) and folded both into Medium 3.5: a 128B dense weight set with 256K context, native multimodal vision, 77.6% on SWE-Bench Verified, and a per-request `reasoning_effort` toggle that swaps modes without swapping checkpoints. Modified MIT license — commercial OK below a revenue threshold.

Verdict: Mistral folded reasoning + coding specialists into one 128B with a per-request `reasoning_effort` toggle


The take

The release matters more than the score. For the past year, Mistral has been splitting its open-weight strategy across three weight sets: Mistral Medium 3.1 for general use, Magistral for reasoning, Devstral 2 for coding. Three checkpoints to keep resident if you wanted full coverage. Medium 3.5 collapses that to one weight set with a `reasoning_effort` parameter on every request — `low` for chat-class latency, `high` for Magistral-class chains-of-thought. Same model, different inference budget. This is a meaningful operational improvement for anyone running Mistral locally or on managed infrastructure.
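What the per-request toggle looks like in practice can be sketched as a payload builder against an OpenAI-style chat-completions endpoint. The model id and the accepted `reasoning_effort` values here are assumptions drawn from this write-up, not confirmed API documentation:

```python
# Sketch: two requests against the SAME checkpoint, differing only in
# reasoning_effort. Model id and parameter values are assumptions from
# this take, not official Mistral API docs.

def build_request(prompt: str, effort: str) -> dict:
    """Return a chat-completions payload with a per-request reasoning budget."""
    assert effort in ("low", "high"), "values assumed from this write-up"
    return {
        "model": "mistral-medium-3-5",      # hypothetical model id
        "reasoning_effort": effort,          # "low" = chat latency, "high" = long CoT
        "messages": [{"role": "user", "content": prompt}],
    }

chat = build_request("Summarize this diff", "low")
deep = build_request("Prove this invariant holds", "high")
```

Same weights resident in memory either way; only the inference budget changes per call.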

The numbers: 77.6% SWE-Bench Verified is competitive with GLM-5.1 and ahead of Qwen3-Coder-30B-A3B for raw coding capability. The 256K context window is mid-pack for 2026 (Qwen 3.5 has 262K, Llama 4 Scout claims more). Native multimodal vision is trained from scratch in this release rather than bolted on — meaningful for OCR + document workflows.

The local-deployment math: 128B dense at Q4_K_M is ~72 GB on disk, plus KV cache on top. The practical floor is 4× 24 GB GPUs (4× RTX 4090 or 5090) or a 96+ GB unified-memory Mac Studio M3 Ultra. This release has nothing for single-card users on 24-32 GB rigs — Qwen 3.5 35B-A3B, Qwen 3.6-35B-A3B, and Qwen3-Coder-30B-A3B remain the right picks at that tier. Multi-GPU and frontier-tier buyers get a new option.
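The ~72 GB figure falls out of the quant's average bit-width. A back-of-envelope check, assuming Q4_K_M averages roughly 4.85 bits per weight (a commonly cited figure for llama.cpp K-quants, not a spec):

```python
def quant_weight_gib(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Rough on-disk size of a dense model at a given average quant bit-width.

    4.85 bits/weight is an assumed average for Q4_K_M, not a published spec.
    KV cache and runtime overhead come on top of this.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30  # binary GiB

size = quant_weight_gib(128)  # ~72 GiB for a 128B dense model
```

That lands around 72 GiB for weights alone, which is why a single 24-32 GB card is out and a 4-GPU rig or 96+ GB unified box is the floor.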

License gotcha: Modified MIT means free commercial use below a revenue threshold, with a paid license required above it. Read the exact terms on the Mistral repo before shipping a paid product on these weights. The Apache-2.0 vs Modified-MIT distinction matters for legal review and is the same trap as FLUX.2 dev's non-commercial clause — "open weights" is not the same as "clean commercial."

Practical read: hosted via Mistral's API at $1.50 in / $7.50 out per 1M tokens is the cheap way to try it; local deployment makes sense if you already have a 4-GPU rig or a 96+ GB unified box and want one weight set for chat + reasoning + coding. Added /models/mistral-medium-3-5/ this week with full editorial.
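The hosted-vs-local call is easy to sanity-check against your own traffic. A quick sketch using the list prices quoted above (assuming no volume discounts):

```python
IN_PER_M, OUT_PER_M = 1.50, 7.50  # USD per 1M tokens, list prices from this take

def monthly_api_cost(in_tokens_m: float, out_tokens_m: float) -> float:
    """Hosted API spend in USD for a month of traffic, in millions of tokens."""
    return in_tokens_m * IN_PER_M + out_tokens_m * OUT_PER_M

# Example: 200M input + 40M output tokens per month
cost = monthly_api_cost(200, 40)  # 200*1.50 + 40*7.50 = $600/month
```

Run your real volumes through that and compare against amortizing a 4-GPU rig; below a few hundred dollars a month of spend, hosted almost always wins.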

Where this fits

Models: Mistral Medium 3.5 128B · Ministral 3 family (3B / 8B / 14B) · GLM-5.1 · Qwen3-Coder-30B-A3B

Hardware: NVIDIA RTX 5090 · Mac Studio M3 Ultra 96 GB · NVIDIA DGX Spark · NVIDIA RTX A6000 (48 GB, used)
