MODEL · MISTRAL AI · 128B DENSE (FOLDS MAGISTRAL REASONING + DEVSTRAL 2 CODING INTO ONE WEIGHT SET)
Mistral Medium 3.5 128B
Mistral's flagship 128B dense model. It replaces Medium 3.1 and retires the dedicated Magistral (reasoning) and Devstral 2 (coding) specialist models, folding both into one weight set with a per-request `reasoning_effort` toggle. 77.6% on SWE-Bench Verified, a native multimodal vision encoder trained from scratch, and 256K context. The first major Mistral release since Ministral 3 (Dec 2025).
License: Modified MIT (commercial OK below revenue threshold; non-commercial above) · Context: 256K · Released: April 29, 2026
The decision in five lines
- The call: Consider — runnable locally, family reference
- Best for: Local evaluation and family reference
- Runs on: 11 hardware picks fit (cheapest: Minisforum UM890 Pro · $463)
- Watch out: Anything under 96 GB unified or 80 GB discrete — at Q4 it needs ~72 GB on disk plus KV.
- Evidence: Estimated
- Parameters: 128B dense (folds Magistral reasoning + Devstral 2 coding into one weight set)
- Type: Dense
- Context: 256K
- VRAM at Q4: ~72 GB (Q4_K_M) — practical floor is 4× 24 GB or 96 GB+ unified
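The ~72 GB figure is consistent with back-of-envelope math: Q4_K_M averages roughly 4.5 bits per weight over a 128B-parameter model (the 4.5 bits/weight average is an approximation for the mixed-precision K-quant, not a published spec). A minimal sketch:

```python
params = 128e9          # 128B dense parameters
bits_per_weight = 4.5   # rough Q4_K_M average incl. higher-precision blocks (assumption)

weights_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB (decimal)
print(round(weights_gb))  # 72 — weights alone; KV cache comes on top
```

KV cache grows with context, which is why the practical floor (4× 24 GB or 96 GB+ unified) sits well above the on-disk weight size.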
Where we recommend this
This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.
The call
Mistral's flagship 128B dense model. It replaces Medium 3.1 and retires the dedicated Magistral (reasoning) and Devstral 2 (coding) specialist models, folding both into one weight set with a per-request `reasoning_effort` toggle. 77.6% on SWE-Bench Verified, a native multimodal vision encoder trained from scratch, and 256K context. The first major Mistral release since Ministral 3 (Dec 2025).
When not to use: Anything under 96 GB unified or 80 GB discrete — at Q4 it needs ~72 GB on disk plus KV. Local picks under 70 GB should stick with Llama 3.3 70B, Qwen 3.5 122B-A10B (MoE, smaller active), or gpt-oss-120b. Also: revenue threshold in the modified MIT license matters for commercial deployments — read before shipping.
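The `reasoning_effort` toggle is set per request rather than by picking a different model. A minimal sketch of what such a request body might look like, assuming an OpenAI-compatible chat-completions endpoint and that `reasoning_effort` is a top-level request field taking "low" / "medium" / "high" — all of which are assumptions, not confirmed API details:

```python
import json

API_URL = "https://api.mistral.ai/v1/chat/completions"  # assumed endpoint path

def build_request(prompt: str, effort: str = "low") -> dict:
    """Build a chat-completions payload with a per-request reasoning_effort toggle.

    The accepted values for `effort` are assumed, not documented here.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unknown reasoning_effort: {effort}")
    return {
        "model": "mistralai/Mistral-Medium-3.5-128B",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }

payload = build_request("Refactor this function to be tail-recursive.", effort="high")
body = json.dumps(payload)  # POST this to API_URL with your API key
```

The point of the design is that reasoning depth becomes a request-time knob: the same weights serve quick completions and long-deliberation coding runs.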
Runner notes
Hosted as `mistralai/Mistral-Medium-3.5-128B`; API pricing $1.50 in / $7.50 out per 1M tokens. Local: 4× 24 GB GPUs (RTX 4090 / 5090) is the realistic floor; a 96 GB+ unified Mac Studio M3 Ultra works at Q4_K_M. Ollama / llama.cpp paths are available via community quants. Vibe Remote Agents (May 3, 2026) launched alongside it for cloud-agent workflows.
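At the listed per-1M-token rates, per-request API cost is straightforward to estimate. A quick sketch (the token counts below are made-up example numbers, not benchmarks):

```python
PRICE_IN = 1.50   # USD per 1M input tokens
PRICE_OUT = 7.50  # USD per 1M output tokens

def request_cost(tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one API call at the listed rates."""
    return tokens_in / 1e6 * PRICE_IN + tokens_out / 1e6 * PRICE_OUT

# e.g. a long-context coding request: 200K tokens in, 5K out
print(round(request_cost(200_000, 5_000), 4))  # 0.3375
```

The 5× output premium means long `reasoning_effort` runs are dominated by output tokens, which is worth factoring into any hosted-vs-local cost comparison.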
Hardware that fits
Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.
- Minisforum UM890 Pro · Requires tweak · 1.0× 32 GB DDR5 (shared) · $463–$580 all-in
- Dual RTX 3090 (used) · Perfect · 1.8× 48 GB · $1,800–$2,500 all-in
- Framework Desktop (Ryzen AI Max+ 395) · Perfect · 3.2× 128 GB unified · $1,999–$2,851
- M5 Pro MacBook Pro 48 GB · Good · 1.2× 48 GB unified · $2,599–$3,099
- Mac Studio M4 Max 64 GB · Perfect · 1.6× 64 GB unified · $3,199
- NVIDIA RTX A6000 (48 GB, used) · Perfect · 1.8× 48 GB ECC · $3,500–$4,500
- NVIDIA RTX 5090 · Good · 1.2× 32 GB · $3,800–$4,100
- Mac Studio M3 Ultra 96 GB · Perfect · 2.4× 96 GB unified · $3,999
- M5 Max MacBook Pro 64 GB · Perfect · 1.6× 64 GB unified · $4,499
- NVIDIA DGX Spark · Perfect · 3.2× 128 GB unified · $4,699
- Dual RTX 5090 · Perfect · 2.4× 64 GB (2×32) · $8,500–$10,500
Next step
Find by model — see what hardware runs this →