MODEL · MISTRAL AI · 128B DENSE (FOLDS MAGISTRAL REASONING + DEVSTRAL 2 CODING INTO ONE WEIGHT SET)
Mistral Medium 3.5 128B
Mistral's flagship 128B dense model. It replaces Medium 3.1 and retires the dedicated Magistral (reasoning) and Devstral 2 (coding) specialist models, folding both into one weight set with a per-request `reasoning_effort` toggle. 77.6% on SWE-Bench Verified, a native multimodal vision encoder trained from scratch, and 256K context. The first major Mistral release since Ministral 3 (Dec 2025).
License: Modified MIT (commercial OK below revenue threshold; non-commercial above) · Context: 256K · Released: April 29, 2026
The decision in five lines
- The call: Consider — runnable locally, family reference
- Best for: Local evaluation and family reference
- Runs on: 11 hardware picks fit (cheapest: Minisforum UM890 Pro · $463)
- Watch out: Anything under 96 GB unified or 80 GB discrete — at Q4 it needs ~72 GB on disk plus KV.
- Evidence: Estimated
- Parameters: 128B dense (folds Magistral reasoning + Devstral 2 coding into one weight set)
- Type: Dense
- Context: 256K
- VRAM at Q4: ~72 GB (Q4_K_M) — practical floor is 4× 24 GB or 96 GB+ unified
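The ~72 GB figure is consistent with back-of-envelope math: Q4_K_M averages roughly 4.5 bits per weight over a 128B-parameter model (the 4.5 bits/weight average is an approximation for the mixed-precision K-quant, not a published spec). A minimal sketch:

```python
params = 128e9          # 128B dense parameters
bits_per_weight = 4.5   # rough Q4_K_M average incl. higher-precision blocks (assumption)

weights_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB (decimal)
print(round(weights_gb))  # 72 — weights alone; KV cache comes on top
```

KV cache grows with context, which is why the practical floor (4× 24 GB or 96 GB+ unified) sits well above the on-disk weight size.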
Where we recommend this
This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.
The call
Mistral's flagship 128B dense model. It replaces Medium 3.1 and retires the dedicated Magistral (reasoning) and Devstral 2 (coding) specialist models, folding both into one weight set with a per-request `reasoning_effort` toggle. 77.6% on SWE-Bench Verified, a native multimodal vision encoder trained from scratch, and 256K context. The first major Mistral release since Ministral 3 (Dec 2025).
When not to use: Anything under 96 GB unified or 80 GB discrete — at Q4 it needs ~72 GB on disk plus KV. Local picks under 70 GB should stick with Llama 3.3 70B, Qwen 3.5 122B-A10B (MoE, smaller active), or gpt-oss-120b. Also: revenue threshold in the modified MIT license matters for commercial deployments — read before shipping.
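The `reasoning_effort` toggle is set per request rather than by picking a different model. A minimal sketch of what such a request body might look like, assuming an OpenAI-compatible chat-completions endpoint and that `reasoning_effort` is a top-level request field taking "low" / "medium" / "high" — all of which are assumptions, not confirmed API details:

```python
import json

API_URL = "https://api.mistral.ai/v1/chat/completions"  # assumed endpoint path

def build_request(prompt: str, effort: str = "low") -> dict:
    """Build a chat-completions payload with a per-request reasoning_effort toggle.

    The accepted values for `effort` are assumed, not documented here.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unknown reasoning_effort: {effort}")
    return {
        "model": "mistralai/Mistral-Medium-3.5-128B",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }

payload = build_request("Refactor this function to be tail-recursive.", effort="high")
body = json.dumps(payload)  # POST this to API_URL with your API key
```

The point of the design is that reasoning depth becomes a request-time knob: the same weights serve quick completions and long-deliberation coding runs.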
Runner notes
Hosted as `mistralai/Mistral-Medium-3.5-128B`; API pricing $1.50 in / $7.50 out per 1M tokens. Local: 4× 24 GB GPUs (RTX 4090 / 5090) is the realistic floor; a 96 GB+ unified Mac Studio M3 Ultra works at Q4_K_M. Ollama / llama.cpp paths are available via community quants. Vibe Remote Agents (May 3, 2026) launched alongside it for cloud-agent workflows.
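At the listed per-1M-token rates, per-request API cost is straightforward to estimate. A quick sketch (the token counts below are made-up example numbers, not benchmarks):

```python
PRICE_IN = 1.50   # USD per 1M input tokens
PRICE_OUT = 7.50  # USD per 1M output tokens

def request_cost(tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one API call at the listed rates."""
    return tokens_in / 1e6 * PRICE_IN + tokens_out / 1e6 * PRICE_OUT

# e.g. a long-context coding request: 200K tokens in, 5K out
print(round(request_cost(200_000, 5_000), 4))  # 0.3375
```

The 5× output premium means long `reasoning_effort` runs are dominated by output tokens, which is worth factoring into any hosted-vs-local cost comparison.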
Hardware that fits
Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.
- Minisforum UM890 Pro · Requires tweak · 1.0× 32 GB DDR5 (shared) · $463–$580 all-in
- Dual RTX 3090 (used) · Perfect · 1.8× 48 GB · $1,800–$2,500 all-in
- Framework Desktop (Ryzen AI Max+ 395) · Perfect · 3.2× 128 GB unified · $1,999–$2,851
- M5 Pro MacBook Pro 48 GB · Good · 1.2× 48 GB unified · $2,599–$3,099
- Mac Studio M4 Max 64 GB · Perfect · 1.6× 64 GB unified · $3,199
- NVIDIA RTX A6000 (48 GB, used) · Perfect · 1.8× 48 GB ECC · $3,500–$4,500
- NVIDIA RTX 5090 · Good · 1.2× 32 GB · $3,800–$4,100
- Mac Studio M3 Ultra 96 GB · Perfect · 2.4× 96 GB unified · $3,999
- M5 Max MacBook Pro 64 GB · Perfect · 1.6× 64 GB unified · $4,499
- NVIDIA DGX Spark · Perfect · 3.2× 128 GB unified · $4,699
- Dual RTX 5090 · Perfect · 2.4× 64 GB (2×32) · $8,500–$10,500
Next step
Find by model — see what hardware runs this →