the AI bench
VERIFIED APRIL 2026

MODEL · MISTRAL AI · 128B DENSE (FOLDS MAGISTRAL REASONING + DEVSTRAL 2 CODING INTO ONE WEIGHT SET)

Mistral Medium 3.5 128B

Mistral's flagship 128B dense model, replacing Medium 3.1 and folding the dedicated Magistral (reasoning) and Devstral 2 (coding) specialist models into one weight set with a per-request `reasoning_effort` toggle. It scores 77.6% on SWE-Bench Verified, ships a native multimodal vision encoder trained from scratch, and supports 256K context. The first serious Mistral release since Ministral 3 (December 2025).

License: Modified MIT (commercial OK below revenue threshold; non-commercial above) · Context: 256K · Released: April 29, 2026

The decision in five lines

The call: Consider (runnable locally, family reference)
Best for: Local evaluation and family reference
Runs on: 11 hardware picks fit (cheapest: Minisforum UM890 Pro · $463)
Watch out: Anything under 96 GB unified or 80 GB discrete memory; at Q4 the weights alone are ~72 GB on disk, before KV cache
Evidence: Estimated · last verified May 2026

Parameters: 128B dense (folds Magistral reasoning + Devstral 2 coding into one weight set)
Type: Dense
Context: 256K
VRAM at Q4: ~72 GB (Q4_K_M); practical floor is 4× 24 GB discrete or 96 GB+ unified

Where we recommend this

This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.

The call


When not to use: anything under 96 GB unified or 80 GB discrete memory; at Q4 the weights alone are ~72 GB on disk, before KV cache. Local setups with under 70 GB of memory should stick with Llama 3.3 70B, Qwen 3.5 122B-A10B (MoE, with a much smaller active parameter count), or gpt-oss-120b. Also: the revenue threshold in the modified MIT license matters for commercial deployments; read it before shipping.
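The ~72 GB figure falls straight out of the bit-width arithmetic. A quick sanity check, with the KV-cache term parameterized over an assumed GQA layout (the real layer and head counts aren't published on this page, so the example values below are illustrative only):

```python
def q4_weight_gb(params_b: float, bits_per_weight: float = 4.5) -> float:
    """Weight footprint in GB. Q4_K_M averages roughly 4.5 bits/weight
    once block scales and the higher-bit layers are counted."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per: int = 2) -> float:
    """KV cache in GB: 2 tensors (K and V) per layer, fp16 by default."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per / 1e9

print(q4_weight_gb(128))                         # → 72.0 GB for the weights
print(round(kv_cache_gb(32768, 60, 8, 128), 1))  # → 8.1 GB at a hypothetical
                                                 #   60 layers / 8 KV heads / dim 128
```

With any plausible layout, a 32K-token session pushes the total well past 80 GB, which is why the page draws the floor at 96 GB unified or 4× 24 GB discrete.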

Runner notes

Hosted as `mistralai/Mistral-Medium-3.5-128B`. API pricing: $1.50 in / $7.50 out per 1M tokens. Locally, 4× 24 GB GPUs (RTX 4090 / 5090) is the realistic floor; a 96 GB+ unified-memory Mac Studio M3 Ultra works at Q4_K_M. Ollama and llama.cpp paths exist via community quants. Vibe Remote Agents (May 3, 2026) launched alongside it for cloud-agent workflows.
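A minimal sketch of a per-request call using the `reasoning_effort` toggle. It assumes Mistral's usual OpenAI-compatible chat-completions request shape; the endpoint path and the accepted effort values ("low"/"medium"/"high") are assumptions here, since only the model ID and the toggle name come from this page:

```python
import json

# Assumed endpoint; Mistral's API follows the OpenAI chat-completions shape.
API_URL = "https://api.mistral.ai/v1/chat/completions"

def build_request(prompt: str, effort: str = "medium") -> dict:
    """JSON body for one call, with the per-request reasoning toggle."""
    return {
        "model": "mistralai/Mistral-Medium-3.5-128B",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # assumed values: "low" | "medium" | "high"
    }

body = build_request("Summarize this diff", effort="high")
print(json.dumps(body, indent=2))
```

POST that body to the endpoint with your `Authorization: Bearer` key; dropping `reasoning_effort` should fall back to the model's default effort.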

License: Modified MIT (commercial OK below revenue threshold; non-commercial above)
Released: April 29, 2026
Maker: Mistral AI

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.

Next step

Find-by-model — see what hardware runs this