MiniCPM5-1B (Apache 2.0, OPD-trained)

OpenBMB's claimed 1B-class open-source SOTA — but the training-method story matters more than the size. The post-training pipeline runs SFT → RL → On-Policy Distillation (OPD): RL teachers are trained per domain (math, code, closed-book QA, writing) and then distilled back into one release model. RL + OPD lifts the SFT-only checkpoint by +16pt average on math / code / instruction-following and drops max-token-truncated responses by 29 percentage points. Hybrid `<think>` reasoning toggle (switch via `enable_thinking`) and native XML-style tool calling. English + Chinese.

License: Apache 2.0 · Context: 131,072 (128K) · Released: May 22, 2026

The decision in five lines

The call: Consider — runnable locally, family reference
Best for: Local evaluation and family reference
Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out: General-purpose chat at any size larger than 4B — Qwen 3.5 4B, Phi-4 Mini, and Gemma 3 4B all win on absolute capability when you can fit them.
Evidence: Estimated · last verified July 2026

1.08B total: PARAMETERS
DENSE LLM: TYPE
131072: CONTEXT
~0.7–1 GB (Q4) / ~2.2 GB (FP16): VRAM AT Q4

Where we recommend this

This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.

The call

OpenBMB's claimed 1B-class open-source SOTA — but the training-method story matters more than the size. The post-training pipeline runs SFT → RL → On-Policy Distillation (OPD): RL teachers are trained per domain (math, code, closed-book QA, writing) and then distilled back into one release model. RL + OPD lifts the SFT-only checkpoint by +16pt average on math / code / instruction-following and drops max-token-truncated responses by 29 percentage points. Hybrid `<think>` reasoning toggle (switch via `enable_thinking`) and native XML-style tool calling. English + Chinese.
When not to use: General-purpose chat at any size larger than 4B — Qwen 3.5 4B, Phi-4 Mini, and Gemma 3 4B all win on absolute capability when you can fit them. MiniCPM5-1B wins on "smallest credible model that thinks and calls tools."

Runner notes

Standard `LlamaForCausalLM` weights — llama.cpp / Ollama / vLLM paths all work day-one. SGLang is the recommended backend for tool calling. Sibling SFT-only checkpoint at `OpenBMB/MiniCPM5-1B-SFT` for ablation. Recommended sampling: think mode `temperature=0.9, top_p=0.95, enable_thinking=True`; no-think `temperature=0.7, top_p=0.95`. Sits parallel to BitCPM4-CANN family — both OpenBMB sub-2B research milestones, but different angles (BitCPM4 is native ternary architecture, MiniCPM5 is OPD post-training).

License: Apache 2.0
Released: May 22, 2026
Maker: OpenBMB
Model card: huggingface.co/openbmb/MiniCPM5-1B →

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.

Next step

Find-by-model — see what hardware runs this→