MODEL · OPENBMB · 1.08B TOTAL / 679M NON-EMBEDDING (LLAMAFORCAUSALLM)
MiniCPM5-1B (Apache 2.0, OPD-trained)
OpenBMB's claimed 1B-class open-source SOTA — but the training-method story matters more than the size. The post-training pipeline runs SFT → RL → On-Policy Distillation (OPD): RL teachers are trained per domain (math, code, closed-book QA, writing) and then distilled back into one release model. RL + OPD lifts the SFT-only checkpoint by +16pt average on math / code / instruction-following and drops max-token-truncated responses by 29 percentage points. Hybrid `<think>` reasoning toggle (switch via `enable_thinking`) and native XML-style tool calling. English + Chinese.
License: Apache 2.0 · Context: 131,072 (128K) · Released: May 22, 2026
The decision in five lines
- The call
- Consider — runnable locally, family reference
- Best for
- Local evaluation and family reference
- Runs on
- 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
- Watch out
- General-purpose chat at any size larger than 4B — Qwen 3.5 4B, Phi-4 Mini, and Gemma 3 4B all win on absolute capability when you can fit them.
- Evidence
- Estimated
- 1.08B total
- PARAMETERS
- DENSE LLM
- TYPE
- 131072
- CONTEXT
- ~0.7–1 GB (Q4) / ~2.2 GB (FP16)
- VRAM AT Q4
Where we recommend this
This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.
The call
OpenBMB's claimed 1B-class open-source SOTA — but the training-method story matters more than the size. The post-training pipeline runs SFT → RL → On-Policy Distillation (OPD): RL teachers are trained per domain (math, code, closed-book QA, writing) and then distilled back into one release model. RL + OPD lifts the SFT-only checkpoint by +16pt average on math / code / instruction-following and drops max-token-truncated responses by 29 percentage points. Hybrid `<think>` reasoning toggle (switch via `enable_thinking`) and native XML-style tool calling. English + Chinese.
When not to use: General-purpose chat at any size larger than 4B — Qwen 3.5 4B, Phi-4 Mini, and Gemma 3 4B all win on absolute capability when you can fit them. MiniCPM5-1B wins on "smallest credible model that thinks and calls tools."
Runner notes
Standard `LlamaForCausalLM` weights — llama.cpp / Ollama / vLLM paths all work day-one. SGLang is the recommended backend for tool calling. Sibling SFT-only checkpoint at `OpenBMB/MiniCPM5-1B-SFT` for ablation. Recommended sampling: think mode `temperature=0.9, top_p=0.95, enable_thinking=True`; no-think `temperature=0.7, top_p=0.95`. Sits parallel to BitCPM4-CANN family — both OpenBMB sub-2B research milestones, but different angles (BitCPM4 is native ternary architecture, MiniCPM5 is OPD post-training).
Hardware that fits
Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.
- Intel Arc B580 12 GBPerfect · 6.6× 12 GB · $249–$299
- NVIDIA RTX 3060 12 GBPerfect · 6.6× 12 GB · $250–$340
- Minisforum UM890 ProPerfect · 13.2× 32 GB DDR5 (shared) · $463–$580 all-in
- RTX 5060 Ti 16 GBPerfect · 8.8× 16 GB · $560–$610
- AMD Radeon RX 9070 XTPerfect · 8.8× 16 GB · $649–$779
- AMD Radeon RX 7900 XTXPerfect · 13.2× 24 GB · $760 used / ~$1,500 new
- Mac Mini M4 16 GBPerfect · 5.9× 16 GB unified · $799 (new floor) / $499–$599 (eBay/residuals)
- NVIDIA RTX 3090 (used, single)Perfect · 13.2× 24 GB · $950–$1,200
- NVIDIA RTX 5070 TiPerfect · 8.8× 16 GB · $980–$1,300
- NVIDIA RTX 5080Perfect · 8.8× 16 GB · $999–$1,400
- MacBook Air M5 24 GBPerfect · 8.8× 24 GB unified · $1,299–$1,699
- Mac Mini M4 Pro 24 GBPerfect · 8.8× 24 GB unified · $1,399
- Dual RTX 3090 (used)Perfect · 26.3× 48 GB · $1,800–$2,500 all-in
- Framework Desktop (Ryzen AI Max+ 395)Perfect · 47.0× 128 GB unified · $1,999–$2,851
- NVIDIA RTX 4090Perfect · 13.2× 24 GB · $2,200–$2,800
- M5 Pro MacBook Pro 48 GBPerfect · 17.6× 48 GB unified · $2,599–$3,099
- NVIDIA RTX 5090Perfect · 17.5× 32 GB · $2,910–$4,300
- Mac Studio M4 Max 64 GBPerfect · 23.5× 64 GB unified · $3,199
- NVIDIA RTX A6000 (48 GB, used)Perfect · 26.3× 48 GB ECC · $3,500–$4,500
- Mac Studio M3 Ultra 96 GBPerfect · 35.2× 96 GB unified · $3,999
- M5 Max MacBook Pro 64 GBPerfect · 23.5× 64 GB unified · $4,499
- NVIDIA DGX SparkPerfect · 47.0× 128 GB unified · $4,699
- Dual RTX 5090Perfect · 35.1× 64 GB (2×32) · $8,500–$10,500
Next step
Find-by-model — see what hardware runs this→