the AI bench
VERIFIED MAY 2026

Model · OpenBMB · 1.08B total / 679M non-embedding parameters (LlamaForCausalLM)

MiniCPM5-1B (Apache 2.0, OPD-trained)

OpenBMB's claimed 1B-class open-source SOTA, but the training method matters more than the size. The post-training pipeline runs SFT → RL → On-Policy Distillation (OPD): RL teachers are trained per domain (math, code, closed-book QA, writing) and then distilled back into a single release model. Relative to the SFT-only checkpoint, RL + OPD adds 16 points on average across math, code, and instruction-following benchmarks and cuts the share of responses truncated at the max-token limit by 29 percentage points. Supports hybrid `<think>` reasoning, toggled via `enable_thinking`, and native XML-style tool calling. English + Chinese.
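To make the OPD step concrete: in one common formulation of on-policy distillation, the student generates a response itself, both the student and the teacher score that same student-generated sequence, and the student is trained to minimize the per-token reverse KL divergence KL(student || teacher). The card does not state OpenBMB's exact objective, so the sketch below is a generic illustration under that assumption, not the MiniCPM5 recipe. In the pipeline described above, the teacher for a given prompt would be the RL teacher for its domain (math, code, closed-book QA, or writing). The function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at one position: sum_v p_s(v) * (log p_s(v) - log p_t(v))."""
    log_ps = log_softmax(student_logits)
    log_pt = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(log_ps, log_pt))


def opd_loss(
    student_seq: Sequence[Sequence[float]],
    teacher_seq: Sequence[Sequence[float]],
) -> float:
    """Mean per-token reverse KL over a sequence the *student* sampled.

    student_seq[i] and teacher_seq[i] are the logits each model assigns at
    position i of the same student-generated prefix. Scoring the student's
    own samples, rather than fixed reference text, is what makes it on-policy.
    """
    if not student_seq or len(student_seq) != len(teacher_seq):
        raise ValueError("need matching, non-empty per-position logits")
    total = sum(reverse_kl(s, t) for s, t in zip(student_seq, teacher_seq))
    return total / len(student_seq)
```

Reverse KL is mode-seeking: the student is penalized for putting probability on tokens the teacher considers unlikely, which is one reason it is a popular choice when distilling a larger teacher into a small student.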

License: Apache 2.0 · Context: 131,072 (128K) · Released: May 22, 2026

The decision in five lines

The call
Consider — runnable locally, family reference
Best for
Local evaluation and family reference
Runs on
23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out
General-purpose chat whenever you can fit a 4B-class model or larger: Qwen 3.5 4B, Phi-4 Mini, and Gemma 3 4B all win on absolute capability.
Evidence
Estimated · last verified May 2026

Parameters
1.08B total
Type
Dense LLM
Context
131,072 tokens
VRAM
~0.7–1 GB (Q4) / ~2.2 GB (FP16)
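The memory figures follow from bytes-per-weight arithmetic, and the context figure from 128 × 1024. A minimal sketch, counting weights only and assuming about 4.5 bits per weight as a typical average for 4-bit quants (an assumption; the exact figure depends on the quantization scheme). The weight-only Q4 result (~0.61 GB) sits below the card's ~0.7–1 GB range, presumably leaving room for KV cache and runtime buffers; the card does not break this down.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 1.08e9  # total parameter count from the card

# FP16 stores 16 bits per weight: 2.16 GB, the card's "~2.2 GB (FP16)".
fp16_gb = weight_memory_gb(TOTAL_PARAMS, 16)

# Assumed ~4.5 bits/weight average for a 4-bit quant: about 0.61 GB of weights.
q4_gb = weight_memory_gb(TOTAL_PARAMS, 4.5)

# "128K" context means 128 * 1024 tokens.
CONTEXT_TOKENS = 128 * 1024
```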

Where we recommend this

This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.

The call

When not to use: general-purpose chat whenever you can fit a 4B-class model or larger. Qwen 3.5 4B, Phi-4 Mini, and Gemma 3 4B all win on absolute capability. MiniCPM5-1B wins on "smallest credible model that thinks and calls tools."

Runner notes

Standard `LlamaForCausalLM` weights, so the llama.cpp, Ollama, and vLLM paths all work from day one. SGLang is the recommended backend for tool calling. A sibling SFT-only checkpoint, `OpenBMB/MiniCPM5-1B-SFT`, is available for ablations. Recommended sampling: think mode (`enable_thinking=True`) uses `temperature=0.9, top_p=0.95`; no-think mode (`enable_thinking=False`) uses `temperature=0.7, top_p=0.95`. The model sits parallel to the BitCPM4-CANN family: both are OpenBMB sub-2B research milestones, but from different angles (BitCPM4 is a native ternary architecture; MiniCPM5's contribution is OPD post-training).
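Because the recommended sampling differs by mode, it can help to keep both presets in one place and to separate reasoning from the final answer after decoding. A minimal Python sketch, assuming decoded output is plain text with reasoning wrapped in `<think>...</think>` (the card names the tag but does not document the exact output format). `SAMPLING_PRESETS`, `sampling_kwargs`, and `split_thinking` are hypothetical helper names, not part of any OpenBMB API.

```python
from typing import Dict, Tuple

# Presets copied from the runner notes, keyed by the enable_thinking flag.
SAMPLING_PRESETS: Dict[bool, Dict[str, float]] = {
    True: {"temperature": 0.9, "top_p": 0.95},   # think mode
    False: {"temperature": 0.7, "top_p": 0.95},  # no-think mode
}


def sampling_kwargs(enable_thinking: bool) -> Dict[str, float]:
    """Return a fresh copy of the recommended sampling settings for the mode."""
    return dict(SAMPLING_PRESETS[enable_thinking])


def split_thinking(text: str) -> Tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Assumes reasoning is wrapped in <think>...</think>. The opening tag may be
    missing if the chat template already emitted it in the prompt.
    """
    open_tag, close_tag = "<think>", "</think>"
    if close_tag in text:
        head, _, answer = text.partition(close_tag)
        # Keep only what follows <think>, or all of head if the tag is absent.
        reasoning = head.split(open_tag, 1)[-1]
        return reasoning.strip(), answer.strip()
    if open_tag in text:
        # Reasoning opened but never closed: likely cut off at the token limit.
        return text.split(open_tag, 1)[1].strip(), ""
    # No reasoning markers at all (for example, no-think mode).
    return "", text.strip()
```

With Hugging Face transformers, the mode would typically be passed as `tokenizer.apply_chat_template(..., enable_thinking=...)`, which forwards extra keyword arguments to the chat template, and the presets as `model.generate(..., do_sample=True, **sampling_kwargs(mode))`. This assumes the model's chat template reads `enable_thinking`, as the card indicates. If `<think>` is registered as a special token, decode with `skip_special_tokens=False` so the tags survive for `split_thinking`.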

License
Apache 2.0
Released
May 22, 2026
Maker
OpenBMB

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first; the top row is your best-value fit.
