MiniCPM5-1B — OpenBMB's On-Policy Distillation pipeline lands at 1B

OpenBMB uploaded MiniCPM5-1B and MiniCPM5-1B-SFT to Hugging Face on May 22 — a 1.08B-parameter dense Llama-class model trained with a three-stage SFT → RL → On-Policy Distillation pipeline. Apache 2.0, 128K context, English + Chinese, hybrid `<think>` reasoning toggle, native XML-style tool calling. Claims 1B-class open-source SOTA against LFM2.5-1.2B-Thinking, Qwen3-0.6B/think, and Qwen3.5-0.8B/think.

Verdict: OPD-trained 1B SOTA — the training method matters more than the size

The take

The training pipeline is the editorial story. Most sub-2B open models are direct SFT from a base checkpoint — sometimes with a light DPO pass. MiniCPM5-1B does SFT → RL → On-Policy Distillation: domain-specific RL teachers (math, code, closed-book QA, writing) are trained, then distilled back into one release model using OPD (the technique formalized by Thinking Machines Lab, with implementation tweaks per arxiv 2604.13016). The headline gain over the SFT-only baseline: +16 point average on math / code / instruction-following, with 29 percentage-point drop in responses that hit the max-token budget. The SFT-only sibling checkpoint at `OpenBMB/MiniCPM5-1B-SFT` is published explicitly so users can ablate the RL+OPD effect — that's an honest research-disclosure pattern.

What you actually get: 1.08B total params (679M non-embedding), 128K context, hybrid `<think>` toggle via `enable_thinking`, native XML-style tool calling (SGLang is the recommended backend; llama.cpp / Ollama / vLLM all work via the standard LlamaForCausalLM weights). Recommended sampling: think mode at `temperature=0.9, top_p=0.95`; no-think at `temperature=0.7`. Disk footprint is ~700 MB to 1 GB at Q4 — fits a Raspberry Pi-class accelerator with room to spare.

Where it fits in our taxonomy: detail-page only, no planner pick slot. The existing low-tier picks (Qwen 3.5 4B, Qwen 3.5 2B, Phi-4 Mini, Gemma 3 4B) all win on absolute capability when you have 4 GB or more to spare. MiniCPM5-1B is for the narrow band where you genuinely need the smallest model that thinks and calls tools — embedded targets, agent fan-out at high concurrency, the leading edge of "can this run on the watch."

How this lines up with the rest of the OpenBMB stack: BitCPM4-CANN (May 2026, native ternary, 0.5B–8B) is the architectural-research line; MiniCPM-V-4.6 (1B vision-language, May 15) is the vision branch; MiniCPM-o 2.6 (8B omnimodal) is the multimodal flagship; MiniCPM5-1B is now the post-training-method showcase. Three distinct research threads at OpenBMB, all shipping in May.

Where this fits

Models: MiniCPM5-1B (Apache 2.0, OPD-trained) · BitCPM4-CANN family (0.5B / 1B / 3B / 8B, native 1.58-bit) · MiniCPM-V-4.6 (1B vision-language) · Phi-4 Mini · Qwen 3.5 2B

Hardware: NVIDIA RTX 3060 12 GB · Mac Mini M4 16 GB · Minisforum UM890 Pro

Sources

Next step

Try this in the planner→