MODEL · OPENBMB · 0.5B / 1B / 3B / 8B (ALL NATIVELY TRAINED IN 1.58-BIT TERNARY; NOT POST-HOC QUANTIZED)
BitCPM4-CANN family (0.5B / 1B / 3B / 8B, native 1.58-bit)
First publicly reported end-to-end 1.58-bit (ternary {-1, 0, 1}) training stack at 8B scale. The models are trained natively at 1.58 bits with Quantization-Aware Training and a Straight-Through Estimator on Huawei Ascend NPUs, rather than produced by post-training quantization (PTQ) of a BF16 model. The 8B model retains 95.7% of full-precision MiniCPM4 performance at ~6× memory reduction; the 0.5B model retains 90.1%. It is the new ceiling for the low-VRAM tier.
License: Apache 2.0 · Context: Inherits MiniCPM4 base (8K-32K depending on variant) · Released: May 2026
The decision in five lines
- The call: Consider (runnable locally, family reference)
- Best for: Local evaluation and family reference
- Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
- Watch out: Frontier reasoning. These are research-milestone weights, not the strongest model at their size.
- Evidence: Estimated
- Parameters: 0.5B
- Type: Native ternary LLM
- Context: Inherited from the MiniCPM4 base
- VRAM at Q4: ~0.1 GB (0.5B) / ~0.2 GB (1B) / ~0.6 GB (3B) / ~1.6 GB (8B). These are the native runtime storage sizes, not a quant of something bigger.
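The storage figures above are consistent with simple arithmetic: parameter count times roughly 1.58 bits per weight, divided by 8 bits per byte. The sketch below reproduces that estimate. The function name `ternary_weight_gb` is hypothetical, and the calculation assumes every parameter is stored as a ternary weight, ignoring embeddings, scales, activations, and KV cache, so real footprints will be somewhat larger.

```python
import math

# Information content of one ternary weight: log2(3) ~= 1.585 bits.
BITS_PER_TERNARY_WEIGHT = math.log2(3)


def ternary_weight_gb(params_billion, bits_per_weight=BITS_PER_TERNARY_WEIGHT):
    """Estimate weight storage in GB (10^9 bytes) for a ternary model.

    Assumption: all parameters are ternary; overheads are ignored.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    for size in (0.5, 1, 3, 8):
        print(f"{size}B -> ~{ternary_weight_gb(size):.2f} GB of weights")
```

Rounded to one decimal place, the four family sizes come out at the same values quoted in the list.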
Where we recommend this
This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.
The call
First publicly reported end-to-end 1.58-bit (ternary {-1, 0, 1}) training stack at 8B scale. The models are trained natively at 1.58 bits with Quantization-Aware Training and a Straight-Through Estimator on Huawei Ascend NPUs, rather than produced by post-training quantization (PTQ) of a BF16 model. The 8B model retains 95.7% of full-precision MiniCPM4 performance at ~6× memory reduction; the 0.5B model retains 90.1%. It is the new ceiling for the low-VRAM tier.
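To make "ternary with a Straight-Through Estimator" concrete, here is a minimal pure-Python sketch of one common scheme. The absmean scaling is borrowed from the BitNet b1.58 style of quantizer and is an assumption here; the text above only names QAT and STE, so the actual BitCPM4-CANN recipe may differ. All function names are hypothetical.

```python
import math


def ternary_quantize(weights, eps=1e-8):
    """Map float weights to codes in {-1, 0, 1} plus one shared scale.

    Assumption: absmean scaling (scale = mean of |w|), then round and clip.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover the approximate float weights used in the forward pass."""
    return [c * scale for c in codes]


def ste_backward(upstream_grad):
    """Straight-Through Estimator: treat round/clip as identity in backward.

    Rounding has zero gradient almost everywhere, so QAT passes the gradient
    straight through to the latent full-precision weights. In practice an
    autograd framework does this; here it is just a copy.
    """
    return list(upstream_grad)


weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]
codes, scale = ternary_quantize(weights)
bits_per_weight = math.log2(3)  # three states per weight, hence "1.58-bit"

if __name__ == "__main__":
    print(codes, round(scale, 3), round(bits_per_weight, 3))
```

During training, the full-precision latent weights are kept and updated; only the forward pass sees the ternary codes. That is the difference from a post-training pass, which quantizes once after training has finished.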
When not to use: Frontier reasoning — these are research-milestone weights, not the strongest model at their size. For best-quality 8B work, Llama 3.1 8B or Ministral 3 8B still win on benchmarks. BitCPM4-CANN wins on $/byte of weight memory, not on absolute capability.
Runner notes
The weights load in pseudo-quantized form through standard PyTorch / Transformers, so inference needs no special kernels. The primary serving target is Huawei Ascend 910B/910C. GGUF builds are available for llama.cpp. At ~100 MB on disk, the 0.5B is the smallest credible chat model in the open-weight landscape; it is useful for embedded targets and for high-volume agent fan-out, where every active model multiplies cost.
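For intuition on how ternary weights can approach the on-disk sizes quoted here, one possible encoding packs five ternary values into a single byte (3^5 = 243 fits in 256), which works out to 1.6 bits per weight. The sketch below is illustrative only: the text does not describe the actual file layout of these builds, and `pack_trits` / `unpack_trits` are hypothetical helper names.

```python
def pack_trits(codes):
    """Pack ternary codes (-1, 0, 1) into bytes, five codes per byte, base 3."""
    out = bytearray()
    for i in range(0, len(codes), 5):
        group = codes[i:i + 5]
        value = 0
        # Build the base-3 number with the first code as the lowest digit.
        for c in reversed(group):
            value = value * 3 + (c + 1)  # shift -1/0/1 to digits 0/1/2
        out.append(value)  # at most 242, so it always fits in one byte
    return bytes(out)


def unpack_trits(data, count):
    """Inverse of pack_trits; `count` trims padding in the final byte."""
    codes = []
    for byte in data:
        for _ in range(5):
            codes.append(byte % 3 - 1)
            byte //= 3
    return codes[:count]


if __name__ == "__main__":
    sample = [1, 0, -1, -1, 1, 0, 1]
    packed = pack_trits(sample)
    print(len(packed), unpack_trits(packed, len(sample)) == sample)
```

A real format would also store per-tensor or per-block scales alongside the packed codes, which is one reason actual file sizes sit slightly above the raw bit count.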
Hardware that fits
Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first, so the top row is the best-value fit.
- Intel Arc B580 12 GB · Perfect · 8.9× 12 GB · $249–$299
- NVIDIA RTX 3060 12 GB · Perfect · 8.9× 12 GB · $250–$340
- Minisforum UM890 Pro · Perfect · 17.8× 32 GB DDR5 (shared) · $463–$580 all-in
- RTX 5060 Ti 16 GB · Perfect · 11.9× 16 GB · $560–$610
- AMD Radeon RX 9070 XT · Perfect · 11.9× 16 GB · $649–$779
- AMD Radeon RX 7900 XTX · Perfect · 17.8× 24 GB · $760 used / ~$1,500 new
- Mac Mini M4 16 GB · Perfect · 7.9× 16 GB unified · $799 (new floor) / $499–$599 (eBay/residuals)
- NVIDIA RTX 3090 (used, single) · Perfect · 17.8× 24 GB · $950–$1,200
- NVIDIA RTX 5070 Ti · Perfect · 11.9× 16 GB · $980–$1,300
- NVIDIA RTX 5080 · Perfect · 11.9× 16 GB · $999–$1,400
- MacBook Air M5 24 GB · Perfect · 11.9× 24 GB unified · $1,299–$1,699
- Mac Mini M4 Pro 24 GB · Perfect · 11.9× 24 GB unified · $1,399
- Dual RTX 3090 (used) · Perfect · 35.6× 48 GB · $1,800–$2,500 all-in
- Framework Desktop (Ryzen AI Max+ 395) · Perfect · 63.5× 128 GB unified · $1,999–$2,851
- NVIDIA RTX 4090 · Perfect · 17.8× 24 GB · $2,200–$2,800
- M5 Pro MacBook Pro 48 GB · Perfect · 23.8× 48 GB unified · $2,599–$3,099
- NVIDIA RTX 5090 · Perfect · 23.7× 32 GB · $2,910–$4,300
- Mac Studio M4 Max 64 GB · Perfect · 31.8× 64 GB unified · $3,199
- NVIDIA RTX A6000 (48 GB, used) · Perfect · 35.6× 48 GB ECC · $3,500–$4,500
- Mac Studio M3 Ultra 96 GB · Perfect · 47.6× 96 GB unified · $3,999
- M5 Max MacBook Pro 64 GB · Perfect · 31.8× 64 GB unified · $4,499
- NVIDIA DGX Spark · Perfect · 63.5× 128 GB unified · $4,699
- Dual RTX 5090 · Perfect · 47.4× 64 GB (2×32) · $8,500–$10,500