MODEL · OPENBMB · 2B (DIFFUSION-AUTOREGRESSIVE, TOKENIZER-FREE; MINICPM-4 BACKBONE)
VoxCPM2 (2B)
Apache 2.0 TTS with 48 kHz output, short-clip zero-shot voice cloning, and natural-language "voice design" (describe a voice, get one — no reference audio required) across 30 languages.
License: Apache 2.0 · Context: n/a · Released: April 2026
The decision in five lines
- The call: Buy (for voice work)
- Best for: voice (TTS, voice cloning, voice design)
- Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
- Watch out: for strictly English narration on minimal hardware, Kokoro-82M is roughly 25× smaller and equally good.
- Evidence: Estimated
- Parameters: 2B (diffusion-autoregressive, tokenizer-free; MiniCPM-4 backbone)
- Type: TTS + voice clone + voice design
- Context: n/a
- VRAM at Q4: ~6–8 GB
Where we recommend this
Every tier slot in the planner where this model is a top or alternate pick.
The call
When not to use: strictly English narration on minimal hardware. Kokoro-82M is roughly 25× smaller and equally good for that case.
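The 25× figure is a rounded parameter-count ratio. A quick check, taking the nominal "2B" and "82M" labels at face value (exact parameter counts are not given on this page):

```python
# Nominal parameter counts as labelled on this page (rounded, not exact counts).
VOXCPM2_PARAMS = 2_000_000_000   # "2B"
KOKORO_PARAMS = 82_000_000       # "Kokoro-82M"

ratio = VOXCPM2_PARAMS / KOKORO_PARAMS
print(f"VoxCPM2 is ~{ratio:.1f}x the size of Kokoro-82M")  # prints ~24.4x
```

The exact ratio is about 24.4×, which the page rounds to 25×.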
Runner notes
GitHub `OpenBMB/VoxCPM` (a single repo, 18.9k stars) covers all three family members. There is no llama.cpp/Ollama route yet, because the diffusion-autoregressive, tokenizer-free design is not a standard LLM architecture. Step-down options: `VoxCPM1.5` (0.6B, 44.1 kHz, January 2026) for mid-VRAM setups; `VoxCPM-0.5B` (0.5B, 16 kHz, EN/ZH only, September 2025) for low-VRAM setups. Primary references: the Hugging Face model card, the GitHub repo, and the arXiv 2509.24650 technical report.
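The step-down guidance amounts to a simple selection rule. A minimal sketch, assuming the qualitative VRAM tiers from the notes above (the `"high"` label for VoxCPM2 is added here), with no GB thresholds because the notes give none. Only `VoxCPM-0.5B` is treated as language-restricted, since VoxCPM1.5's language coverage isn't stated. `Variant`, `FAMILY`, and `pick_variant` are hypothetical helpers, not part of the VoxCPM package:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Variant:
    name: str
    params_b: float        # parameter count in billions
    sample_rate_hz: int    # output sample rate
    en_zh_only: bool       # True if limited to English/Chinese
    vram_tier: str         # qualitative tier: "low", "mid", or "high" (hypothetical labels)

# Specs from the runner notes; ordered largest-first so the first match is the biggest fit.
FAMILY = (
    Variant("VoxCPM2", 2.0, 48_000, False, "high"),
    Variant("VoxCPM1.5", 0.6, 44_100, False, "mid"),  # language coverage not stated; assumed unrestricted
    Variant("VoxCPM-0.5B", 0.5, 16_000, True, "low"),
)

TIER_RANK = {"low": 0, "mid": 1, "high": 2}

def pick_variant(vram_tier: str, language: str) -> Optional[Variant]:
    """Return the largest variant that fits the VRAM tier and supports the language, or None."""
    budget = TIER_RANK[vram_tier]
    for v in FAMILY:
        if TIER_RANK[v.vram_tier] > budget:
            continue  # needs more memory than this tier offers
        if v.en_zh_only and language not in ("en", "zh"):
            continue  # EN/ZH-only model cannot serve other languages
        return v
    return None

print(pick_variant("mid", "de").name)  # VoxCPM1.5
print(pick_variant("low", "de"))       # None
```

On a low-VRAM machine that needs, say, German output, the sketch returns `None`: per these notes, the only low-VRAM variant is EN/ZH only.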
Hardware that fits
Every hardware pick whose memory fits this model at the quant we recommend, sorted cheapest-first, so the top row is your best-value fit. Each row gives the fit rating, a headroom multiple (×), the memory pool, and the price range.
- Intel Arc B580 12 GB · Perfect · 1.7× · 12 GB · $249–$299
- NVIDIA RTX 3060 12 GB · Perfect · 1.7× · 12 GB · $280–$400
- Minisforum UM890 Pro · Perfect · 3.3× · 32 GB DDR5 (shared) · $463–$580 all-in
- RTX 5060 Ti 16 GB · Perfect · 2.2× · 16 GB · $560–$610
- AMD Radeon RX 9070 XT · Perfect · 2.2× · 16 GB · $649–$779
- AMD Radeon RX 7900 XTX · Perfect · 3.3× · 24 GB · $760 used / ~$1,500 new
- Mac Mini M4 16 GB · Perfect · 1.5× · 16 GB unified · $799 (new floor) / $499–$599 (eBay/residuals)
- NVIDIA RTX 3090 (used, single) · Perfect · 3.3× · 24 GB · $950–$1,200
- NVIDIA RTX 5070 Ti · Perfect · 2.2× · 16 GB · $980–$1,300
- NVIDIA RTX 5080 · Perfect · 2.2× · 16 GB · $999–$1,400
- MacBook Air M5 24 GB · Perfect · 2.2× · 24 GB unified · $1,299–$1,699
- Mac Mini M4 Pro 24 GB · Perfect · 2.2× · 24 GB unified · $1,399
- Dual RTX 3090 (used) · Perfect · 6.6× · 48 GB · $1,800–$2,500 all-in
- Framework Desktop (Ryzen AI Max+ 395) · Perfect · 11.9× · 128 GB unified · $1,999–$2,851
- NVIDIA RTX 4090 · Perfect · 3.3× · 24 GB · $2,200–$2,800
- M5 Pro MacBook Pro 48 GB · Perfect · 4.5× · 48 GB unified · $2,599–$3,099
- NVIDIA RTX 5090 · Perfect · 4.4× · 32 GB · $2,910–$4,300
- Mac Studio M4 Max 64 GB · Perfect · 5.9× · 64 GB unified · $3,199
- NVIDIA RTX A6000 (48 GB, used) · Perfect · 6.6× · 48 GB ECC · $3,500–$4,500
- Mac Studio M3 Ultra 96 GB · Perfect · 8.9× · 96 GB unified · $3,999
- M5 Max MacBook Pro 64 GB · Perfect · 5.9× · 64 GB unified · $4,499
- NVIDIA DGX Spark · Perfect · 11.9× · 128 GB unified · $4,699
- Dual RTX 5090 · Perfect · 8.9× · 64 GB (2×32) · $8,500–$10,500
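The × column is not defined on this page. If it is read as memory divided by a fixed per-model requirement (an assumption), the discrete cards back out to roughly 7 GB, inside the ~6–8 GB Q4 estimate, while the unified-memory machines back out to roughly 10.7–10.8 GB. A sketch of that back-calculation over a subset of rows; `PICKS` and `implied_requirement_gb` are hypothetical names:

```python
# Subset of rows from the list above: (name, memory in GB, fit multiple, unified memory?)
PICKS = [
    ("Intel Arc B580 12 GB", 12, 1.7, False),
    ("RTX 5060 Ti 16 GB", 16, 2.2, False),
    ("NVIDIA RTX 3090 (used, single)", 24, 3.3, False),
    ("NVIDIA RTX 5090", 32, 4.4, False),
    ("Mac Mini M4 16 GB", 16, 1.5, True),
    ("Mac Studio M4 Max 64 GB", 64, 5.9, True),
    ("NVIDIA DGX Spark", 128, 11.9, True),
]

def implied_requirement_gb(memory_gb: float, multiple: float) -> float:
    # Assumption: multiple = memory / per-model requirement, so requirement = memory / multiple.
    return memory_gb / multiple

for name, mem_gb, multiple, unified in PICKS:
    kind = "unified" if unified else "discrete"
    print(f"{name:32s} {kind:8s} ~{implied_requirement_gb(mem_gb, multiple):.1f} GB")
```

A requirement that is constant across discrete cards but about 1.5× higher on unified-memory machines would fit a planner that counts only about two-thirds of unified memory as usable. The page doesn't say so, so treat this as a reading of the numbers, not a documented rule.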