MOSS-TTS-Nano — multilingual voice cloning runs on 4 CPU cores

A 100M-parameter, Apache-2.0 model that fills the gap Kokoro-82M leaves open: multilingual TTS with voice cloning from a short reference clip, running in real time on 4 CPU cores. It is the first open-weight pick in which those three properties co-exist.

Verdict: a 100M-parameter, Apache 2.0 multilingual TTS model, and the first open-weight pick to pair voice cloning with real-time CPU inference

The take

OpenMOSS / MOSI.AI released MOSS-TTS-Nano on April 10, 2026 (PyTorch path), and the ONNX CPU port followed on April 17. The model has 0.1B parameters, ships under the Apache 2.0 license, and covers 20 languages, including English, Mandarin, Japanese, Korean, Spanish, French, and Arabic. It outputs 48 kHz stereo audio and clones a voice from a short audio reference. Architecturally, it pairs a neural audio tokenizer with an autoregressive LLM.

Until this release, the open-weight TTS landscape forced a tradeoff. Kokoro-82M was tiny and real-time on CPU, but English-first with no cloning. Chatterbox and VoxCPM2 offered cloning, but needed roughly 4–8 GB of VRAM at minimum. MOSS-TTS-Nano collapses that tradeoff: its 100M parameters run in real time on 4 CPU cores, with multilingual coverage and voice cloning from short reference audio. Having all three at once is new.
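A common way to make "real-time" concrete is the real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced, where an RTF below 1 means audio is generated faster than it plays back. The sketch below is a minimal, model-agnostic way to measure it on your own machine. `real_time_factor` and `fake_synthesize` are hypothetical names introduced for illustration, not part of MOSS-TTS-Nano; the 48 kHz default mirrors the stated output rate.

```python
import time
from typing import Callable


def real_time_factor(
    synthesize: Callable[[str], int],
    text: str,
    sample_rate: int = 48_000,
) -> float:
    """Return wall-clock synthesis time divided by generated audio duration.

    `synthesize` is any callable that renders `text` and returns the number
    of audio frames (samples per channel) it produced. This is an assumed
    interface for the sketch, not the model's actual API.
    """
    start = time.perf_counter()
    num_frames = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = num_frames / sample_rate
    if audio_seconds <= 0:
        raise ValueError("synthesizer produced no audio")
    return elapsed / audio_seconds


def fake_synthesize(text: str) -> int:
    """Stand-in for a real TTS call: sleeps briefly, then reports
    one second of audio at 48 kHz."""
    time.sleep(0.05)
    return 48_000


rtf = real_time_factor(fake_synthesize, "Hello from a 4-core CPU.")
print(f"RTF = {rtf:.2f} ({'real-time' if rtf < 1 else 'slower than real-time'})")
```

To measure an actual model, replace `fake_synthesize` with a wrapper around your own inference call and run it on text of realistic length, since very short inputs tend to be dominated by startup overhead.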

The ONNX build (April 17) is the practical path: it drops the PyTorch dependency entirely and delivers roughly 2× the inference efficiency of the original PyTorch path. For the CPU-friendly route, use `OpenMOSS-Team/MOSS-TTS-Nano-100M-ONNX` on Hugging Face, together with the companion `MOSS-Audio-Tokenizer-Nano-ONNX`. The sibling model MOSS-TTSD-v0.5 (2B, ZH/EN dialogue) covers expressive multi-speaker dialogue if you outgrow Nano's narration scope.
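As a starting point for the ONNX route, the sketch below fetches the repository and opens each graph on CPU with a fixed thread count, then lists the input tensors each one expects. It assumes `huggingface_hub` and `onnxruntime` are installed. The helper names are hypothetical, and the file layout and tensor names inside the repository are not described here, so the code inspects them rather than assuming them. This is not the project's official inference pipeline.

```python
from pathlib import Path

# Repo ID as given in the text. The companion tokenizer's full repo ID
# is not spelled out there, so pass it explicitly once you have it.
REPO_ID = "OpenMOSS-Team/MOSS-TTS-Nano-100M-ONNX"


def find_onnx_files(root):
    """Return every .onnx file under `root`, sorted for stable output."""
    return sorted(Path(root).rglob("*.onnx"))


def download_repo(repo_id=REPO_ID):
    """Download a Hugging Face repo snapshot and return its local path."""
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(repo_id=repo_id)


def open_cpu_session(model_path, num_threads=4):
    """Open an ONNX graph on the CPU provider, capped at `num_threads`."""
    import onnxruntime as ort  # pip install onnxruntime

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = num_threads
    return ort.InferenceSession(
        str(model_path),
        sess_options=opts,
        providers=["CPUExecutionProvider"],
    )


def describe_inputs(session):
    """List (name, type, shape) for each input the graph expects."""
    return [(i.name, i.type, i.shape) for i in session.get_inputs()]


# Example usage (requires network access and both packages):
#   for path in find_onnx_files(download_repo()):
#       print(path.name, describe_inputs(open_cpu_session(path)))
```

Setting `intra_op_num_threads` to 4 matches the 4-core figure quoted above. Adjust it to your machine before comparing speeds.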

Where it sits in the planner: in voice.low, alongside Kokoro-82M (English narration), faster-whisper (STT), and MiniCPM-o (multimodal voice). The practical pick is MOSS-TTS-Nano if you need multilingual output or cloning, and Kokoro if you only need English narration. Together, the two now cover the entry tier comprehensively, which was not the case a month ago.

Where this fits

Models: MOSS-TTS-Nano (100M) · Kokoro-82M · VoxCPM2 (2B)

Hardware: NVIDIA RTX 3060 12 GB · Mac Mini M4 16 GB
