the AI bench
VERIFIED MAY 2026

MODEL · OPENMOSS / MOSI.AI · ~9B TOTAL (8B LLM + AUDIO ENCODER)

MOSS-Music-8B (Instruct + Thinking)

The first open-weight music-understanding LLM worth flagging. It does lyrics ASR with time-aligned transcription, musical captioning, key/tempo/chord reasoning, structural analysis (intro/verse/chorus/bridge/outro), instrument and voice recognition, and music QA. The audio encoder runs at 12.5 Hz temporal resolution. Reported results: 80.38% average accuracy across 8 music-QA benchmarks, 15.88% average WER/CER on lyrics, and 4.36/5.0 on MusicCaps captioning. The Thinking variant adds chain-of-thought reasoning over audio.
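To give a feel for what the 12.5 Hz figure means in practice, here is a back-of-the-envelope sketch. It only converts the quoted rate into a frame duration and a frame count per clip; whether each encoder frame maps to exactly one position in the LLM's context is an assumption this page does not confirm, and `frames_for_clip` is a hypothetical helper name.

```python
import math

ENCODER_RATE_HZ = 12.5  # temporal resolution quoted on this page


def frame_duration_ms(rate_hz: float = ENCODER_RATE_HZ) -> float:
    """Length of audio covered by one encoder frame, in milliseconds."""
    return 1000.0 / rate_hz


def frames_for_clip(seconds: float, rate_hz: float = ENCODER_RATE_HZ) -> int:
    """Number of encoder frames needed to cover a clip, rounded up."""
    return math.ceil(seconds * rate_hz)


if __name__ == "__main__":
    print(frame_duration_ms())        # 80.0 (ms per frame)
    print(frames_for_clip(30))        # 375 frames for a 30-second excerpt
    print(frames_for_clip(3.5 * 60))  # 2625 frames for a 3.5-minute song
```

Under that one-frame-per-position assumption, a full-length song would occupy a few thousand positions before any text prompt is added.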

License: Apache 2.0 · Context: inherits 8B backbone · Released: May 1, 2026

The decision in five lines

The call: Consider (runnable locally, family reference)
Best for: Local evaluation and family reference
Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out: Understanding-only (audio in, text out); it does not generate music.
Evidence: Estimated · last verified May 2026

Parameters: ~9B total (8B LLM + audio encoder)
Type: Music understanding
Context: inherits 8B backbone
VRAM: ~6–8 GB (Q4) / ~18 GB (FP16, est.)
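The VRAM figures can be sanity-checked with simple arithmetic. The sketch below computes a weights-only footprint from the ~9B parameter count. The 4.5 bits-per-weight case is an assumption standing in for the scale metadata that 4-bit formats typically carry, not a number from this page.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal GB: params * bits / 8."""
    return params_billion * bits_per_weight / 8


TOTAL_PARAMS_B = 9.0  # ~9B total (8B LLM + audio encoder), from this page

print(weight_gb(TOTAL_PARAMS_B, 16))   # 18.0   -> matches the ~18 GB FP16 estimate
print(weight_gb(TOTAL_PARAMS_B, 4))    # 4.5    -> pure 4-bit weights
print(weight_gb(TOTAL_PARAMS_B, 4.5))  # 5.0625 -> assumed 4-bit plus scale overhead
```

The gap between roughly 5 GB of weights and the listed ~6–8 GB at Q4 is presumably runtime overhead (KV cache, activations, and possibly an audio encoder kept at higher precision), though the page does not break that down.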

Where we recommend this

This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.

The call

When not to use: If you want music generation, look elsewhere; this model is understanding-only (audio in, text out). For text-to-music, try Suno or Udio (hosted) or Stable Audio (open-weight, separate model). Lyrics ASR is also English-leaning, so lyrics in non-Latin scripts may transcribe less accurately.
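The lyrics score on this page is quoted as WER/CER, so it helps to see what those metrics measure. The following is a minimal sketch of the standard definitions (edit distance divided by reference length, over words for WER and characters for CER). The benchmark's own text normalization (casing, punctuation) is not described here, so this sketch applies none, and the sample lyric is made up.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference item missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra item in hypothesis)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: the transcript drops one of five words.
print(wer("we dance all night long", "we dance all night"))  # 0.2
print(cer("abcd", "abxd"))                                   # 0.25
```

CER is commonly used where whitespace does not delimit words, which is one reason a combined WER/CER average can hide weaker results on non-Latin scripts.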

Runner notes

The recommended runtime is SGLang serving: `sglang serve --model-path ./weights/MOSS-Music-8B-Instruct --trust-remote-code`. The stack is Python 3.12, PyTorch 2.9+cu128, FlashAttention 2, and FFmpeg 7, with a Gradio app for a local UI. There is no GGUF or Ollama route yet because the audio-LLM architecture is non-standard. For reasoning over audio, use the sibling MOSS-Music-8B-Thinking.
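Before launching the server, it can save time to confirm the pieces listed above are visible on the machine. This is a rough preflight sketch using only the standard library. `preflight` is a hypothetical helper, the import names `torch`, `flash_attn`, and `sglang` are the usual module names for those packages, and the script checks presence only, not the specific versions named above.

```python
import importlib.util
import shutil
import sys


def preflight(expected_python=(3, 12)) -> dict:
    """Report which parts of the stack are visible locally (presence only)."""
    return {
        # The page names Python 3.12; other versions may work but are untested here.
        "python_matches": sys.version_info[:2] == expected_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "flash_attn_installed": importlib.util.find_spec("flash_attn") is not None,
        "sglang_installed": importlib.util.find_spec("sglang") is not None,
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

A failed check does not prove the model cannot run; it only flags something to look at before the first launch attempt.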

License: Apache 2.0
Released: May 1, 2026
Maker: OpenMOSS / MOSI.AI

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first, so the top row is your best-value fit.
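The fit-and-sort rule described above can be sketched as a filter on memory followed by a sort on price. This is an illustration under stated assumptions, not the site's actual logic: the real rule may reserve headroom beyond the raw estimate, and every catalog entry except the Intel Arc B580 is a hypothetical placeholder with invented numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwarePick:
    name: str
    memory_gb: float
    price_usd: float


def picks_that_fit(picks, required_gb):
    """Keep picks with enough memory, then order them cheapest first."""
    fitting = [p for p in picks if p.memory_gb >= required_gb]
    return sorted(fitting, key=lambda p: p.price_usd)


CATALOG = [
    HardwarePick("Placeholder card A", 24, 900),  # hypothetical entry
    HardwarePick("Intel Arc B580", 12, 249),      # the only entry taken from this page
    HardwarePick("Placeholder card B", 6, 150),   # hypothetical entry, too small to fit
    HardwarePick("Placeholder card C", 16, 450),  # hypothetical entry
]

REQUIRED_GB = 8  # upper end of the ~6–8 GB Q4 estimate on this page

for pick in picks_that_fit(CATALOG, REQUIRED_GB):
    print(f"{pick.name}: {pick.memory_gb} GB, ${pick.price_usd}")
```

Using the upper end of the estimate as the threshold is a conservative choice; a 6 GB card is cheaper in this toy catalog but is filtered out before the sort.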
