the AI bench
VERIFIED JUNE 2026
All models

MODEL · ALIBABA · 30B TOTAL / 3B ACTIVE

Qwen3-Omni-30B-A3B-Instruct

The only locally-runnable open-weight model that does real-time streaming speech-out natively. 119 input languages, 10 speech-output languages (two voices: Chelsie, Ethan).

License: Apache 2.0 · Context: Multimodal tokens dominate the budget · Released: September 22, 2025

The decision in five lines

The call
Buy — for voice
Best for
voice
Runs on
16 hardware picks fit (cheapest: Minisforum UM890 Pro · $463)
Watch out
Ollama-only workflows — Ollama has no native voice pipeline, so you can't use this model's speech-out through it at all.
Evidence
Estimated · last verified April 2026

30B total
PARAMETERS
MOE
TYPE
Multimodal
CONTEXT
~16 GB (AWQ 4-bit, full stack)
VRAM AT Q4

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

VOICE · TOP
Qwen3-Omni-30B-A3B-InstructApache 2.0 MoE; audio+video+image+text in, speech+text out; 17GB at Q4. Frontier unified voice.

The call

The only locally-runnable open-weight model that does real-time streaming speech-out natively. 119 input languages, 10 speech-output languages (two voices: Chelsie, Ethan).

When not to use: Ollama-only workflows — Ollama has no native voice pipeline, so you can't use this model's speech-out through it at all.

Runner notes

Serve via vLLM or the QwenLM/Qwen3-Omni reference server behind Open-WebUI. Route text-only traffic separately if you don't need audio. AWQ-4bit (`cpatonn/Qwen3-Omni-30B-A3B-Instruct-AWQ-4bit`) fits ~16 GB VRAM.

License
Apache 2.0
Released
September 22, 2025
Maker
Alibaba

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.

Next step

Find-by-model — see what hardware runs this