MODEL · QWEN (ALIBABA) · 1.7B AND 0.6B (BUILT ON THE QWEN3-OMNI AUDIO STACK)
Qwen3-ASR (1.7B / 0.6B)
Qwen’s first dedicated open-weight ASR family — language identification plus speech recognition across 52 languages and dialects (30 languages + 22 Chinese dialects), built on the Qwen3-Omni audio foundation. Qwen claims the 1.7B is state-of-the-art among open-source ASR and competitive with the strongest proprietary commercial APIs. Apache 2.0, transformers-native, and small enough to run on CPU or any consumer GPU.
License: Apache 2.0 · Context: n/a · Released: June 26, 2026
The decision in five lines
- The call: Consider (runnable locally, family reference)
- Best for: Local evaluation and family reference
- Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
- Watch out: Not for speaker diarization or per-word timestamps; pair it with WhisperX/pyannote, or stay on the Canary-Qwen pipeline.
- Evidence: Estimated
- Parameters: 1.7B and 0.6B (built on the Qwen3-Omni audio stack)
- Type: STT / ASR
- Context: n/a
- VRAM: ~4 GB (1.7B) / ~1.5–2 GB (0.6B) at fp16
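The fp16 estimates in the card line up with back-of-envelope arithmetic: weights take parameters × 2 bytes, so about 3.4 GB for the 1.7B and 1.2 GB for the 0.6B, with the remainder of each estimate presumably covering activations and runtime overhead. A minimal sketch, assuming the nominal parameter counts (both are rounded) and decimal gigabytes; it sizes the weights only.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes).

    bytes_per_param: 2.0 for fp16/bf16, roughly 0.5 for a 4-bit quant.
    Activations, caches, and framework overhead are NOT included.
    """
    return num_params * bytes_per_param / 1e9


# Nominal (rounded) parameter counts for the two checkpoints.
for name, params in [("Qwen3-ASR-1.7B", 1.7e9), ("Qwen3-ASR-0.6B", 0.6e9)]:
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB of fp16 weights")
```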
Where we recommend this
This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.
The call
When not to use: You need speaker diarization or per-word timestamps; pair it with WhisperX/pyannote, or stay on the Canary-Qwen pipeline. This is recognition only, not TTS.
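Pairing a recognizer with a diarizer usually comes down to aligning two timelines: timed text segments (for example from WhisperX's alignment step) and speaker turns from pyannote. A minimal sketch of the common maximum-overlap rule, with hypothetical `Segment` and `Turn` types standing in for whatever those tools actually return; it does not call either library.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str


@dataclass
class Turn:
    start: float  # seconds
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if they don't overlap)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn]
) -> List[Tuple[Optional[str], str]]:
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labeled: List[Tuple[Optional[str], str]] = []
    for seg in segments:
        best_speaker: Optional[str] = None  # stays None if no turn overlaps
        best_overlap = 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((best_speaker, seg.text))
    return labeled


# Hypothetical example data.
segments = [Segment(0.0, 2.5, "hello there"), Segment(2.5, 5.0, "hi, how are you")]
turns = [Turn(0.0, 2.4, "SPEAKER_00"), Turn(2.4, 5.0, "SPEAKER_01")]
print(assign_speakers(segments, turns))
```

Segments that straddle a speaker change go to whichever speaker holds the larger share; word-level assignment is finer-grained but needs the per-word timestamps this model does not provide.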
Runner notes
Transformers-native (`Qwen/Qwen3-ASR-1.7B-hf` / `Qwen3-ASR-0.6B-hf`); also runs under vLLM and SGLang. The Apache 2.0 weights were available from day one: pick the 0.6B for edge/CPU use and the 1.7B for accuracy. Verify WER on your own audio before swapping it into a production STT pipeline.
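Verifying WER means scoring the model's transcripts against trusted reference transcripts of your own recordings. A minimal sketch of word error rate and its character-level cousin CER (more meaningful for the Chinese dialects, which are not space-delimited). Normalization here is deliberately crude (lowercasing and whitespace splitting only), so compare candidate models under the same normalization rather than reading the absolute numbers as benchmark-grade.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp (Levenshtein)."""
    # prev holds the previous DP row: distances from the first i-1 ref tokens
    # to every prefix of hyp. Row 0 is just the number of insertions.
    prev: List[int] = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: ref token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis token
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.lower().split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring whitespace (useful for Chinese text)."""
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


print(round(wer("the cat sat on the mat", "the cat sat on mat"), 3))
print(round(cer("今天天气很好", "今天天气不好"), 3))
```

Average over a representative set of clips (ideally by summing edit distances and reference lengths across the set) rather than judging from a handful of utterances.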
Hardware that fits
Every hardware pick whose memory fits this model at the quant we recommend, sorted cheapest-first, so the top row is your best-value fit.
- Intel Arc B580 12 GB · Perfect · 4.7× · 12 GB · $249–$299
- NVIDIA RTX 3060 12 GB · Perfect · 4.7× · 12 GB · $280–$400
- Minisforum UM890 Pro · Perfect · 9.4× · 32 GB DDR5 (shared) · $463–$580 all-in
- RTX 5060 Ti 16 GB · Perfect · 6.3× · 16 GB · $560–$610
- AMD Radeon RX 9070 XT · Perfect · 6.3× · 16 GB · $649–$779
- Mac Mini M4 16 GB · Perfect · 4.2× · 16 GB unified · $799 (new floor) / $499–$599 (eBay/residuals)
- AMD Radeon RX 7900 XTX · Perfect · 9.4× · 24 GB · $810 used / ~$1,340 new
- NVIDIA RTX 3090 (used, single) · Perfect · 9.4× · 24 GB · $950–$1,200
- NVIDIA RTX 5070 Ti · Perfect · 6.3× · 16 GB · $980–$1,300
- NVIDIA RTX 5080 · Perfect · 6.3× · 16 GB · $999–$1,400
- MacBook Air M5 24 GB · Perfect · 6.3× · 24 GB unified · $1,499–$1,899
- Mac Mini M4 Pro 24 GB · Perfect · 6.3× · 24 GB unified · $1,599
- Dual RTX 3090 (used) · Perfect · 18.8× · 48 GB · $1,800–$2,500 all-in
- Framework Desktop (Ryzen AI Max+ 395) · Perfect · 33.5× · 128 GB unified · $1,999–$2,851
- NVIDIA RTX 4090 · Perfect · 9.4× · 24 GB · $2,200–$2,800
- M5 Pro MacBook Pro 48 GB · Perfect · 12.6× · 48 GB unified · $2,999–$3,599
- NVIDIA RTX 5090 · Perfect · 12.5× · 32 GB · $3,500–$4,300
- NVIDIA RTX A6000 (48 GB, used) · Perfect · 18.8× · 48 GB ECC · $3,500–$4,500
- Mac Studio M4 Max 64 GB · Perfect · 16.8× · 64 GB unified · $3,799
- NVIDIA DGX Spark · Perfect · 33.5× · 128 GB unified · $4,699
- M5 Max MacBook Pro 64 GB · Perfect · 16.8× · 64 GB unified · ~$5,199 (est.; June 25, 2026 increase)
- Mac Studio M3 Ultra 96 GB · Perfect · 25.2× · 96 GB unified · $5,299
- Dual RTX 5090 · Perfect · 25.0× · 64 GB (2×32) · $8,500–$10,500
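The ranking rule described above (keep what fits, sort by price) is simple to rerun against your own shortlist or local prices. A minimal sketch with a hypothetical `Pick` record, assuming the ~4 GB fp16 estimate for the 1.7B as the footprint and the low end of each price range. The page's own multiples (4.7× for a 12 GB card) imply a smaller footprint at its recommended quant, which is not stated here, so the headroom this prints will not match them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pick:
    name: str
    memory_gb: float    # VRAM, or shared/unified system memory
    price_low_usd: int  # low end of the listed price range


def fitting_picks(picks: List[Pick], footprint_gb: float) -> List[Pick]:
    """Keep picks whose memory covers the model footprint, cheapest first."""
    fits = [p for p in picks if p.memory_gb >= footprint_gb]
    return sorted(fits, key=lambda p: p.price_low_usd)


def headroom(pick: Pick, footprint_gb: float) -> float:
    """How many times the model footprint fits into the pick's memory."""
    return pick.memory_gb / footprint_gb


FOOTPRINT_GB = 4.0  # assumption: ~4 GB fp16 estimate for the 1.7B

# A few rows from the list above, deliberately out of order.
shortlist = [
    Pick("NVIDIA RTX 4090", 24, 2200),
    Pick("Intel Arc B580 12 GB", 12, 249),
    Pick("RTX 5060 Ti 16 GB", 16, 560),
]
for p in fitting_picks(shortlist, FOOTPRINT_GB):
    print(f"{p.name}: {headroom(p, FOOTPRINT_GB):.1f}x headroom from ${p.price_low_usd}")
```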