the AI bench
VERIFIED JUNE 2026
All models

MODEL · NVIDIA NEMO + M-BAIN (PIPELINE) · 2.5B (CANARY) + 1.5B (WHISPERX / WHISPER-LARGE-V3)

Canary-Qwen 2.5B + WhisperX

Canary-Qwen is an English-only ASR that doubles as a 2.5B LLM over its own transcripts — transcribe, then summarize/Q&A. WhisperX adds word-level timestamps + diarization. The near-frontier English-first pipeline.

License: Canary: CC-BY-4.0 · WhisperX: BSD-4 · Context: n/a · Released: July 2025 (Canary-Qwen)

The decision in five lines

The call
Buy — for voice
Best for
voice
Runs on
23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out
Non-English audio — Canary-Qwen is English-only.
Evidence
Estimated · last verified April 2026

2.5B (Canary) + 1.5B (WhisperX
PARAMETERS
ENGLISH ASR + DIARIZATION PIPELINE
TYPE
CONTEXT
~6 GB combined
VRAM AT Q4

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

VOICE · TOP
Canary-Qwen 2.5B + WhisperXCanary tops HF Open ASR (5.63% WER English) paired with WhisperX for word-level timestamps + diarization.

The call

Canary-Qwen is an English-only ASR that doubles as a 2.5B LLM over its own transcripts — transcribe, then summarize/Q&A. WhisperX adds word-level timestamps + diarization. The near-frontier English-first pipeline.

When not to use: Non-English audio — Canary-Qwen is English-only. For multilingual, stick to Parakeet-TDT or faster-whisper.

Runner notes

NeMo toolkit for Canary-Qwen, `whisperx` Python for the alignment + diarization stage. Budget ~2 h for first-time install.

License
Canary: CC-BY-4.0 · WhisperX: BSD-4
Released
July 2025 (Canary-Qwen)
Maker
NVIDIA NeMo + m-bain (pipeline)

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.

Next step

Find-by-model — see what hardware runs this