faster-whisper large-v3-turbo

CTranslate2 reimplementation of OpenAI Whisper — 4× faster with int8 quantization, matches reference accuracy. The practical STT default.

License: MIT (library + weights) · Context: n/a · Released: Turbo support October 2024

The decision in five lines

The call: Skip for local — for voice
Best for: voice
Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out: Speaker diarization or per-word timestamps at scale — WhisperX wraps faster-whisper with alignment + pyannote for that.
Evidence: Estimated · last verified July 2026

~809M (Whisper large-v3-turbo distilled): PARAMETERS
STT: TYPE
—: CONTEXT
~1.5–2 GB (int8): VRAM AT Q4

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

VOICE · LOW

faster-whisper large-v3-turbo (int8)MIT; 99 languages; 4× faster than vanilla Whisper; the STT default at this tier.

The call

CTranslate2 reimplementation of OpenAI Whisper — 4× faster with int8 quantization, matches reference accuracy. The practical STT default.
When not to use: Speaker diarization or per-word timestamps at scale — WhisperX wraps faster-whisper with alignment + pyannote for that.

Runner notes

`pip install faster-whisper`. `WhisperModel("large-v3-turbo", compute_type="int8")` for CPU, `"int8_float16"` for GPU. Runs reasonably on ARM / Apple Silicon via CTranslate2.

License: MIT (library + weights)
Released: Turbo support October 2024
Maker: SYSTRAN
Model card: huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2 →

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.

Next step

Find-by-model — see what hardware runs this→