Step-Audio 2 mini

StepFun's 8B speech-to-speech large audio language model (LALM), trained on more than 8 million hours of audio. It is competitive with GPT-4o-audio on speech recognition and speech-to-speech (S2S) translation benchmarks, and its weights are fully open.

License: Apache 2.0 · Context: n/a · Released: August 29, 2025

The decision in five lines

The call: Buy — for voice
Best for: voice
Runs on: 16 hardware picks fit (cheapest: Minisforum UM890 Pro · $463)
Watch out: Streaming real-time chat on consumer hardware — 8B LALM is heavy.
Evidence: Estimated · last verified July 2026

PARAMETERS: 8B (LALM)
TYPE: End-to-end speech-to-speech
CONTEXT: n/a
VRAM (FP16): ~16 GB
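
The ~16 GB figure is consistent with simple weight-size arithmetic: 8 billion parameters at 16 bits each. The sketch below is a rough, weights-only estimate; the helper name `weight_memory_gb` is ours, and it ignores activations, audio encoder/decoder buffers, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# 8B parameters stored as FP16 (16 bits per weight).
fp16_gb = weight_memory_gb(8, 16)

# Same arithmetic for a hypothetical 4-bit quantization, for comparison only;
# this page does not state that a 4-bit build of the model exists.
four_bit_gb = weight_memory_gb(8, 4)

print(f"FP16 weights:  ~{fp16_gb:.0f} GB")
print(f"4-bit weights: ~{four_bit_gb:.0f} GB")
```

By this estimate the listed ~16 GB corresponds to FP16 weights, while 4-bit weights would come to roughly a quarter of that before overhead.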

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

VOICE ·

Step-Audio 2 mini FP16 (~16 GB): StepFun 8B end-to-end speech-to-speech LALM. Competitive with GPT-4o-audio on benchmarks. Apache 2.0; full FP16 retains audio nuance.

VOICE · HIGH

Step-Audio 2 mini (8B, Apache 2.0): Unified speech-to-speech; competitive with GPT-4o-audio on several benchmarks; ~16 GB FP16.

The call

StepFun's 8B speech-to-speech large audio language model (LALM), trained on more than 8 million hours of audio. It is competitive with GPT-4o-audio on speech recognition and speech-to-speech (S2S) translation benchmarks, and its weights are fully open.
When not to use: streaming real-time chat on consumer hardware, where an 8B LALM is heavy. MiniCPM-o 2.6 int4 is lighter for voice-in-voice-out on 8 GB VRAM.

Runner notes

Run it with the inference scripts bundled in the GitHub repository `stepfun-ai/Step-Audio2`. There is no Ollama route yet. The mini-Think variant (September 2025) adds reasoning-trace output; the full 30B model promised on the StepFun roadmap was still unreleased as of late April 2026.

License: Apache 2.0
Released: August 29, 2025
Maker: StepFun
Model card: huggingface.co/stepfun-ai/Step-Audio-2-mini

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.
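
The fit rule described above (keep every pick whose memory covers the model, then sort cheapest-first) can be sketched as follows. This is an illustration only, not the planner's actual code: the `HardwarePick` type, the `picks_that_fit` helper, and the three placeholder entries are all hypothetical and do not correspond to real products or prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwarePick:
    name: str
    memory_gb: float   # memory usable for the model
    price_usd: float


def picks_that_fit(picks, required_gb):
    """Keep picks with enough memory, sorted cheapest-first."""
    fitting = [p for p in picks if p.memory_gb >= required_gb]
    return sorted(fitting, key=lambda p: p.price_usd)


# Placeholder data, invented purely to exercise the function.
example_picks = [
    HardwarePick("placeholder-a", memory_gb=32, price_usd=900),
    HardwarePick("placeholder-b", memory_gb=8, price_usd=300),
    HardwarePick("placeholder-c", memory_gb=24, price_usd=600),
]

# ~16 GB is the FP16 figure quoted on this page.
fits = picks_that_fit(example_picks, required_gb=16)
best_value = fits[0].name if fits else None

for pick in fits:
    print(f"{pick.name}: {pick.memory_gb} GB, ${pick.price_usd}")
```

With the placeholder data, the 8 GB entry is filtered out and the cheaper of the two remaining entries comes first, which mirrors how the top row of the table is the best-value fit.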
