HARDWARE · FANLESS PORTABLE · 24 GB UNIFIED
MacBook Air M5 24 GB
Fanless, silent, and barely enough bandwidth to feel interactive.
The M5 Air at 24 GB is the first Apple laptop where 8B dense Q4 inference feels responsive without a fan ramping up — because there is no fan. 153 GB/s bandwidth is the honest limiting factor; this is not a 14B-comfortable machine.
The decision in five lines
- The call
- Consider — Fanless, silent, and barely enough bandwidth to feel interactive.
- Best for
- Fanless portable
- Runs well
- Qwen3-14B · Qwen 3.5 9B · Qwen 3.5 9B + RAG
- Watch out
- Fanless means sustained inference throttles. Expect ~15–20% speed loss after 5–10 minutes of continuous token generation.
- Evidence
- Estimated
- 24
- GB UNIFIED
- 153
- GB/S BANDWIDTH
- FANLESS
- SUSTAINED THROTTLE
- $1,299
- 13" FROM APPLE
What fits at this tier
Fits 8B dense Q4 at ~25–35 tok/s, 4B-class picks at 60–75 tok/s. 14B dense Q4 fits (weights at 8 GB on 24 GB unified) but 153 GB/s bandwidth caps TG at ~12–18 tok/s — workable, not comfortable. 30B-A3B MoE Q4 is a REQUIRES TWEAK fit at the memory margin; bandwidth makes it slower than the Mac mini M4 Pro.
The call
Buy it if portability + silent operation matters more than raw throughput — you want local 8B on a plane, in a library, or next to a sleeping kid. The M5 Neural Accelerators do help prefill vs M4 generation.
Skip it if you'll plug it in and use it as a desk machine — a Mac mini M4 Pro at $1,399 gives 273 GB/s (80% more bandwidth) in a smaller footprint that stays on your desk.
Watchouts
- Fanless means sustained inference throttles. Expect ~15–20% speed loss after 5–10 minutes of continuous token generation.
- Ollama 0.21+ MLX backend requires 32 GB+ unified. 24 GB MBA users stay on the Metal path.
- 153 GB/s is the hard bandwidth ceiling. No sysctl tweak helps — memory bandwidth is not memory capacity.
- macOS 33% reservation caps the LLM budget at ~16 GB. With sysctl tweak, ~21 GB.
Local vs cloud at this tier
● LOCAL WINS
Airplane-mode local inference. Privacy-first work from a cafe. Local chat + embeddings + summarization that doesn't need a network.
● CLOUD WINS
Anything interactive at 14B+. A portable laptop on wifi running Claude Pro beats this on quality for chat work, at $20/mo.
A sensible secondary inference device, not a primary one. The value is "local AI that travels," not raw throughput.
Next step
Load this setup into the planner→