the AI bench
VERIFIED JUNE 2026

HARDWARE · FANLESS PORTABLE · 24 GB UNIFIED

MacBook Air M5 24 GB

Fanless, silent, and barely enough bandwidth to feel interactive.

The M5 Air at 24 GB is the first Apple laptop where 8B dense Q4 inference feels responsive without a fan ramping up — because there is no fan. 153 GB/s bandwidth is the honest limiting factor; this is not a 14B-comfortable machine.

The decision in five lines

The call: Consider. Fanless, silent, and barely enough bandwidth to feel interactive.
Best for: Fanless portable
Runs well: Qwen3-14B · Qwen 3.5 9B · Qwen 3.5 9B + RAG
Watch out: Fanless means sustained inference throttles. Expect ~15–20% speed loss after 5–10 minutes of continuous token generation.
Evidence: Estimated · last verified June 2026

24 GB UNIFIED
153 GB/S BANDWIDTH
FANLESS · SUSTAINED THROTTLE
$1,299 · 13" FROM APPLE

What fits at this tier

Fits 8B dense Q4 at ~25–35 tok/s and 4B-class picks at 60–75 tok/s. 14B dense Q4 fits in memory (about 8 GB of weights on 24 GB unified), but 153 GB/s bandwidth caps token generation at ~12–18 tok/s: workable, not comfortable. 30B-A3B MoE Q4 is a "REQUIRES TWEAK" fit at the memory margin, and the limited bandwidth makes it slower here than on the Mac mini M4 Pro.
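The 14B figure above can be sanity-checked with a simple bandwidth-bound model: if generating each token streams roughly all the weights from memory once, then bandwidth divided by weight size gives an upper bound on token generation speed. The sketch below assumes that simplification (it ignores KV-cache reads, compute limits, and runtime overhead); `tg_ceiling` is a hypothetical helper name, and the 8 GB weight size is the figure quoted above.

```python
def tg_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on token generation (tok/s) for a dense model.

    Assumption: every generated token streams all weights from memory
    once, so the ceiling is bandwidth / weight size. Real throughput
    lands below this number.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


AIR_M5_BANDWIDTH_GB_S = 153.0  # from the spec tiles on this page
WEIGHTS_14B_Q4_GB = 8.0        # weight size quoted in the text

ceiling_14b = tg_ceiling(AIR_M5_BANDWIDTH_GB_S, WEIGHTS_14B_Q4_GB)
print(f"14B dense Q4 ceiling: {ceiling_14b:.1f} tok/s")  # prints 19.1 tok/s
```

Under this model the ceiling is about 19 tok/s, so the estimated ~12–18 tok/s range sits just under it, which is consistent with bandwidth rather than memory capacity being the limit.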

CODING
Qwen3-14B: Sticky 14B workhorse; 128K context; Apache 2.0; broad runner support.
CHAT / GENERAL
Qwen 3.5 9B: 262K context with native multimodal; strong on GPQA, IFEval, LiveCodeBench at the 9B size.
DOCS & RETRIEVAL
Qwen 3.5 9B + RAG: Chunk aggressively, retrieve well; 262K native context handles big retrieval windows comfortably.
IMAGE
FLUX.2 klein 4B (Apache 2.0): BFL's first fully Apache-2.0 model; 4B distilled for fast inference on mid-tier GPUs; commercial OK.
AGENTS
Qwen 3.5 9B: Strong tool-use performance for 9B; supports thinking mode and 201-language coverage.
VOICE
Chatterbox Multilingual (Resemble AI): MIT; 23 languages; voice cloning + emotion dial; pip 0.1.7 (March 2026) shows active development.

The call

Buy it if portability and silent operation matter more than raw throughput: you want local 8B on a plane, in a library, or next to a sleeping kid. The M5 Neural Accelerators do help prefill compared with the M4 generation.

Skip it if you'll plug it in and use it as a desk machine: a Mac mini M4 Pro at $1,399 gives 273 GB/s (roughly 78% more bandwidth) in a smaller footprint that stays on your desk.
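The desk-machine comparison comes down to two ratios: how much more bandwidth the Mac mini M4 Pro offers, and how much more it costs. A minimal sketch using only the prices and bandwidth figures quoted on this page; `percent_more` is a hypothetical helper name.

```python
def percent_more(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


MINI_BW_GB_S, AIR_BW_GB_S = 273.0, 153.0  # GB/s, from the text
MINI_PRICE_USD, AIR_PRICE_USD = 1399, 1299  # USD, from the text

print(f"Bandwidth advantage: {percent_more(MINI_BW_GB_S, AIR_BW_GB_S):.0f}%")
print(f"Price premium: {percent_more(MINI_PRICE_USD, AIR_PRICE_USD):.1f}%")
```

On these numbers the mini delivers about 78% more bandwidth for under 8% more money, which is why the Air only makes sense when it actually leaves the desk.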

Watchouts

  • Fanless means sustained inference throttles. Expect ~15–20% speed loss after 5–10 minutes of continuous token generation.
  • Ollama 0.21+ MLX backend requires 32 GB+ unified. 24 GB MBA users stay on the Metal path.
  • 153 GB/s is the hard bandwidth ceiling. No sysctl tweak helps — memory bandwidth is not memory capacity.
  • macOS reserves about 33% of unified memory, which caps the LLM budget at ~16 GB. A sysctl tweak raises that to ~21 GB.
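The memory and throttling watchouts above are simple arithmetic, sketched below. The 33% reservation, the ~21 GB tweaked budget, and the 15–20% throttle loss are the figures quoted on this page; the function names and the 2 GB KV-cache allowance in the example are hypothetical, chosen only for illustration.

```python
UNIFIED_GB = 24.0
RESERVED_FRACTION = 0.33   # macOS reservation quoted above
TWEAKED_BUDGET_GB = 21.0   # budget with the sysctl tweak, quoted above


def default_budget_gb(unified_gb: float,
                      reserved: float = RESERVED_FRACTION) -> float:
    """Memory left for the model after the default reservation."""
    return unified_gb * (1.0 - reserved)


def fits(weights_gb: float, kv_cache_gb: float, budget_gb: float) -> bool:
    """True if weights plus KV cache stay within the budget."""
    return weights_gb + kv_cache_gb <= budget_gb


def throttled_range(low: float, high: float,
                    loss_low: float = 0.15,
                    loss_high: float = 0.20) -> tuple[float, float]:
    """Sustained tok/s range after thermal throttling.

    Worst case pairs the low estimate with the larger loss;
    best case pairs the high estimate with the smaller loss.
    """
    return low * (1.0 - loss_high), high * (1.0 - loss_low)


budget = default_budget_gb(UNIFIED_GB)
print(f"Default budget: {budget:.1f} GB")
# Hypothetical example: 8 GB of weights plus an assumed 2 GB KV cache.
print("14B Q4 fits by default:", fits(8.0, 2.0, budget))
lo, hi = throttled_range(25.0, 35.0)  # 8B dense Q4 estimate from this page
print(f"8B sustained estimate: {lo:.0f} to {hi:.0f} tok/s")
```

Note that raising the budget changes only what fits; the throttled speeds are unaffected, since capacity and bandwidth are separate limits.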

Local vs cloud at this tier

● LOCAL WINS

Airplane-mode local inference. Privacy-first work from a cafe. Local chat + embeddings + summarization that doesn't need a network.

● CLOUD WINS

Anything interactive at 14B+. A portable laptop on wifi running Claude Pro beats this on quality for chat work, at $20/mo.

A sensible secondary inference device, not a primary one. The value is "local AI that travels," not raw throughput.
