the AI bench
VERIFIED JUNE 2026

HARDWARE · MAC · 64 GB UNIFIED

M5 Max MacBook Pro 64 GB

Silent, portable, and the only hardware here that doubles as your laptop.

64 GB of unified memory at 614 GB/s on the 40-core GPU M5 Max. Runs every modern model up to 35B-A3B MoE at reasonable speed, in a silent chassis that sustains load on battery. The compromise: prefill on long prompts is noticeably slower than on NVIDIA hardware, and you pay Apple's memory upgrade pricing to go beyond 48 GB.

The decision in five lines

The call: Buy. Silent, portable, and the only hardware here that doubles as your laptop.
Best for: All-rounder · Mac
Runs well: Qwen3-Coder-30B-A3B (MoE, fits 24GB) · Qwen 3.5 35B-A3B (MoE, fits 24GB) · Gemma 4 31B (256K context)
Watch out: Custom build required. The 32-core GPU M5 Max tops out at 36 GB; 64 GB is only available with the 40-core GPU bin, which starts at 48 GB. Specify 40-core GPU + 64 GB in Apple's configurator.
Evidence: Measured · last verified June 2026

64 GB unified · 614 GB/s bandwidth · ~40 W sustained · ~$4,499 custom build

What fits at this tier

Runs 35B-A3B MoE, 27B dense, and 14B dense at useful speeds on MLX (Ollama's MLX backend or LM Studio). FLUX.2 klein fits with room to spare. Voice picks work but prefill on long Whisper transcripts is slow.

CODING
Qwen3-Coder-30B-A3B (MoE, fits 24GB): 3B-active MoE; benchmark champion for local coding at this tier.
CHAT / GENERAL
Qwen 3.5 35B-A3B (MoE, fits 24GB): 3B-active MoE; 30B quality at 3B inference speed.
DOCS & RETRIEVAL
Gemma 4 31B (256K context): 31B dense with 256K context; Gemma commercial-permissive terms; Arena top 5.
IMAGE
HiDream-O1-Image (8B, MIT): May 8, 2026 release. Pixel-space (no VAE, no disjoint text encoder); debuted top-10 on Artificial Analysis T2I Arena. MIT-licensed 8B; one model handles T2I + edit + subject-driven personalization at up to 2,048².
AGENTS
Qwen 3.5 35B-A3B (MoE, fits 24GB): MoE with native tool use; fits 24GB at Q4; Apache 2.0.
VOICE
VoxCPM2 (2B, Apache 2.0): 30 languages, 48 kHz, tokenizer-free diffusion AR; voice design from text. April 2026 release.
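
Why a 3B-active MoE decodes so much faster than a 31B dense model comes down to how many weights have to be read from memory per generated token. The sketch below is a rough, bandwidth-only ceiling. It uses the 614 GB/s figure from this page and assumes about 4.5 bits per weight for a Q4-style quant (an assumption, not a measured value). It ignores KV-cache reads, expert routing, and compute limits, so real throughput will land well below these numbers. The function name is hypothetical.

```python
# Rough decode-speed ceiling from memory bandwidth alone.
# Assumptions (not from the review): ~4.5 bits per weight for a Q4-style
# quant, and every active weight is read once per generated token.

def decode_ceiling_tok_s(active_params_b: float,
                         bandwidth_gb_s: float = 614.0,
                         bits_per_weight: float = 4.5) -> float:
    """Upper bound on tokens/s if decoding were purely bandwidth-bound."""
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token

moe_ceiling = decode_ceiling_tok_s(3)     # 3B active parameters (A3B MoE)
dense_ceiling = decode_ceiling_tok_s(31)  # 31B dense reads all weights
print(f"3B-active MoE ceiling: {moe_ceiling:.0f} tok/s")
print(f"31B dense ceiling:     {dense_ceiling:.0f} tok/s")
print(f"ratio: {moe_ceiling / dense_ceiling:.1f}x")
```

The absolute numbers are only an upper bound. The useful part is the ratio: under these assumptions the ceiling scales inversely with active parameters, not total parameters.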

The call

Buy it if you want local AI that moves with you, runs silently, and works on battery — and if the rest of your stack is already Apple.

Skip it if you're running long-context coding or docs workloads (Mac prefill on 32K prompts can take 60–90s), or if you have room for a desktop GPU. You will get more raw throughput per dollar from a 5090 or dual-3090 build.
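
To put the 60–90 s figure in perspective, it can be turned into an implied prefill rate and then used to estimate time to first token at other prompt lengths. This is a minimal sketch, assuming "32K" means 32,768 tokens and that prefill time scales linearly with prompt length. Both are simplifications: attention cost grows faster than linearly, so longer prompts will tend to do worse than this suggests. The function names are hypothetical.

```python
# Back-of-envelope prefill arithmetic from the 60-90 s figure quoted above.
# Assumptions: 32K = 32,768 tokens; prefill time is linear in prompt length.

PROMPT_32K = 32 * 1024

def implied_prefill_rate(prompt_tokens: int, seconds: float) -> float:
    """Tokens per second implied by a measured time to first token."""
    return prompt_tokens / seconds

def time_to_first_token(prompt_tokens: int, prefill_tok_s: float) -> float:
    """Estimated seconds before the first output token appears."""
    return prompt_tokens / prefill_tok_s

slow_rate = implied_prefill_rate(PROMPT_32K, 90)  # pessimistic end
fast_rate = implied_prefill_rate(PROMPT_32K, 60)  # optimistic end
print(f"implied prefill rate: {slow_rate:.0f} to {fast_rate:.0f} tok/s")

for tokens in (2048, 8192, PROMPT_32K):
    worst = time_to_first_token(tokens, slow_rate)
    best = time_to_first_token(tokens, fast_rate)
    print(f"{tokens:>6} tokens: {best:.1f} to {worst:.1f} s to first token")
```

Under these assumptions, short prompts stay in the range of a few seconds, which matches the point that the penalty is concentrated in long-context work.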

Watchouts

  • Custom build required — the 32-core GPU M5 Max tops out at 36 GB; 64 GB is only available with the 40-core GPU bin, which starts at 48 GB. Specify 40-core GPU + 64 GB in Apple's configurator.
  • Prefill latency is slow on long prompts. First token on a 32K prompt can take 60–90 seconds vs near-instant on NVIDIA. Throughput on short prompts is fine.
  • The 64 GB unified memory is split between macOS, apps, and model weights. Realistic model ceiling is ~48 GB after system overhead — plan for 35B-A3B Q4 comfortably, not 70B.
  • MLX is the right runner. Ollama's MLX backend shipped in Feb 2026 and matches LM Studio's auto-detection. Stick with Q4_K_M or MLX-native quants for the best throughput.
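
The memory ceiling in the list above can be checked with simple arithmetic: weight footprint is roughly parameters times bits per weight, divided by eight. The sketch below assumes about 4.5 bits per weight for a Q4-style quant (an assumption; actual quants vary) and uses the ~48 GB ceiling from this page. For MoE models the total parameter count is what matters for memory, since all experts have to be resident even though only a few are active per token. The function names are hypothetical.

```python
# Estimate quantized weight size and what is left under the ~48 GB ceiling.
# Assumption (not from the review): ~4.5 bits per weight for Q4-style quants.
# The remainder has to cover KV cache, activations, and runtime overhead.

def weights_gb(total_params_b: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight footprint in GB for a model with N billion params."""
    return total_params_b * bits_per_weight / 8

def headroom_gb(total_params_b: float, ceiling_gb: float = 48.0,
                bits_per_weight: float = 4.5) -> float:
    """GB remaining under the usable-memory ceiling after loading weights."""
    return ceiling_gb - weights_gb(total_params_b, bits_per_weight)

models = [("35B-A3B MoE", 35), ("31B dense", 31), ("70B dense", 70)]
for name, params_b in models:
    print(f"{name:<12} weights ~{weights_gb(params_b):.1f} GB, "
          f"headroom ~{headroom_gb(params_b):.1f} GB")
```

Under these assumptions a 70B model's weights alone take most of the ceiling, leaving little for context, while the 35B-A3B leaves well over half free. That is the sense in which the smaller model fits comfortably and the larger one does not.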

Local vs cloud at this tier

● LOCAL WINS

Silent, portable, fully offline. Long-term you keep the whole laptop, not just a GPU — that changes the economics if you were going to buy an M5 Max anyway.

● CLOUD WINS

Frontier models, instant first-token latency, no thermal throttling on sustained load. If you're coding in a cafe over LTE, cloud wins on responsiveness.

The honest framing: this is a laptop that does AI as a bonus, not an AI rig you happen to carry. If you were buying an M5 Max anyway, the 64 GB upgrade is ~$200 over the 48 GB 40-core-GPU base, and the extra headroom is meaningful. If you're buying specifically for AI, a desktop rig is better value.
