Mac Studio M4 Max 64 GB

The silent 6 W-idle box that runs 35B-A3B MoE comfortably and 70B dense Q4 with a memory tweak.

64 GB unified memory at 546 GB/s. Runs 35B-A3B MoE at 70–100 tok/s silently (6 W at idle), Qwen 3.5 27B dense at ~20 tok/s, and FLUX.2 klein pipelines cleanly. 70B dense Q4 fits with a `sudo sysctl iogpu.wired_limit_mb` tweak and runs at 8–15 tok/s: workable, but not silent under sustained load. The M4 Max is previous-gen silicon now, but Bloomberg (April 19, 2026) reported that the M5 Mac Studio refresh slipped to October 2026 on supply-chain problems, so the buy-now case is stronger than it was a week ago.
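As a rough illustration of the memory tweak, here is a minimal Python sketch that picks a wired-limit value and prints the command. Only the sysctl key `iogpu.wired_limit_mb` comes from the text above. The 8 GB reserve for macOS, the ~4.5 bits per weight for a Q4 quant, and the 4 GB KV-cache allowance are assumptions for illustration, not measured or Apple-documented values. The script prints the command and does not run it.

```python
# Sketch: choose a GPU wired-memory limit for a 64 GB Mac and print the command.
# Every constant marked "assumption" is illustrative, not a documented value.

TOTAL_RAM_GB = 64          # from the spec above
OS_RESERVE_GB = 8          # assumption: headroom left for macOS and apps
BITS_PER_WEIGHT_Q4 = 4.5   # assumption: effective bits per weight for a Q4 quant
KV_CACHE_GB = 4            # assumption: allowance for context / KV cache


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (billions of bytes)."""
    return params_billion * bits_per_weight / 8


def wired_limit_mb(total_ram_gb: int, os_reserve_gb: int) -> int:
    """Wired limit in MB, leaving a fixed reserve for the OS."""
    return (total_ram_gb - os_reserve_gb) * 1024


need_gb = weights_gb(70, BITS_PER_WEIGHT_Q4) + KV_CACHE_GB
limit_mb = wired_limit_mb(TOTAL_RAM_GB, OS_RESERVE_GB)

print(f"70B Q4 needs roughly {need_gb:.1f} GB")
print(f"sudo sysctl iogpu.wired_limit_mb={limit_mb}")
```

A value set with `sysctl` this way generally does not survive a reboot, so it would need to be reapplied after a restart.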

The decision in five lines

The call: Buy — The silent 6 W-idle box that runs 35B-A3B MoE comfortably and 70B dense Q4 with a memory tweak.
Best for: Desktop Mac
Runs well: Qwen3-Coder-30B-A3B (MoE, fits 24GB) · Qwen 3.6-27B · Z-Image-Turbo (Apache 2.0)
Watch out: Supply-constrained. Apple pulled 128 GB / 256 GB configs from intake April 11, 2026 on memory-chip shortage. 64 GB still orderable but delivery is 5–6 weeks, not next-day.
Evidence: Estimated · last verified July 2026

64 GB UNIFIED
546 GB/S BANDWIDTH
145 W PEAK (6 W IDLE)
$3,799 NEW FROM APPLE

What fits at this tier

Runs 35B-A3B MoE at 70–100 tok/s and Qwen 3.5 27B dense at ~20 tok/s comfortably; 70B dense Q4 fits with a wired-memory tweak at 8–15 tok/s — workable for chat, tight for long context. Prefill on long prompts is slower than NVIDIA. 122B-A10B needs the 128 GB tier, not 64 GB.
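The speeds above can be sanity-checked against a simple bandwidth-bound model: each generated token has to read roughly all active weights once, so decode speed cannot exceed memory bandwidth divided by the size of the active weights. The sketch below is a rough ceiling under that model, not a benchmark. The 4.5 bits per weight figure is an assumption, and the function name is hypothetical.

```python
# Sketch: bandwidth-bound ceiling on decode speed.
# tok/s <= memory bandwidth / bytes of active weights read per token.
# Real-world speeds land below this because of compute and runtime overheads.

BANDWIDTH_GB_S = 546        # from the spec above
BITS_PER_WEIGHT_Q4 = 4.5    # assumption: effective bits per weight for a Q4 quant


def decode_ceiling_tok_s(active_params_billion: float,
                         bandwidth_gb_s: float = BANDWIDTH_GB_S,
                         bits_per_weight: float = BITS_PER_WEIGHT_Q4) -> float:
    """Upper bound on tokens/s when decoding is memory-bandwidth bound."""
    active_gb = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / active_gb


for name, active_b in [("35B-A3B MoE (3B active)", 3),
                       ("27B dense", 27),
                       ("70B dense", 70)]:
    print(f"{name}: ceiling ~{decode_ceiling_tok_s(active_b):.0f} tok/s")
```

The model also shows why the MoE is the comfortable choice at this tier: only the ~3B active parameters are read per token, so its ceiling sits far above either dense model.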

CODING

Qwen3-Coder-30B-A3B (MoE, fits 24GB): Community daily driver for local coding; 3B-active MoE delivers 30B quality at 3B-dense speed.

CHAT / GENERAL

Qwen 3.6-27B: April 22, 2026 dense refresh; supersedes Qwen 3.5 27B and claims to beat the prior 397B MoE flagship while staying single-GPU at Q4 (~17 GB).

DOCS & RETRIEVAL

Qwen 3.6-27B: April 22, 2026 dense refresh with 262K native context (extensible to 1M), multimodal input, and single-GPU operation at Q4. Now the dense long-context top pick.

IMAGE

Z-Image-Turbo (Apache 2.0): Community daily driver for realism; 6B parameters, 8-step inference, and a license that allows commercial use.

AGENTS

Qwen 3.6-35B-A3B: Latest Qwen MoE with strong function calling; realistic on 24GB+ VRAM or a Mac with 48GB+. The local agentic top pick.

VOICE

Qwen3-Omni-30B-A3B-Instruct: Apache 2.0 MoE; audio, video, image, and text in, speech and text out; 17 GB at Q4. Frontier unified voice.

The call

Buy it if you want the best silent-desk local AI machine and need 64 GB of usable memory without the dual-3090 tower, driver pain, or 1,000 W PSU. Doubles as a regular Mac.
Skip it if you can wait for the M5 Mac Studio, which Bloomberg now reports for October 2026 with an expected 20–30% bandwidth uplift. Also skip if image-gen is your primary workload; FLUX.2 / HiDream pipelines still run materially faster on NVIDIA CUDA.

Watchouts

Supply-constrained. Apple pulled 128 GB / 256 GB configs from intake April 11, 2026 on memory-chip shortage. 64 GB still orderable but delivery is 5–6 weeks, not next-day.
RAM is soldered and non-upgradeable. 64 GB is permanent. If you later want 120B MoE, your only path is selling and buying bigger.
Prefill on long prompts is 2–5× slower per token than on comparable NVIDIA GPUs. Time to first token on a 32K prompt can reach 60–90 s.
M5 Mac Studio refresh delayed to October 2026 (Bloomberg/Gurman, April 19). Same chassis expected, likely 20–30% bandwidth bump. The 64 GB tier is custom-build only on Apple's configurator now — default landing shows 36 GB and M3 Ultra 96 GB; expect 5–6 week delivery on 64 GB orders.
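To put the prefill watchout in practical terms, the 60–90 s figure for a 32K prompt can be turned into an implied prefill rate and projected onto shorter prompts. This is a back-of-envelope sketch: it assumes "32K" means 32,768 tokens and that prefill time scales linearly with prompt length, which understates the cost at very long context. The function names are hypothetical.

```python
# Sketch: convert the quoted 32K-prompt first-token time into an implied
# prefill rate, then project other prompt lengths.
# Linear scaling is an assumption; attention cost grows faster at long context.

PROMPT_TOKENS_32K = 32 * 1024     # assumption: "32K" means 32,768 tokens
TTFT_RANGE_S = (60, 90)           # from the watchout above


def implied_prefill_tok_s(prompt_tokens: int, ttft_s: float) -> float:
    """Prefill throughput implied by a time to first token."""
    return prompt_tokens / ttft_s


def projected_ttft_s(prompt_tokens: int, prefill_tok_s: float) -> float:
    """Time to first token at a given prefill throughput."""
    return prompt_tokens / prefill_tok_s


fast = implied_prefill_tok_s(PROMPT_TOKENS_32K, TTFT_RANGE_S[0])
slow = implied_prefill_tok_s(PROMPT_TOKENS_32K, TTFT_RANGE_S[1])
print(f"Implied prefill: {slow:.0f}-{fast:.0f} tok/s")

for tokens in (4_096, 8_192, 16_384):
    best = projected_ttft_s(tokens, fast)
    worst = projected_ttft_s(tokens, slow)
    print(f"{tokens:>6} tokens: ~{best:.0f}-{worst:.0f} s to first token")
```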

Local vs cloud at this tier

● LOCAL WINS

Silent 24/7 inference at 6 W idle. 35B-A3B MoE at 70+ tok/s with no usage caps; 70B Q4 workable with a memory tweak. Full privacy, no per-token cost. Doubles as a regular desktop.

● CLOUD WINS

Frontier reasoning (GPT-5.4, Claude Opus 4.8) — nothing local matches them. Image-gen throughput (CUDA advantage is still meaningful). Long-prompt docs workflows where prefill latency bites.

At $3,799 (up $600 in Apple's June 25 2026 hike) + ~$5/mo electricity, break-even vs a $100/mo ChatGPT Pro plan is ~40 months. Vs a $200/mo Max plan, break-even drops to ~20 months. The honest case for buying: privacy + unlimited use + silence, not cloud-substitution cost math.
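The break-even figures follow from dividing the hardware price by the net monthly saving (plan price minus electricity). A minimal sketch of that arithmetic, using only the numbers quoted above and rounding up to whole months; it ignores resale value and anything a subscription offers that local models do not:

```python
# Sketch: months until the hardware cost is recovered versus a flat subscription.
import math

HARDWARE_USD = 3799         # from the text
ELECTRICITY_USD_MO = 5      # from the text (approximate)


def break_even_months(plan_usd_mo: float,
                      hardware_usd: float = HARDWARE_USD,
                      electricity_usd_mo: float = ELECTRICITY_USD_MO) -> int:
    """First whole month at which cumulative net savings cover the hardware."""
    net_saving = plan_usd_mo - electricity_usd_mo
    if net_saving <= 0:
        raise ValueError("plan costs no more than electricity; never breaks even")
    return math.ceil(hardware_usd / net_saving)


for plan in (100, 200):
    print(f"${plan}/mo plan: ~{break_even_months(plan)} months")
```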
