the AI bench
VERIFIED JUNE 2026
All hardware

HARDWARE · MoE MINI-PC · 128 GB UNIFIED

Framework Desktop (Ryzen AI Max+ 395)

The first mini-PC with GPU-class bandwidth and 128 GB unified memory.

Strix Halo's 40-CU Radeon 8060S iGPU plus 128 GB LPDDR5X unified memory runs Qwen 3 30B-A3B MoE at ~72 tok/s — 4× the bandwidth of the Minisforum UM890 Pro, 4× the memory. A genuine local-AI mini-PC, not a CPU box that happens to boot.

The decision in five lines

The call
Buy — The first mini-PC with GPU-class bandwidth and 128 GB unified memory.
Best for
MoE mini-PC
Runs well
Qwen3-Coder-30B-A3B (MoE, fits 24GB) · Qwen 3.5 35B-A3B (MoE, fits 24GB) · Gemma 4 31B (256K context)
Watch out
Backend choice is load-bearing. Vulkan (RADV) beats ROCm/HIP for MoE token generation on the same hardware; HIP hits ~40% of theoretical throughput per llama.cpp #13565. Best practice varies per model family — check recent benchmark threads before assuming either backend wins everywhere.
Evidence
Estimated · last verified June 2026

128
GB UNIFIED
~212
GB/S MEASURED
45–120
W CONFIGURABLE
~$1,999
FRAMEWORK BASE

What fits at this tier

128 GB LPDDR5X unified with up to 96 GB assignable as VRAM to the 8060S iGPU. Llama 3.1 8B Q4 at ~42 tok/s (Vulkan). Qwen 3 30B-A3B MoE Q4 at ~72 tok/s (the story pick — MoE-on-unified-memory unlock). Llama 4 Scout 109B MoE Q4 at ~20 tok/s. 70B dense Q4 technically fits but runs at ~5 tok/s — a demo, not a daily driver.

CODING
Qwen3-Coder-30B-A3B (MoE, fits 24GB) 3B-active MoE — benchmark champion for local coding at this tier.
CHAT / GENERAL
Qwen 3.5 35B-A3B (MoE, fits 24GB) 3B active MoE — 30B quality at 3B inference speed.
DOCS & RETRIEVAL
Gemma 4 31B (256K context) 31B dense with 256K context; Gemma commercial-permissive terms; Arena top 5.
IMAGE
HiDream-O1-Image (8B, MIT) May 8, 2026 release. Pixel-space (no VAE, no disjoint text encoder) — debuted top-10 on Artificial Analysis T2I Arena. MIT-licensed 8B; one model handles T2I + edit + subject-driven personalization at up to 2,048².
AGENTS
Qwen 3.5 35B-A3B (MoE, fits 24GB) MoE with native tool use; fits 24GB at Q4; Apache 2.0.
VOICE
VoxCPM2 (2B, Apache 2.0) 30 languages, 48 kHz, tokenizer-free diffusion AR; voice design from text. April 2026 release.

The call

Buy it if your work is MoE-heavy (Qwen 3.5 35B-A3B, Llama 4 Scout) and you want a small quiet box instead of a GPU tower. Framework's repair ethos + mini-ITX upgrade path is the premium over competing Strix Halo SKUs from GMKtec / Beelink / HP.

Skip it if you want dense 70B at interactive speed — that's not this box, despite what the 128 GB number suggests. Also skip if you need a warranty-backed workstation — go HP Z2 Mini G1a (~$3,734) for enterprise warranty coverage.

Watchouts

  • Backend choice is load-bearing. Vulkan (RADV) beats ROCm/HIP for MoE token generation on the same hardware; HIP hits ~40% of theoretical throughput per llama.cpp #13565. Best practice varies per model family — check recent benchmark threads before assuming either backend wins everywhere.
  • 70B dense is a trap. 128 GB fits it, but 5 tok/s makes it unusable for interactive work. Treat this as a MoE specialist, not a 70B dense machine.
  • Supply chain: 128 GB batches keep selling out. Plan on 1–6 week wait even when Framework's store shows "in stock."
  • The 50 TOPS XDNA 2 NPU is largely unused by llama.cpp / Ollama / LM Studio as of June 2026. Don't buy for NPU — the bandwidth + 128 GB unified is the actual value.

Local vs cloud at this tier

● LOCAL WINS

MoE 30B-A3B and 109B-A17B unbounded at interactive speeds in a 4.5 L box. The only mini-PC with this memory + bandwidth combo for under $3,000.

● CLOUD WINS

Cloud wins on dense 70B+ (Mac Studio M3 Ultra 96 GB is the local path there, not this), frontier reasoning, first-day model access.

Fills a real gap between 24 GB NVIDIA cards and Mac Studio Ultra: if your workload is MoE-shaped (which is where the frontier is trending), this is the most useful mini-PC ever shipped. If your workload is 70B dense, look elsewhere.

Next step

Load this setup into the planner