HARDWARE · MoE MINI-PC · 128 GB UNIFIED
Framework Desktop (Ryzen AI Max+ 395)
The first mini-PC with GPU-class bandwidth and 128 GB unified memory.
Strix Halo's 40-CU Radeon 8060S iGPU plus 128 GB of LPDDR5X unified memory runs Qwen 3 30B-A3B MoE at ~72 tok/s, with roughly 4× the memory bandwidth and 4× the memory of the Minisforum UM890 Pro. A genuine local-AI mini-PC, not a CPU box that happens to boot.
The decision in five lines
- The call: Buy. The first mini-PC with GPU-class bandwidth and 128 GB unified memory.
- Best for: MoE models on a mini-PC
- Runs well: Qwen3-Coder-30B-A3B (MoE, fits 24 GB) · Qwen 3.5 35B-A3B (MoE, fits 24 GB) · Gemma 4 31B (256K context)
- Watch out: Backend choice is load-bearing. Vulkan (RADV) beats ROCm/HIP for MoE token generation on this hardware.
- Evidence: Estimated
- 128 GB unified
- ~212 GB/s measured
- 45–120 W configurable
- ~$1,999 Framework base
What fits at this tier
128 GB of LPDDR5X unified memory, up to 96 GB of which can be assigned as VRAM to the 8060S iGPU. Llama 3.1 8B Q4 runs at ~42 tok/s (Vulkan). Qwen 3 30B-A3B MoE Q4 runs at ~72 tok/s; this is the headline pick, the workload that MoE on unified memory unlocks. Llama 4 Scout 109B MoE Q4 runs at ~20 tok/s. 70B dense Q4 technically fits but runs at ~5 tok/s: a demo, not a daily driver.
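Whether a model "fits" comes down to weight size versus the 96 GB VRAM carve-out, and for MoE models that means *all* experts, not just the active ones. The sketch below is a rough capacity check under stated assumptions: Q4-class GGUF quants averaging ~4.5 bits per weight, a hypothetical flat 8 GB allowance for KV cache and runtime buffers, and nominal parameter counts taken from the model names. It is an illustration, not a measurement.

```python
# Rough capacity check: do Q4 weights fit in the iGPU's VRAM carve-out?
# Assumptions (illustrative, not measured): Q4-class GGUF quants average
# ~4.5 bits per weight; OVERHEAD_GB is a hypothetical flat allowance for
# KV cache and runtime buffers; parameter counts are nominal (from names).

VRAM_GB = 96.0          # max assignable to the Radeon 8060S, per the text
BITS_PER_WEIGHT = 4.5   # assumption: typical Q4-class effective size
OVERHEAD_GB = 8.0       # hypothetical KV-cache + buffer allowance


def weights_gb(total_params_b: float, bits_per_weight: float = BITS_PER_WEIGHT) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes) for a model with
    total_params_b billion parameters. MoE models must hold every expert."""
    return total_params_b * bits_per_weight / 8


def fits(total_params_b: float, bits_per_weight: float = BITS_PER_WEIGHT) -> bool:
    """True if weights plus the overhead allowance fit in the VRAM carve-out."""
    return weights_gb(total_params_b, bits_per_weight) + OVERHEAD_GB <= VRAM_GB


# Total (not active) parameters, in billions.
MODELS = {
    "Llama 3.1 8B": 8,
    "Qwen 3 30B-A3B": 30,
    "Llama 4 Scout 109B": 109,
    "70B dense": 70,
}

for name, params_b in MODELS.items():
    print(f"{name:20s} ~{weights_gb(params_b):5.1f} GB  fits={fits(params_b)}")
```

On these assumptions Scout's 109B total parameters come to roughly 61 GB, so it fits even though only 17B are active per token. At 8-bit (Q8_0 is about 8.5 bits per weight) the same model would need roughly 116 GB and would not fit in the carve-out.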
The call
Buy it if your work is MoE-heavy (Qwen 3.5 35B-A3B, Llama 4 Scout) and you want a small, quiet box instead of a GPU tower. Framework's repair ethos and mini-ITX upgrade path are what the premium buys over competing Strix Halo SKUs from GMKtec, Beelink, and HP.
Skip it if you want dense 70B at interactive speed; despite what the 128 GB number suggests, that's not this box. Also skip it if you need a warranty-backed workstation: the HP Z2 Mini G1a (~$3,734) is the pick for enterprise warranty coverage.
Watchouts
- Backend choice is load-bearing. Vulkan (RADV) beats ROCm/HIP for MoE token generation on the same hardware; HIP hits ~40% of theoretical throughput per llama.cpp #13565. Best practice varies per model family — check recent benchmark threads before assuming either backend wins everywhere.
- 70B dense is a trap. 128 GB fits it, but 5 tok/s makes it unusable for interactive work. Treat this as a MoE specialist, not a 70B dense machine.
- Supply chain: 128 GB batches keep selling out. Plan on a 1–6 week wait even when Framework's store shows "in stock."
- The 50 TOPS XDNA 2 NPU is largely unused by llama.cpp, Ollama, and LM Studio as of June 2026. Don't buy it for the NPU; the bandwidth and 128 GB of unified memory are the actual value.
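Both the "70B dense is a trap" and the backend watchouts follow from one piece of arithmetic. Token generation is largely memory-bandwidth bound: each new token streams the *active* weights through memory roughly once, so bandwidth divided by active-weight bytes gives a rough speed ceiling. A minimal sketch, assuming ~4.5 bits per weight, ignoring KV-cache traffic, and using nominal active-parameter counts (3B for 30B-A3B, 17B for Scout); the 212 GB/s and tok/s figures are the ones quoted on this page.

```python
# Rough decode-speed ceiling for bandwidth-bound token generation.
# Assumptions (illustrative): ~4.5 bits per weight (Q4-class), KV-cache
# reads ignored, nominal active-parameter counts taken from model names.

BANDWIDTH_GBPS = 212.0  # measured bandwidth quoted on this page
BITS_PER_WEIGHT = 4.5   # assumption: typical Q4-class effective size


def decode_ceiling_tok_s(active_params_b: float,
                         bandwidth_gbps: float = BANDWIDTH_GBPS,
                         bits_per_weight: float = BITS_PER_WEIGHT) -> float:
    """Upper bound on tok/s if every token reads all active weights once."""
    gb_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gbps / gb_per_token


# name -> (active params in billions, tok/s quoted on this page)
QUOTED = {
    "Llama 3.1 8B (dense)": (8.0, 42.0),
    "Qwen 3 30B-A3B (MoE)": (3.0, 72.0),
    "Llama 4 Scout (MoE, 17B active)": (17.0, 20.0),
    "70B dense": (70.0, 5.0),
}

for name, (active_b, quoted) in QUOTED.items():
    ceiling = decode_ceiling_tok_s(active_b)
    # Fraction of the ceiling actually achieved by the quoted figure.
    print(f"{name:32s} ceiling ~{ceiling:6.1f} tok/s, "
          f"quoted {quoted:5.1f} ({quoted / ceiling:.0%} of ceiling)")
```

On these assumptions the 70B dense ceiling is only ~5.4 tok/s, in line with the ~5 tok/s quoted, while 30B-A3B's ceiling is above 100 tok/s because only ~3B parameters are read per token. Comparing a backend's measured tok/s against this ceiling on the same model is a quick way to see how much it leaves on the table before settling on Vulkan or ROCm/HIP.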
Local vs cloud at this tier
● LOCAL WINS
MoE 30B-A3B and 109B-A17B run unmetered at interactive speeds in a 4.5 L box. Strix Halo machines are the only mini-PCs with this memory and bandwidth combination for under $3,000.
● CLOUD WINS
Cloud wins on dense 70B+ (locally, the Mac Studio M3 Ultra 96 GB is the path there, not this box), on frontier reasoning, and on first-day model access.
This box fills a real gap between 24 GB NVIDIA cards and the Mac Studio Ultra: if your workload is MoE-shaped (which is where the frontier is trending), this is the most useful mini-PC ever shipped. If your workload is 70B dense, look elsewhere.