Framework Desktop (Ryzen AI Max+ 395)

The first mini-PC with GPU-class bandwidth and 128 GB unified memory.

Strix Halo's 40-CU Radeon 8060S iGPU plus 128 GB LPDDR5X unified memory runs Qwen 3 30B-A3B MoE at ~72 tok/s — 4× the bandwidth of the Minisforum UM890 Pro, 4× the memory. A genuine local-AI mini-PC, not a CPU box that happens to boot.

The decision in five lines

The call: Buy — The first mini-PC with GPU-class bandwidth and 128 GB unified memory.
Best for: MoE mini-PC
Runs well: Qwen3-Coder-30B-A3B (MoE, fits 24GB) · Qwen 3.5 35B-A3B (MoE, fits 24GB) · Gemma 4 31B (256K context)
Watch out: Backend choice is load-bearing. Vulkan (RADV) beats ROCm/HIP for MoE token generation on the same hardware; HIP hits ~40% of theoretical throughput per llama.cpp #13565. Best practice varies per model family — check recent benchmark threads before assuming either backend wins everywhere.
Evidence: Estimated · last verified July 2026

128: GB UNIFIED
~212: GB/S MEASURED
45–120: W CONFIGURABLE
$2,459: 128 GB (RAM-HIKE PRICE)

What fits at this tier

128 GB LPDDR5X unified with up to 96 GB assignable as VRAM to the 8060S iGPU. Llama 3.1 8B Q4 at ~42 tok/s (Vulkan). Qwen 3 30B-A3B MoE Q4 at ~72 tok/s (the story pick — MoE-on-unified-memory unlock). Llama 4 Scout 109B MoE Q4 at ~20 tok/s. 70B dense Q4 technically fits but runs at ~5 tok/s — a demo, not a daily driver.

CODING

Qwen3-Coder-30B-A3B (MoE, fits 24GB) 3B-active MoE — benchmark champion for local coding at this tier.

CHAT / GENERAL

Qwen 3.5 35B-A3B (MoE, fits 24GB) 3B active MoE — 30B quality at 3B inference speed.

DOCS & RETRIEVAL

Gemma 4 31B (256K context) 31B dense with 256K context; Gemma commercial-permissive terms; Arena top 5.

IMAGE

HiDream-O1-Image (8B, MIT) May 8, 2026 release. Pixel-space (no VAE, no disjoint text encoder) — debuted top-10 on Artificial Analysis T2I Arena. MIT-licensed 8B; one model handles T2I + edit + subject-driven personalization at up to 2,048².

AGENTS

Qwen 3.5 35B-A3B (MoE, fits 24GB) MoE with native tool use; fits 24GB at Q4; Apache 2.0.

VOICE

VoxCPM2 (2B, Apache 2.0) 30 languages, 48 kHz, tokenizer-free diffusion AR; voice design from text. April 2026 release.

The call

Buy it if your work is MoE-heavy (Qwen 3.5 35B-A3B, Llama 4 Scout) and you want a small quiet box instead of a GPU tower. Framework's repair ethos + mini-ITX upgrade path is the premium over competing Strix Halo SKUs from GMKtec / Beelink / HP.
Skip it if you want dense 70B at interactive speed — that's not this box, despite what the 128 GB number suggests. Also skip if you need a warranty-backed workstation — go HP Z2 Mini G1a (~$3,734) for enterprise warranty coverage.

Watchouts

Backend choice is load-bearing. Vulkan (RADV) beats ROCm/HIP for MoE token generation on the same hardware; HIP hits ~40% of theoretical throughput per llama.cpp #13565. Best practice varies per model family — check recent benchmark threads before assuming either backend wins everywhere.
70B dense is a trap. 128 GB fits it, but 5 tok/s makes it unusable for interactive work. Treat this as a MoE specialist, not a 70B dense machine.
Pricing moved with the RAM crisis: Framework raised the 128 GB config from $1,999 to a fixed $2,459 in January 2026 (+$460 on memory alone; 32 GB is $1,139, 64 GB $1,639). Framework says it will cut prices when the memory market calms — no sign of that yet. Cheaper Strix Halo 128 GB boxes exist (GMKtec EVO-X2 has been listed ~$1,500) if you don't need Framework's repairability.
Supply chain: 128 GB batches keep selling out. Plan on 1–6 week wait even when Framework's store shows "in stock."
The 50 TOPS XDNA 2 NPU is largely unused by llama.cpp / Ollama / LM Studio as of July 2026. Don't buy for NPU — the bandwidth + 128 GB unified is the actual value.

Local vs cloud at this tier

● LOCAL WINS

MoE 30B-A3B and 109B-A17B unbounded at interactive speeds in a 4.5 L box. The only mini-PC with this memory + bandwidth combo for under $3,000.

● CLOUD WINS

Cloud wins on dense 70B+ (Mac Studio M3 Ultra 96 GB is the local path there, not this), frontier reasoning, first-day model access.

Fills a real gap between 24 GB NVIDIA cards and Mac Studio Ultra: if your workload is MoE-shaped (which is where the frontier is trending), this is the most useful mini-PC ever shipped. If your workload is 70B dense, look elsewhere.

Next step

Load this setup into the planner→