HARDWARE · MINI PC · NO DEDICATED GPU
Minisforum UM890 Pro
The silent always-on 8B-class node that most people shouldn't buy.
AMD Ryzen 9 8945HS + Radeon 780M iGPU + DDR5-5600 in a 0.5 L chassis. At $580 all-in with 32 GB RAM it runs Llama 3.1 8B at 15–22 tok/s via llama.cpp + Vulkan. The honest frame: cloud wins outright for answer quality here — this is a privacy / always-on / homelab pick, not a performance one.
The decision in five lines
- The call: Skip for local. The silent always-on 8B-class node that most people shouldn't buy.
- Best for: No-GPU · cautionary
- Runs well: Qwen 3.5 4B · Qwen 3.5 4B + tight RAG · SANA-0.6B (non-commercial)
- Watch out: DDR5-5600 ceiling. ~50 GB/s real-world bandwidth is the hard limit. Inference speed will not improve with a better CPU; it's memory-bound. 13B Q4 reads ~8 GB per token → ~6 tok/s theoretical max.
- Evidence: Estimated
- 32 GB DDR5-5600
- ~50 GB/s bandwidth
- 65 W cTDP
- ~$580 all-in (built)
What fits at this tier
Runs 7B–8B models at Q4 cleanly via llama.cpp + Vulkan on the Radeon 780M iGPU, at 15–22 tok/s on 8B Q4. At 13B and above, memory bandwidth becomes the binding limit: the theoretical ceiling is ~6 tok/s on 13B Q4 (~50 GB/s divided by ~8 GB read per token). The Radeon 780M has no official ROCm support, so Vulkan is the practical path.
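The ~6 tok/s ceiling comes from a simple bandwidth-bound model: during decode, each generated token streams roughly the full set of quantized weights from memory once. Below is a minimal Python sketch of that arithmetic, using this page's ~50 GB/s and ~8 GB figures. The function name is hypothetical, and the model ignores KV-cache reads and compute time, so real throughput should land below the ceiling.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_read_per_token_gb: float) -> float:
    """Upper bound on decode speed if every token streams the weights once.

    Assumption: decode is purely memory-bandwidth-bound (no compute or
    KV-cache cost), which is why this is a ceiling and not a prediction.
    """
    if weights_read_per_token_gb <= 0:
        raise ValueError("weights_read_per_token_gb must be positive")
    return bandwidth_gb_s / weights_read_per_token_gb


BANDWIDTH_GB_S = 50.0    # page's real-world DDR5-5600 estimate
MODEL_13B_Q4_GB = 8.0    # page's figure for 13B Q4 reads per token

ceiling_13b = decode_ceiling_tok_s(BANDWIDTH_GB_S, MODEL_13B_Q4_GB)
print(f"13B Q4 ceiling: {ceiling_13b:.2f} tok/s")  # 6.25, i.e. the ~6 tok/s quoted
```

The same function works as a sanity check on any quoted speed. For instance, an 8B Q4 file of roughly 4.7 GB (an assumed size, not a figure from this page) would cap near 10.6 tok/s at 50 GB/s, so the 15–22 tok/s estimate implies more effective bandwidth than ~50 GB/s. Both numbers are estimates and worth measuring on real hardware.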
The call
Buy it if you want a silent 65 W always-on node for homelab / privacy / tinkering, and you value the aesthetic over raw throughput. Doubles as a general-purpose mini PC.
Skip it if you want real local-AI mini-PC capability — a Framework Desktop (Ryzen AI Max+ 395, 128 GB unified, ~212 GB/s) costs ~$2,000 and delivers 4× the bandwidth and 4× the memory. Also skip if pure throughput matters: a used RTX 3060 12 GB triples tok/s for half the price.
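The "4× the bandwidth and 4× the memory" comparison can be checked directly from the figures quoted above. This is a rough Python sketch; the `Box` class and its field names are illustrative, and the prices and bandwidths are the approximate values from this page, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Illustrative container for the spec figures quoted on this page."""
    name: str
    price_usd: float
    memory_gb: float
    bandwidth_gb_s: float


um890 = Box("Minisforum UM890 Pro", price_usd=580, memory_gb=32, bandwidth_gb_s=50)
framework = Box("Framework Desktop", price_usd=2000, memory_gb=128, bandwidth_gb_s=212)

bandwidth_ratio = framework.bandwidth_gb_s / um890.bandwidth_gb_s  # about 4.24
memory_ratio = framework.memory_gb / um890.memory_gb               # 4.0
price_ratio = framework.price_usd / um890.price_usd                # about 3.45

print(f"bandwidth x{bandwidth_ratio:.2f}, memory x{memory_ratio:.1f}, price x{price_ratio:.2f}")
```

On these numbers the Framework Desktop costs about 3.4× as much for roughly 4× the bandwidth and memory, which is the trade the paragraph above describes.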
Watchouts
- DDR5-5600 ceiling: ~50 GB/s real-world bandwidth is the hard limit. Inference speed will not improve with a better CPU — it's memory-bound. 13B Q4 reads ~8 GB per token → ~6 tok/s theoretical max.
- Thermal throttling under sustained inference. Chassis gets warm; community reports ~15–20% throughput drop after 10 min of sustained token generation.
- Radeon 780M has no official ROCm support. Vulkan backend works but doesn't get AMD's performance optimizations. Don't expect rocBLAS speedups.
- The 16 TOPS NPU (XDNA) is marketing. No production llama.cpp / Ollama path uses the NPU as of June 2026, so it sits idle during inference.
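To see what the thermal watchout means for long sessions, the reported ~15–20% drop can be applied to the 15–22 tok/s estimate. This is a small Python sketch, assuming the drop applies uniformly across the range; the function name is hypothetical, and the inputs are this page's estimates and community-reported figures, not measurements.

```python
def sustained_range(burst_low: float, burst_high: float,
                    drop_min: float, drop_max: float) -> tuple[float, float]:
    """Return (worst, best) sustained tok/s after thermal throttling.

    Assumption: the fractional drop applies equally at both ends of the
    burst range. Worst case pairs the low end with the largest drop.
    """
    if not (0 <= drop_min <= drop_max < 1):
        raise ValueError("drops must satisfy 0 <= drop_min <= drop_max < 1")
    worst = burst_low * (1 - drop_max)
    best = burst_high * (1 - drop_min)
    return worst, best


# 15-22 tok/s on 8B Q4, ~15-20% drop after ~10 min of sustained generation
low, high = sustained_range(15.0, 22.0, drop_min=0.15, drop_max=0.20)
print(f"sustained 8B Q4: {low:.1f}-{high:.1f} tok/s")  # about 12.0-18.7
```

For batch jobs that run longer than about ten minutes, the lower range is the more realistic planning number.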
Local vs cloud at this tier
● LOCAL WINS
Silent, always-on, private. Useful for homelab automation, offline summarization, privacy-sensitive local chat at 7B–8B. Doubles as a general mini PC.
● CLOUD WINS
Answer quality. A $580 box at 15 tok/s on 8B Q4 produces roughly 2024-era GPT-3.5-quality output. ChatGPT Plus at $20/mo delivers frontier quality at 50+ tok/s with zero hardware investment.
The mini PC makes sense only when the value is *local itself* — privacy, offline, homelab, always-on automation — not when the value is raw answer quality. Break-even vs $100/mo ChatGPT Pro is ~6 months on cost alone, but cost isn't what justifies this pick.
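The break-even figure is plain division, and it depends heavily on which subscription is the baseline. Here is a minimal Python sketch using the prices quoted on this page ($580 hardware, $100/mo Pro, $20/mo Plus). The function name and the optional electricity parameter are illustrative additions, and electricity defaults to zero because the page gives no power-cost figure.

```python
def break_even_months(hardware_cost_usd: float, subscription_usd_per_month: float,
                      power_cost_usd_per_month: float = 0.0) -> float:
    """Months until the one-off hardware cost equals cumulative subscription fees.

    Assumption: the local box fully replaces the subscription, and the only
    running cost is electricity (zero unless supplied).
    """
    monthly_saving = subscription_usd_per_month - power_cost_usd_per_month
    if monthly_saving <= 0:
        raise ValueError("no monthly saving, so the hardware never breaks even")
    return hardware_cost_usd / monthly_saving


HARDWARE_USD = 580.0

pro_months = break_even_months(HARDWARE_USD, 100.0)   # 5.8, the ~6 months quoted
plus_months = break_even_months(HARDWARE_USD, 20.0)   # 29.0

print(f"vs $100/mo: {pro_months:.1f} months, vs $20/mo: {plus_months:.1f} months")
```

Against the $20/mo plan the payback stretches to about 29 months, which supports the point that cost alone does not justify this pick.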