Minisforum UM890 Pro

The silent always-on 8B-class node that most people shouldn't buy.

AMD Ryzen 9 8945HS + Radeon 780M iGPU + DDR5-5600 in a 0.5 L chassis. At $580 all-in with 32 GB RAM it runs Llama 3.1 8B at 15–22 tok/s via llama.cpp + Vulkan. The honest frame: cloud wins outright for answer quality here — this is a privacy / always-on / homelab pick, not a performance one.

The decision in five lines

The call: Skip for local
Best for: No-GPU · cautionary
Runs well: Qwen 3.5 4B · Qwen 3.5 4B + tight RAG · SANA-0.6B (non-commercial)
Watch out: DDR5-5600 ceiling: ~50 GB/s real-world bandwidth is the hard limit. Inference speed will not improve with a better CPU — it's memory-bound. 13B Q4 reads ~8 GB per token → ~6 tok/s theoretical max.
Evidence: Estimated · last verified July 2026

32 GB DDR5-5600
~50 GB/s bandwidth
65 W cTDP
~$580 all-in (built)

What fits at this tier

Runs 7B–8B models at Q4 cleanly via llama.cpp + Vulkan on the Radeon 780M iGPU. 15–22 tok/s on 8B Q4. 13B+ becomes memory-bandwidth-bound — theoretical ceiling is ~6 tok/s on 13B Q4. Radeon 780M has no official ROCm support; Vulkan is the practical path.
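The ~6 tok/s ceiling follows from a common rule of thumb: during token generation, each new token streams roughly the full set of weights from memory once, so throughput is bounded by bandwidth divided by bytes read per token. A minimal sketch of that arithmetic, using this page's figures (~50 GB/s, ~8 GB per token for 13B Q4). The function name is illustrative, and the model deliberately ignores KV-cache traffic and compute overhead, so real numbers land below the bound.

```python
def bandwidth_ceiling_tok_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed when generation is memory-bound.

    Assumes every generated token reads `gb_read_per_token` GB from RAM
    and nothing else limits throughput.
    """
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


# Figures quoted on this page: ~50 GB/s real-world, ~8 GB per token for 13B Q4.
ceiling_13b = bandwidth_ceiling_tok_s(50.0, 8.0)
print(f"13B Q4 ceiling: {ceiling_13b:.2f} tok/s")
```

Swapping in a different bandwidth or model size shows why a faster CPU does not help: only the two inputs to this division move the bound.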

CODING

Qwen 3.5 4B: 4B dense with 262K context; surprisingly coherent for its size.

CHAT / GENERAL

Qwen 3.5 4B: 4B dense with 262K context and native multimodal input.

DOCS & RETRIEVAL

Qwen 3.5 4B + tight RAG: the 4B model plus tight chunking; keep context windows small.

IMAGE

SANA-0.6B (non-commercial): 0.6B params; under 1 s per 1024² image on a 16 GB laptop GPU; weights are licensed under NVIDIA NSCL v2 (non-commercial).

AGENTS

Ministral 3 3B: smallest Ministral with reasoning and tool use.

VOICE

Kokoro-82M (Apache 2.0): community daily driver for English TTS; runs in real time on CPU at 82M params; v1.0 ships with 8 languages and 54 voices. No voice cloning.

The call

Buy it if you want a silent 65 W always-on node for homelab / privacy / tinkering, and you value the aesthetic over raw throughput. Doubles as a general-purpose mini PC.
Skip it if you want real local-AI mini-PC capability — a Framework Desktop (Ryzen AI Max+ 395, 128 GB unified, ~212 GB/s) costs ~$2,000 and delivers 4× the bandwidth and 4× the memory. Also skip if pure throughput matters: a used RTX 3060 12 GB triples tok/s for half the price.
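The "4× the bandwidth and 4× the memory" comparison can be checked from the figures quoted above. A small sketch, assuming only this page's numbers ($580 / 32 GB / ~50 GB/s versus ~$2,000 / 128 GB / ~212 GB/s); the `Box` class and its field names are illustrative, not from any vendor tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    name: str
    price_usd: float
    memory_gb: float
    bandwidth_gb_s: float

    @property
    def usd_per_gb_s(self) -> float:
        """Dollars paid per GB/s of memory bandwidth."""
        return self.price_usd / self.bandwidth_gb_s


# Figures as quoted on this page (approximate).
um890 = Box("UM890 Pro", 580, 32, 50)
framework = Box("Framework Desktop", 2000, 128, 212)

bandwidth_ratio = framework.bandwidth_gb_s / um890.bandwidth_gb_s
memory_ratio = framework.memory_gb / um890.memory_gb
price_ratio = framework.price_usd / um890.price_usd

print(f"bandwidth x{bandwidth_ratio:.2f}, memory x{memory_ratio:.1f}, price x{price_ratio:.2f}")
for box in (um890, framework):
    print(f"{box.name}: ${box.usd_per_gb_s:.2f} per GB/s")
```

On these numbers the larger machine costs about 3.4× as much for about 4.2× the bandwidth, so it is the cheaper option per GB/s even though the sticker price is higher.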

Watchouts

DDR5-5600 ceiling: ~50 GB/s real-world bandwidth is the hard limit. Inference speed will not improve with a better CPU — it's memory-bound. 13B Q4 reads ~8 GB per token → ~6 tok/s theoretical max.
Thermal throttling under sustained inference. Chassis gets warm; community reports ~15–20% throughput drop after 10 min of sustained token generation.
Radeon 780M has no official ROCm support. Vulkan backend works but doesn't get AMD's performance optimizations. Don't expect rocBLAS speedups.
16 TOPS NPU (XDNA) is marketing. No production llama.cpp / Ollama path uses the NPU as of July 2026. Idle during inference.
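To see what the thermal watchout means for long sessions, the reported ~15–20% drop can be applied to the 15–22 tok/s burst range quoted on this page. This is a rough sketch under the assumption that the drop applies uniformly across the range; the function name is illustrative and the inputs are community-reported estimates, not measurements.

```python
def sustained_range(burst_low: float, burst_high: float,
                    drop_min: float, drop_max: float) -> tuple[float, float]:
    """Return (worst, best) sustained tok/s after thermal throttling.

    Worst case pairs the low burst figure with the largest drop;
    best case pairs the high burst figure with the smallest drop.
    """
    if not (0 <= drop_min <= drop_max < 1):
        raise ValueError("drops must satisfy 0 <= drop_min <= drop_max < 1")
    return burst_low * (1 - drop_max), burst_high * (1 - drop_min)


# Page figures: 15-22 tok/s on 8B Q4, ~15-20% drop after ~10 min.
worst, best = sustained_range(15, 22, 0.15, 0.20)
print(f"Sustained 8B Q4 estimate: {worst:.1f} to {best:.1f} tok/s")
```

For always-on workloads, the sustained range is the more useful planning number than the burst figure.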

Local vs cloud at this tier

● LOCAL WINS

Silent, always-on, private. Useful for homelab automation, offline summarization, privacy-sensitive local chat at 7B–8B. Doubles as a general mini PC.

● CLOUD WINS

Answer quality. A $580 box at 15 tok/s on 8B Q4 produces roughly 2024-era GPT-3.5-quality output. ChatGPT Plus at $20/mo delivers frontier quality at 50+ tok/s with zero hardware investment.

The mini PC makes sense only when the value is *local itself* — privacy, offline, homelab, always-on automation — not when the value is raw answer quality. Break-even vs $100/mo ChatGPT Pro is ~6 months on cost alone, but cost isn't what justifies this pick.
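The break-even figure is hardware cost divided by the monthly subscription it replaces. A minimal sketch using this page's prices ($580 hardware, $100/mo and $20/mo plans). The `running_usd_per_month` parameter is a hypothetical knob for electricity or other upkeep; it defaults to zero because the page quotes no power price.

```python
def break_even_months(hardware_usd: float,
                      subscription_usd_per_month: float,
                      running_usd_per_month: float = 0.0) -> float:
    """Months until the one-off hardware cost equals the subscription spend avoided."""
    monthly_saving = subscription_usd_per_month - running_usd_per_month
    if monthly_saving <= 0:
        raise ValueError("running cost meets or exceeds the subscription; no break-even")
    return hardware_usd / monthly_saving


print(f"vs $100/mo plan: {break_even_months(580, 100):.1f} months")
print(f"vs $20/mo plan:  {break_even_months(580, 20):.1f} months")
```

Against the $20/mo plan the payback stretches to well over two years, which is consistent with the point that cost alone does not justify this pick.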
