the AI bench
VERIFIED JUNE 2026

HARDWARE · MINI PC · NO DEDICATED GPU

Minisforum UM890 Pro

The silent always-on 8B-class node that most people shouldn't buy.

AMD Ryzen 9 8945HS + Radeon 780M iGPU + DDR5-5600 in a 0.5 L chassis. At $580 all-in with 32 GB RAM it runs Llama 3.1 8B at 15–22 tok/s via llama.cpp + Vulkan. The honest frame: cloud wins outright for answer quality here — this is a privacy / always-on / homelab pick, not a performance one.

The decision in five lines

The call
Skip for local — The silent always-on 8B-class node that most people shouldn't buy.
Best for
No-GPU · cautionary
Runs well
Qwen 3.5 4B · Qwen 3.5 4B + tight RAG · SANA-0.6B (non-commercial)
Watch out
DDR5-5600 ceiling: ~50 GB/s real-world bandwidth is the hard limit. Inference speed will not improve with a better CPU — it's memory-bound. 13B Q4 reads ~8 GB per token → ~6 tok/s theoretical max.
Evidence
Estimated · last verified June 2026

32 GB DDR5-5600
~50 GB/s bandwidth
65 W cTDP
~$580 all-in (built)

What fits at this tier

Runs 7B–8B models at Q4 cleanly via llama.cpp + Vulkan on the Radeon 780M iGPU; expect 15–22 tok/s on 8B Q4. At 13B and above the memory-bandwidth ceiling dominates: 13B Q4 reads ~8 GB per token, so at ~50 GB/s the theoretical maximum is ~6 tok/s. The Radeon 780M has no official ROCm support; Vulkan is the practical path.
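The 13B ceiling quoted above is a single division, sketched here for anyone who wants to plug in other model sizes. The function name is hypothetical, and the sketch assumes decode speed is bounded purely by streaming the weights once per token, ignoring KV-cache reads and compute.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on tokens/s when each token requires one full pass over the weights.

    Simplifying assumption: memory bandwidth is the only limit (no compute or
    KV-cache overhead), which is the framing used in the text.
    """
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


# Figures taken from the text: ~50 GB/s real-world bandwidth, ~8 GB per token for 13B Q4.
ceiling_13b = decode_ceiling_tok_s(50.0, 8.0)
print(f"13B Q4 ceiling: {ceiling_13b:.2f} tok/s")  # 6.25, i.e. the ~6 tok/s quoted
```

Real throughput lands below this bound, so treat the result as a sanity check rather than a prediction.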

CODING
Qwen 3.5 4B: 4B dense with 262K context; surprisingly coherent for its size.
CHAT / GENERAL
Qwen 3.5 4B: 4B dense with 262K context and native multimodal.
DOCS & RETRIEVAL
Qwen 3.5 4B + tight RAG: 4B plus tight chunking; keep context windows small.
IMAGE
SANA-0.6B (non-commercial): 0.6B params; <1 s per 1024×1024 image on a 16 GB laptop GPU; weights are NVIDIA NSCL v2 (non-commercial).
AGENTS
Ministral 3 3B: smallest Ministral with reasoning + tool use.
VOICE
Kokoro-82M (Apache 2.0): community daily driver for English TTS; CPU-real-time at 82M params; v1.0 with 8 languages and 54 voices. No voice cloning.
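For the docs-and-retrieval row, "tight chunking" can be as simple as splitting text into small overlapping word windows so each retrieved passage costs little context. The sketch below is one possible approach, not a recommendation from the source: the function name and the 200-word / 40-word defaults are illustrative assumptions.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word windows of at most chunk_size, sharing `overlap` words.

    The default sizes are illustrative assumptions; tune them to the model's
    usable context and to how many chunks are retrieved per query.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(500))
pieces = chunk_words(sample)
print(len(pieces), [len(p.split()) for p in pieces])
```

Smaller chunks keep the prompt short, which matters on a memory-bound box, at the cost of each passage carrying less surrounding context.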

The call

Buy it if you want a silent 65 W always-on node for homelab / privacy / tinkering, and you value the aesthetic over raw throughput. Doubles as a general-purpose mini PC.

Skip it if you want real local-AI mini-PC capability — a Framework Desktop (Ryzen AI Max+ 395, 128 GB unified, ~212 GB/s) costs ~$2,000 and delivers 4× the bandwidth and 4× the memory. Also skip if pure throughput matters: a used RTX 3060 12 GB triples tok/s for half the price.
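The Framework Desktop comparison can be checked by dividing the quoted figures. This is a small sketch using only numbers from the text; the `Box` class and `compare` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    name: str
    price_usd: float
    memory_gb: float
    bandwidth_gb_s: float


def compare(base: Box, other: Box) -> dict[str, float]:
    """Return how many times larger each of other's figures is than base's."""
    return {
        "price": other.price_usd / base.price_usd,
        "memory": other.memory_gb / base.memory_gb,
        "bandwidth": other.bandwidth_gb_s / base.bandwidth_gb_s,
    }


# Approximate figures as quoted in the text.
um890 = Box("Minisforum UM890 Pro", 580, 32, 50)
framework = Box("Framework Desktop", 2000, 128, 212)

factors = compare(um890, framework)
for key, ratio in factors.items():
    print(f"{key}: {ratio:.2f}x")
```

On these inputs the Framework box comes out at about 3.4× the price for 4× the memory and roughly 4.2× the bandwidth.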

Watchouts

  • DDR5-5600 ceiling: ~50 GB/s real-world bandwidth is the hard limit. Inference speed will not improve with a better CPU — it's memory-bound. 13B Q4 reads ~8 GB per token → ~6 tok/s theoretical max.
  • Thermal throttling under sustained inference. Chassis gets warm; community reports ~15–20% throughput drop after 10 min of sustained token generation.
  • Radeon 780M has no official ROCm support. Vulkan backend works but doesn't get AMD's performance optimizations. Don't expect rocBLAS speedups.
  • 16 TOPS NPU (XDNA) is marketing. No production llama.cpp / Ollama path uses the NPU as of June 2026. Idle during inference.
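To see what the thermal watchout means in practice, the reported 15–20% drop can be applied to the 15–22 tok/s burst range. This is a rough sketch: it assumes the drop applies uniformly across the burst range, which the community reports cited above do not confirm, and the function name is hypothetical.

```python
def sustained_range(
    burst_low: float, burst_high: float, drop_min: float, drop_max: float
) -> tuple[float, float]:
    """Estimate sustained tok/s after throttling.

    Worst case pairs the slowest burst speed with the largest drop;
    best case pairs the fastest burst speed with the smallest drop.
    """
    if not 0 <= drop_min <= drop_max < 1:
        raise ValueError("drops must satisfy 0 <= drop_min <= drop_max < 1")
    return burst_low * (1 - drop_max), burst_high * (1 - drop_min)


# 15-22 tok/s burst and a 15-20% drop, both taken from the text.
low, high = sustained_range(15.0, 22.0, 0.15, 0.20)
print(f"sustained 8B Q4 estimate: {low:.1f}-{high:.1f} tok/s")
```

Under that assumption, long-running jobs should be planned around roughly 12 to 19 tok/s, not the burst figure.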

Local vs cloud at this tier

● LOCAL WINS

Silent, always-on, private. Useful for homelab automation, offline summarization, privacy-sensitive local chat at 7B–8B. Doubles as a general mini PC.

● CLOUD WINS

Answer quality. A $580 box at 15 tok/s on 8B Q4 produces roughly 2024-era GPT-3.5-quality output. ChatGPT Plus at $20/mo delivers frontier quality at 50+ tok/s with zero hardware investment.

The mini PC makes sense only when the value is *local itself* — privacy, offline, homelab, always-on automation — not when the value is raw answer quality. Break-even vs $100/mo ChatGPT Pro is ~6 months on cost alone, but cost isn't what justifies this pick.
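The break-even figure is hardware cost divided by the monthly subscription price. The sketch below reproduces it from the prices quoted in this section; it deliberately ignores electricity, resale value, and the quality gap, and the function name is hypothetical.

```python
def break_even_months(hardware_usd: float, subscription_usd_per_month: float) -> float:
    """Months of subscription fees that add up to the one-off hardware cost.

    Simplifying assumption: no running costs (power, maintenance) for the box.
    """
    if subscription_usd_per_month <= 0:
        raise ValueError("subscription price must be positive")
    return hardware_usd / subscription_usd_per_month


# Prices as quoted in the text: $580 box, $100/mo and $20/mo plans.
vs_pro = break_even_months(580, 100)
vs_plus = break_even_months(580, 20)
print(f"vs $100/mo plan: {vs_pro:.1f} months")  # 5.8, the ~6 months quoted
print(f"vs $20/mo plan: {vs_plus:.1f} months")  # 29.0
```

Against the $20/mo plan the payback stretches to about two and a half years, which reinforces the point that cost alone does not justify this pick.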
