HARDWARE · 23 CURATED PICKS
Hardware we'd actually buy to run local AI.
Curated, opinionated, dated. Every pick reviewed quarterly against real street prices and current model weight classes. No affiliate links. No listicle bloat. Just the hardware that solves a specific problem — and an honest take on who should pass.
Frontier tier — 48 GB+ serious rigs
Dual RTX 5090
Two RTX 5090s, each with 1,792 GB/s of bandwidth, for 64 GB of total VRAM. This is the first consumer configuration that fits a 122B-A10B MoE with room to spare and generates tokens fast enough to use interactively. The tradeoff is 1,500 W sustained draw, dual 12VHPWR connectors, and a case that fits 9-slot cards side-by-side.
NVIDIA RTX A6000 (48 GB, used)
The only consumer-reachable single-card path to 48 GB VRAM under $5,000. Ampere-generation workstation silicon with ECC memory, 768 GB/s bandwidth, and a dual-slot blower that tolerates sustained load. Used-market prices make it an Ada-tax-avoidance play.
NVIDIA DGX Spark
128 GB of unified memory via NVIDIA's GB10 Grace Blackwell Superchip — 4× what any consumer GPU gives you. The catch: 273 GB/s bandwidth is ~27% of an RTX 4090, so you trade raw speed for fit. A capacity-first machine, not a speed-first machine.
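The capacity-versus-speed trade can be put in rough numbers. During single-stream decoding, every generated token has to stream the active weights out of memory once, so bandwidth divided by active weight size gives a loose ceiling on tok/s. A minimal sketch, assuming a hypothetical 40 GB of Q4 weights (an illustrative figure for a 70B-class dense model, not a measurement) and the RTX 4090's 1,008 GB/s bandwidth spec; the function name is ours, and real throughput lands below this ceiling:

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Loose upper bound on decode speed for a memory-bandwidth-bound model."""
    if bandwidth_gb_s <= 0 or active_weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_gb_s / active_weights_gb


# Assumption: 40 GB of active weights (illustrative, not a benchmark).
WEIGHTS_GB = 40.0

spark_ceiling = decode_ceiling_tok_s(273.0, WEIGHTS_GB)     # DGX Spark
rtx4090_ceiling = decode_ceiling_tok_s(1008.0, WEIGHTS_GB)  # RTX 4090 bandwidth spec

# The ratio does not depend on model size: it is just the bandwidth ratio.
bandwidth_ratio = spark_ceiling / rtx4090_ceiling

print(f"DGX Spark ceiling: {spark_ceiling:.1f} tok/s")
print(f"Spark / 4090 ratio: {bandwidth_ratio:.0%}")
```

The 4090 figure is hypothetical in a second sense: 40 GB of weights would not fit in its 24 GB at all, which is the "fit" half of the trade.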
Mac Studio M3 Ultra 96 GB
819 GB/s unified memory bandwidth — the highest in any shipping Mac — plus 96 GB capacity puts Llama 3.3 70B Q4 at 12–18 tok/s on a box that runs at ~70 W idle and fits on a bookshelf. Dual M3 Max dies under one heatsink, no GPU tower, no fan noise.
Dual RTX 3090 (used)
Two used 3090s give you 48 GB of VRAM for roughly $1,600 all-in — enough for 70B dense at Q4 with room for context. llama.cpp and Ollama split across PCIe automatically; no NVLink needed. The compromise is noise, heat, and finding honest used cards.
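How much "room for context" is left depends on the KV cache, which grows linearly with context length. A rough sketch of the arithmetic, assuming Llama-3-style 70B shapes (80 layers, 8 KV heads, head dimension 128), an fp16 cache, and a hypothetical 42 GB for the Q4 weights; these are illustrative assumptions rather than figures from the pick above, and runtime overhead is ignored:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for n_tokens of context."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


# Assumed Llama-3-style 70B shapes with an fp16 cache.
per_token = kv_cache_bytes(80, 8, 128, n_tokens=1)
cache_8k = kv_cache_bytes(80, 8, 128, n_tokens=8192)

ASSUMED_WEIGHTS_GB = 42.0  # hypothetical Q4 weight size
TOTAL_VRAM_GB = 48.0       # two 24 GB cards

used_gb = ASSUMED_WEIGHTS_GB + cache_8k / 1e9
print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"Weights + 8K cache: {used_gb:.1f} GB of {TOTAL_VRAM_GB:.0f} GB")
```

Under these assumptions an 8K context fits with a few GB to spare, while much longer contexts would call for a quantized cache or a smaller weight quant.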
Top tier — 32-64 GB new-gen
NVIDIA RTX 5090
A 32 GB Blackwell card that runs every modern coding, chat, and agent model at Q4 with headroom, at speeds a used dual-3090 rig can match only with a power bill and a compromise. AIB allocation has thawed enough that the entry floor came back down to ~$2,910 in late May.
Mac Studio M4 Max 64 GB
64 GB unified memory at 546 GB/s. Runs 30B-A3B MoE at 70–100 tok/s in silence (6 W at idle), Qwen 3.5 27B dense at ~20 tok/s, and FLUX.2 klein pipelines cleanly. 70B dense Q4 fits with a `sudo sysctl iogpu.wired_limit_mb` tweak and runs at 8–15 tok/s: workable, but not silent under sustained load. The M4 Max is previous-generation silicon now, but Bloomberg (April 19, 2026) reported that the M5 Mac Studio refresh slipped to October 2026 over supply-chain issues, so the buy-now case is stronger than it was a week ago.
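The `iogpu.wired_limit_mb` tweak raises the cap on how much unified memory the GPU is allowed to wire. A small sketch of picking a value, assuming you leave a fixed reserve for macOS itself; the 8 GB reserve and the helper name are our assumptions, not Apple guidance, so adjust them to your workload:

```python
def wired_limit_command(total_ram_gb: int, reserve_gb: int = 8) -> str:
    """Build the sysctl command that lets the GPU wire all but reserve_gb."""
    if reserve_gb <= 0 or reserve_gb >= total_ram_gb:
        raise ValueError("reserve must be positive and smaller than total RAM")
    limit_mb = (total_ram_gb - reserve_gb) * 1024
    return f"sudo sysctl iogpu.wired_limit_mb={limit_mb}"


# Assumption: keep 8 GB back for the OS on a 64 GB Mac Studio.
print(wired_limit_command(64))
```

For a 64 GB machine this prints `sudo sysctl iogpu.wired_limit_mb=57344`. The script only builds the command string; it changes nothing until you run that command yourself.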
M5 Max MacBook Pro 64 GB
64 GB of unified memory at 614 GB/s on the 40-core GPU M5 Max. Runs every modern model up to 35B-A3B MoE at reasonable speed, in a silent chassis that sustains load on battery. The compromise: prefill on long prompts is noticeably slower than NVIDIA, and you pay Apple's storage tax to go beyond 48 GB.
M5 Pro MacBook Pro 48 GB
48 GB unified at 307 GB/s, about 12% more bandwidth than the M4 Pro's 273 GB/s, and enough to run Qwen 3.5 35B-A3B MoE at 70–90 tok/s on battery, in a laptop. The honest step-up from the Mac mini M4 Pro 24 GB without going to the $4,499 M5 Max 64 GB.
Framework Desktop (Ryzen AI Max+ 395)
Strix Halo's 40-CU Radeon 8060S iGPU plus 128 GB LPDDR5X unified memory runs Qwen 3 30B-A3B MoE at ~72 tok/s — 4× the bandwidth of the Minisforum UM890 Pro, 4× the memory. A genuine local-AI mini-PC, not a CPU box that happens to boot.
Smart money — 24 GB done right
NVIDIA RTX 4090
Same 24 GB VRAM ceiling as the new generation's sweet spot, 1 TB/s bandwidth, mature CUDA stack, no 12VHPWR drama if you buy a unit with the updated 12V-2x6 connector. Buy used from a trusted seller — new retail at scalper prices is not the right move.
NVIDIA RTX 3090 (used, single)
24 GB of GDDR6X at 936 GB/s for ~$1,050 on the used market in June 2026 — every dollar you spend on a 3090 still buys more usable VRAM than any other card in the lineup, even after the used-market floor lifted ~$200 since April as buyers priced out of 5090 scarcity moved a tier down. The tradeoff is age, heat, and a GDDR6X memory package that runs hot after half a decade.
AMD Radeon RX 7900 XTX
24 GB at roughly 85–90% of a 4090's throughput under ROCm. The hardware is fine; the software ecosystem is the tax. Plan 5–10 hours on first-time ROCm setup, plus the ongoing friction of Ollama being patchy on AMD. New-market pricing has split sharply from used since the DRAM crunch: used 3090s and used 7900 XTXs now sit in the same $760 band.
Mac Mini M4 Pro 24 GB
At 273 GB/s — 2.3× the base Mac mini M4's bandwidth — the M4 Pro in its base 12-core CPU / 16-core GPU bin is the first Apple silicon SKU where 14B dense Q4 feels responsive, not ponderous. Silent, 4 W idle, $1,399 from Apple.
MacBook Air M5 24 GB
The M5 Air at 24 GB is the first Apple laptop where 8B dense Q4 inference feels responsive without a fan ramping up — because there is no fan. 153 GB/s bandwidth is the honest limiting factor; this is not a 14B-comfortable machine.
Affordable entry — 12-16 GB and under
NVIDIA RTX 5070 Ti
16 GB GDDR7 at 896 GB/s — 93% of the 5080's bandwidth for ~15% less money at street price. Hardware Corner measured 185 tok/s on Qwen 2.5 14B Q4 short-context, which is the honest sweet spot for this card.
NVIDIA RTX 5080
Blackwell architecture + GDDR7 at 960 GB/s buys you ~30–40% more tok/s than the 5060 Ti 16 GB, but the VRAM ceiling is identical. If your work lives in the 8B–14B dense band, this is the honest Blackwell pick; if you need 30B-A3B MoE with headroom, you need more memory.
AMD Radeon RX 9070 XT
16 GB GDDR6 at 640 GB/s on AMD's RDNA4 architecture. AI throughput per compute unit doubled vs RDNA3; paired with ROCm 7 or later, this is the AMD card to buy if you're entering local AI on Team Red today. The 7900 XTX retains its 24 GB VRAM advantage but commands scarcity premiums on the new market.
RTX 5060 Ti 16 GB
16 GB GDDR7 at $559 on Amazon. Runs 14B dense at Q4 at ~33 tok/s with room for 16K context; 30B-A3B MoE fits cleanly at Q3 (~13 GB), or at Q4 (~17 GB) with partial CPU offload. The honest entry point for local AI if you want new hardware with a warranty.
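The ~13 GB and ~17 GB figures follow from parameter count times bits per weight. A back-of-envelope sketch, assuming roughly 3.5 and 4.5 effective bits per weight for Q3 and Q4 (approximations we chose; real quantized files vary by recipe) and ignoring KV cache and runtime overhead:

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


VRAM_GB = 16.0
# Assumed effective bits per weight; actual quant files differ somewhat.
for name, bpw in (("Q3", 3.5), ("Q4", 4.5)):
    size = weight_size_gb(30, bpw)
    verdict = "fits in VRAM" if size < VRAM_GB else "needs CPU offload"
    print(f"30B at {name}: ~{size:.1f} GB, {verdict}")
```

The same function works for any card in this tier: swap in the VRAM size and the parameter count you care about, then leave a few GB of margin for context.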
Mac Mini M4 16 GB
Apple discontinued the $599 Mac mini base config on May 1, 2026 and raised the floor to $799 with 512 GB. The 16 GB / 256 GB SKU only survives on Amazon residuals and eBay. If you can find one near $499, the 8B-class story still holds; otherwise the math has shifted toward the 24 GB M4 Pro.
NVIDIA RTX 3060 12 GB
A 5-year-old card that still runs Llama 3.1 8B Q4 at 52 tok/s. Amazon street is ~$354, eBay used floors around $230, and NVIDIA is restarting 8 nm production with Samsung in June 2026 — supply should ease through summer. 12 GB is the minimum VRAM that matters for 8B-class models with any context, and CUDA just works everywhere.
Cautionary picks — read the watchouts first
Intel Arc B580 12 GB
12 GB GDDR6 at $249 MSRP is the cheapest new discrete GPU with enough VRAM for 8B-class local AI. The catch is the software: Intel's IPEX-LLM — the main path for Ollama on Arc — was archived on January 28, 2026. Still works, still runs 8B models at 28–62 tok/s, but you're betting on a project Intel is no longer actively maintaining. Worse: Intel canceled the Arc B770 mid-2026 and re-routed the BMG-G31 die to a Pro workstation card, so the B580 is the terminal Battlemage consumer SKU.
Minisforum UM890 Pro
AMD Ryzen 9 8945HS + Radeon 780M iGPU + DDR5-5600 in a 0.5 L chassis. At $580 all-in with 32 GB RAM it runs Llama 3.1 8B at 15–22 tok/s via llama.cpp + Vulkan. The honest frame: cloud wins outright for answer quality here — this is a privacy / always-on / homelab pick, not a performance one.
Specific rig checks
Comparing a specific rig? Start with Mac Studio M3 Ultra 96 GB, M5 Max MacBook Pro 64 GB, RTX 5060 Ti 16 GB, and the local LLM benchmark table.
How we pick
Every pick has to answer one question: "is this the honest best answer at this price point for someone running local AI?" We verify prices against Newegg, Amazon, BestValueGPU, and eBay sold-listings each quarter. We update or retire picks the day a new model or a price swing makes them wrong. No sponsored entries. No affiliate links.
Picks are updated as the market shifts — when a new GPU lands, a price moves materially, or a model release changes the fit calculus.