the AI bench
VERIFIED JUNE 2026

REVERSE LOOKUP · MODEL → HARDWARE

Which hardware runs this model?

Pick a model, see every rig that fits — with tier, VRAM, and price surfaced upfront.

Match logic is memory math: the model's footprint is compared against each rig's VRAM. For Macs with unified memory, the math accounts for the ~33% that macOS reserves by default; a "Requires tweak" badge means the model fits only after raising that limit with sysctl iogpu.wired_limit_mb. Results are sorted cheapest-first, so the best-value fit is always at the top.
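
As a rough illustration of the match logic described above, here is a minimal Python sketch. The `Rig` class, the `match` and `reverse_lookup` functions, and the rule that a Mac "requires tweak" whenever the model exceeds the default budget but is still smaller than total unified memory are all assumptions made for this example, not the site's actual code. Only the ~33% macOS reserve and the cheapest-first sort come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MACOS_RESERVE = 0.33  # from the text: macOS holds back ~33% of unified memory by default


@dataclass
class Rig:
    name: str
    memory_gb: float
    price_low: float           # low end of the listed price range, in USD
    mac_unified: bool = False  # True for Apple unified-memory machines


def usable_gb(rig: Rig) -> float:
    """Memory a model can use without any tweak."""
    if rig.mac_unified:
        return rig.memory_gb * (1 - MACOS_RESERVE)
    return rig.memory_gb


def match(rig: Rig, model_gb: float) -> Optional[str]:
    """Return 'fits', 'requires tweak', or None when the model does not fit."""
    if model_gb <= usable_gb(rig):
        return "fits"
    # Assumption: raising iogpu.wired_limit_mb can reclaim the reserved share,
    # so anything below total memory is treated as fitting with a tweak.
    if rig.mac_unified and model_gb <= rig.memory_gb:
        return "requires tweak"
    return None


def reverse_lookup(rigs: List[Rig], model_gb: float) -> List[Tuple[Rig, str]]:
    """All rigs that fit the model, cheapest first."""
    hits = [(rig, match(rig, model_gb)) for rig in rigs]
    fits = [(rig, badge) for rig, badge in hits if badge is not None]
    return sorted(fits, key=lambda hit: hit[0].price_low)
```

With a ~2 GB embedding model every rig clears the bar on the first check; the "requires tweak" branch only matters for models that approach a Mac's total memory.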



About this model

BGE-M3 is BAAI's multi-functional, multilingual (170+ languages), multi-granularity embedding model. The default "just use it" RAG embedding since early 2024.

Maker
BAAI
Params
568M (XLM-RoBERTa-large base)
License
MIT
Context
8192 tokens
VRAM at Q4
~1–2 GB

See full detail page →

23 hardware options fit

PERFECT · 5.9× · 12 GB

Intel Arc B580 12 GB

12 GB GDDR6 at $249 MSRP is the cheapest new discrete GPU with enough VRAM for 8B-class local AI. The catch is the software: Intel's IPEX-LLM — the main path for Ollama on Arc — was archived on January 28, 2026. Still works, still runs 8B models at 28–62 tok/s, but you're betting on a project Intel is no longer actively maintaining. Worse: Intel canceled the Arc B770 mid-2026 and re-routed the BMG-G31 die to a Pro workstation card, so the B580 is the terminal Battlemage consumer SKU.

$249–$299 · Measured · Read →
PERFECT · 5.9× · 12 GB

NVIDIA RTX 3060 12 GB

A 5-year-old card that still runs Llama 3.1 8B Q4 at 52 tok/s. Amazon street is ~$354, eBay used floors around $230, and NVIDIA is restarting 8 nm production with Samsung in June 2026 — supply should ease through summer. 12 GB is the minimum VRAM that matters for 8B-class models with any context, and CUDA just works everywhere.

$280–$400 · Estimated · Read →
PERFECT · 11.8× · 32 GB DDR5 (shared)

Minisforum UM890 Pro

AMD Ryzen 9 8945HS + Radeon 780M iGPU + DDR5-5600 in a 0.5 L chassis. At $580 all-in with 32 GB RAM it runs Llama 3.1 8B at 15–22 tok/s via llama.cpp + Vulkan. The honest frame: cloud wins outright for answer quality here — this is a privacy / always-on / homelab pick, not a performance one.

$463–$580 all-in · Estimated · Read →
PERFECT · 7.9× · 16 GB

RTX 5060 Ti 16 GB

16 GB GDDR7 at $559 Amazon. Runs 14B dense at Q4 at ~33 tok/s with room for 16K context; 30B-A3B MoE fits cleanly at Q3 (~13 GB), or at Q4 (~17 GB) with partial CPU offload. The honest entry point for local AI if you want new hardware with a warranty.

$560–$610 · Measured · Read →
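
The Q3 and Q4 figures quoted for the 30B-A3B MoE are consistent with simple weight-size arithmetic: parameters times effective bits per weight, divided by eight. The sketch below assumes roughly 3.5 and 4.5 effective bits per weight for Q3 and Q4 (illustrative values chosen to line up with the blurb, since real quant formats vary) and ignores KV cache and runtime overhead. `weight_footprint_gb` is a hypothetical helper.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


VRAM_GB = 16  # RTX 5060 Ti 16 GB

# Assumed effective bits per weight; not taken from any specific quant format.
q3 = weight_footprint_gb(30, 3.5)  # lands near the ~13 GB quoted for Q3
q4 = weight_footprint_gb(30, 4.5)  # lands near the ~17 GB quoted for Q4

print(f"Q3 fits in {VRAM_GB} GB: {q3 <= VRAM_GB}")
print(f"Q4 fits in {VRAM_GB} GB: {q4 <= VRAM_GB}")
```

Under these assumptions Q3 fits on the card and Q4 does not, which is why the Q4 path needs partial CPU offload.
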
PERFECT · 7.9× · 16 GB

AMD Radeon RX 9070 XT

16 GB GDDR6 at 640 GB/s on AMD's RDNA4 architecture. AI throughput per compute unit doubled vs RDNA3; paired with ROCm 7+, this is the AMD card to buy if you're entering local AI on Team Red today. The 7900 XTX retains a 24 GB VRAM lead but commands scarcity premiums on the new market.

$649–$779 · Estimated · Read →
PERFECT · 11.8× · 24 GB

AMD Radeon RX 7900 XTX

24 GB at roughly 85–90% of a 4090's throughput under ROCm. The hardware is fine; the software ecosystem is the tax. Plan 5–10 hours on first-time ROCm setup, plus the ongoing friction of Ollama being patchy on AMD. New-market pricing has split sharply from used since the DRAM crunch — used 3090s and used 7900 XTXs are now the same $760 band.

$760 used / ~$1,500 new · Measured · Read →
PERFECT · 5.3× · 16 GB unified

Mac Mini M4 16 GB

Apple discontinued the $599 Mac mini base config on May 1, 2026 and raised the floor to $799 with 512 GB. The 16 GB / 256 GB SKU only survives on Amazon residuals and eBay. If you can find one near $499, the 8B-class story still holds; otherwise the math has shifted toward the 24 GB M4 Pro.

$799 (new floor) / $499–$599 (eBay/residuals) · Measured · Read →
PERFECT · 11.8× · 24 GB

NVIDIA RTX 3090 (used, single)

24 GB of GDDR6X at 936 GB/s for ~$1,050 on the used market in June 2026 — every dollar you spend on a 3090 still buys more usable VRAM than any other card in the lineup, even after the used-market floor lifted ~$200 since April as buyers priced out of 5090 scarcity moved a tier down. The tradeoff is age, heat, and a GDDR6X memory package that runs hot after half a decade.

$950–$1,200 · Estimated · Read →
PERFECT · 7.9× · 16 GB

NVIDIA RTX 5070 Ti

16 GB GDDR7 at 896 GB/s — 93% of the 5080's bandwidth for ~15% less money at street price. Hardware Corner measured 185 tok/s on Qwen 2.5 14B Q4 short-context, which is the honest sweet spot for this card.

$980–$1,300 · Estimated · Read →
PERFECT · 7.9× · 16 GB

NVIDIA RTX 5080

Blackwell architecture + GDDR7 at 960 GB/s buys you ~30–40% more tok/s than the 5060 Ti 16 GB, but the VRAM ceiling is identical. If your work lives in the 8B–14B dense band, this is the honest Blackwell pick; if you need 30B-A3B MoE with headroom, you need more memory.

$999–$1,400 · Estimated · Read →
PERFECT · 7.9× · 24 GB unified

MacBook Air M5 24 GB

The M5 Air at 24 GB is the first Apple laptop where 8B dense Q4 inference feels responsive without a fan ramping up — because there is no fan. 153 GB/s bandwidth is the honest limiting factor; this is not a 14B-comfortable machine.

$1,299–$1,699 · Estimated · Read →
PERFECT · 7.9× · 24 GB unified

Mac Mini M4 Pro 24 GB

At 273 GB/s — 2.3× the base Mac mini M4's bandwidth — the M4 Pro in its base 12-core CPU / 16-core GPU bin is the first Apple silicon SKU where 14B dense Q4 feels responsive, not ponderous. Silent, 4 W idle, $1,399 from Apple.

$1,399 · Estimated · Read →
PERFECT · 23.6× · 48 GB

Dual RTX 3090 (used)

Two used 3090s give you 48 GB of VRAM for roughly $1,600 all-in — enough for 70B dense at Q4 with room for context. llama.cpp and Ollama split across PCIe automatically; no NVLink needed. The compromise is noise, heat, and finding honest used cards.

$1,800–$2,500 all-in · Estimated · Read →
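
To make the multi-GPU split concrete, here is a hedged sketch of dividing a model's layers across cards in proportion to their VRAM. It illustrates the idea only and is not llama.cpp's or Ollama's actual placement logic. `split_layers` is a hypothetical helper, and the 80-layer count is an assumption for a 70B-class dense model.

```python
from typing import List


def split_layers(n_layers: int, vram_gb: List[float]) -> List[int]:
    """Assign whole layers to GPUs in proportion to each card's VRAM."""
    total = sum(vram_gb)
    shares = [n_layers * v / total for v in vram_gb]
    counts = [int(share) for share in shares]  # round down first
    # Hand any leftover layers to the cards with the largest fractional share.
    leftover = n_layers - sum(counts)
    by_remainder = sorted(
        range(len(vram_gb)),
        key=lambda i: shares[i] - counts[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Two 24 GB RTX 3090s and an assumed 80-layer model: an even split.
print(split_layers(80, [24, 24]))
```

Mismatched cards simply get uneven shares; the function always returns counts that sum to the layer total.
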
PERFECT · 42.1× · 128 GB unified

Framework Desktop (Ryzen AI Max+ 395)

Strix Halo's 40-CU Radeon 8060S iGPU plus 128 GB LPDDR5X unified memory runs Qwen 3 30B-A3B MoE at ~72 tok/s — 4× the bandwidth of the Minisforum UM890 Pro, 4× the memory. A genuine local-AI mini-PC, not a CPU box that happens to boot.

$1,999–$2,851 · Estimated · Read →
PERFECT · 11.8× · 24 GB

NVIDIA RTX 4090

Same 24 GB VRAM ceiling as the new generation's sweet spot, 1 TB/s bandwidth, mature CUDA stack, no 12VHPWR drama if you buy a unit with the updated 12V-2x6 connector. Buy used from a trusted seller — new retail at scalper prices is not the right move.

$2,200–$2,800 · Measured · Read →
PERFECT · 15.8× · 48 GB unified

M5 Pro MacBook Pro 48 GB

48 GB unified at 307 GB/s, about 12% more bandwidth than the M4 Pro's 273 GB/s, and enough to run Qwen 3.5 35B-A3B MoE at 70–90 tok/s on battery, in a laptop. The honest step-up from the Mac mini M4 Pro 24 GB without going to the $4,499 M5 Max 64 GB.

$2,599–$3,099 · Estimated · Read →
PERFECT · 15.7× · 32 GB

NVIDIA RTX 5090

A 32 GB Blackwell card that runs every modern coding, chat, and agent model at Q4 with headroom, at speeds a used dual-3090 rig can match only with a power bill and a compromise. AIB allocation has thawed enough that the entry floor came back down to ~$2,910 in late May.

$2,910–$4,300 · Measured · Read →
PERFECT · 21.0× · 64 GB unified

Mac Studio M4 Max 64 GB

64 GB unified memory at 546 GB/s. Runs 30B-A3B MoE at 70–100 tok/s, Qwen 3.5 27B dense at ~20 tok/s, and FLUX.2 klein pipelines cleanly, in a silent chassis that idles at 6 W. 70B dense Q4 fits with a `sudo sysctl iogpu.wired_limit_mb` tweak at 8–15 tok/s: workable, but not silent under sustained load. The M4 Max is now previous-generation silicon, but Bloomberg (April 19, 2026) reported that the M5 Mac Studio refresh slipped to October 2026 over supply-chain issues, which strengthens the buy-now case.

$3,199 · Estimated · Read →
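
The wired-limit tweak mentioned above takes a value in megabytes. A minimal sketch of picking one follows, assuming you want to leave a fixed amount of memory for macOS itself. The 8 GB default for `keep_for_os_gb` is an arbitrary assumption for illustration, not a recommendation from the source, and both helper names are hypothetical.

```python
MACOS_RESERVE = 0.33  # default reserve cited in the match-logic note on this page


def default_usable_gb(total_gb: float) -> float:
    """Memory available to the GPU before any tweak."""
    return total_gb * (1 - MACOS_RESERVE)


def wired_limit_command(total_gb: int, keep_for_os_gb: int = 8) -> str:
    """Build the sysctl command that raises the GPU wired-memory limit."""
    if keep_for_os_gb >= total_gb:
        raise ValueError("must leave less memory for the OS than the machine has")
    limit_mb = (total_gb - keep_for_os_gb) * 1024  # the setting is expressed in MB
    return f"sudo sysctl iogpu.wired_limit_mb={limit_mb}"


# 64 GB Mac Studio: the default budget versus a raised limit.
print(default_usable_gb(64))
print(wired_limit_command(64))
```

On a 64 GB machine the default budget is a little under 43 GB, which is why a 70B dense Q4 model plus context only fits after the limit is raised.
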
PERFECT · 23.6× · 48 GB ECC

NVIDIA RTX A6000 (48 GB, used)

The only consumer-reachable single-card path to 48 GB VRAM under $5,000. Ampere-generation workstation silicon with ECC memory, 768 GB/s bandwidth, and a dual-slot blower that tolerates sustained load. Used-market prices make it an Ada-tax-avoidance play.

$3,500–$4,500 · Estimated · Read →
PERFECT · 31.6× · 96 GB unified

Mac Studio M3 Ultra 96 GB

819 GB/s unified memory bandwidth — the highest in any shipping Mac — plus 96 GB capacity puts Llama 3.3 70B Q4 at 12–18 tok/s on a box that runs at ~70 W idle and fits on a bookshelf. Dual M3 Max dies under one heatsink, no GPU tower, no fan noise.

$3,999 · Estimated · Read →
PERFECT · 21.0× · 64 GB unified

M5 Max MacBook Pro 64 GB

64 GB of unified memory at 614 GB/s on the 40-core GPU M5 Max. Runs every modern model up to 35B-A3B MoE at reasonable speed, in a silent chassis that sustains load on battery. The compromise: prefill on long prompts is noticeably slower than NVIDIA, and you pay Apple's storage tax to go beyond 48 GB.

$4,499 · Measured · Read →
PERFECT · 42.1× · 128 GB unified

NVIDIA DGX Spark

128 GB of unified memory via NVIDIA's GB10 Grace Blackwell Superchip — 4× what any consumer GPU gives you. The catch: 273 GB/s bandwidth is ~27% of an RTX 4090, so you trade raw speed for fit. A capacity-first machine, not a speed-first machine.

$4,699 · Measured · Read →
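
The bandwidth comparisons on this page are plain ratios, and a few lines of Python are enough to sanity-check them. The figures below are the ones quoted in the blurbs; reading the RTX 4090's "1 TB/s" as 1,000 GB/s is an assumption, and `bandwidth_ratio` is a hypothetical helper.

```python
# Memory bandwidth in GB/s, as quoted in the hardware blurbs on this page.
BANDWIDTH_GBPS = {
    "RTX 5070 Ti": 896,
    "RTX 5080": 960,
    "DGX Spark": 273,
    "RTX 4090": 1000,  # assumption: "1 TB/s" taken as 1,000 GB/s
}


def bandwidth_ratio(a: str, b: str) -> float:
    """Bandwidth of rig `a` as a fraction of rig `b`."""
    return BANDWIDTH_GBPS[a] / BANDWIDTH_GBPS[b]


for a, b in [("DGX Spark", "RTX 4090"), ("RTX 5070 Ti", "RTX 5080")]:
    print(f"{a} vs {b}: {bandwidth_ratio(a, b):.0%}")
```

Both results line up with the ~27% and 93% figures given in the DGX Spark and RTX 5070 Ti entries.
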
PERFECT · 31.4× · 64 GB (2×32)

Dual RTX 5090

Two RTX 5090s with 2× 1,792 GB/s bandwidth and 64 GB total VRAM. This is the first consumer configuration that fits 122B-A10B MoE with room AND generates tokens fast enough to use interactively. The tradeoff is 1,500 W sustained draw, dual 12VHPWR connectors, and a case that fits 9-slot cards side-by-side.

$8,500–$10,500 · Estimated · Read →

Tier slots this model fills

DOCS · MID
BGE-M3 (retrieval): Community-standard dense + sparse + multi-vector embeddings; multilingual; pairs with any generator.