the AI bench
VERIFIED JUNE 2026

HARDWARE · FRONTIER CONSUMER · 64 GB (2×32)

Dual RTX 5090

The only consumer rig that runs 122B-A10B frontier MoE at interactive speed.

Two RTX 5090s give you 2× 1,792 GB/s of memory bandwidth and 64 GB of total VRAM. This is the first consumer configuration that can host 122B-A10B MoE across both cards and still generate tokens fast enough to use interactively. The tradeoff is 1,500 W sustained draw, dual 12VHPWR connectors, and a 9-slot case that fits both cards side by side.
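A back-of-envelope way to see why the bandwidth figure matters: single-stream decoding is usually memory-bound, so tokens per second cannot exceed memory bandwidth divided by the bytes of weights read per token. The sketch below applies that bound to the numbers above. The function name and the 4.6 bits-per-weight value are illustrative assumptions, not measurements.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_b: float,
                         bits_per_weight: float) -> float:
    """Upper bound on single-stream decode speed for a memory-bound model.

    Every generated token reads each active weight once, so
    tokens/s <= bandwidth / bytes_of_active_weights.
    """
    # Billions of parameters times bits per weight, divided by 8, gives GB.
    active_weight_gb = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / active_weight_gb


PER_CARD_GB_S = 1792   # one RTX 5090 (figure from the text)
AGGREGATE_GB_S = 3584  # both cards together (figure from the text)
BPW = 4.6              # ASSUMPTION: effective bits per weight of a 4-bit quant

# 122B-A10B MoE: only about 10B parameters are active per token.
moe_one_card = decode_ceiling_tok_s(PER_CARD_GB_S, 10, BPW)
# 70B dense: all 70B parameters are read for every token.
dense_one_card = decode_ceiling_tok_s(PER_CARD_GB_S, 70, BPW)
dense_both_cards = decode_ceiling_tok_s(AGGREGATE_GB_S, 70, BPW)

print(f"122B-A10B, one card's bandwidth: {moe_one_card:.0f} tok/s ceiling")
print(f"70B dense, one card's bandwidth: {dense_one_card:.0f} tok/s ceiling")
print(f"70B dense, aggregate bandwidth:  {dense_both_cards:.0f} tok/s ceiling")
```

These are ceilings only. Measured throughput lands below them because of KV-cache reads, kernel overhead, and synchronization between the two GPUs.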

The decision in five lines

The call
Buy — The only consumer rig that runs 122B-A10B frontier MoE at interactive speed.
Best for
Frontier consumer
Runs well
Qwen3-Coder-30B-A3B (MoE, fits 24GB) · Qwen 3.6-27B · Z-Image-Turbo (Apache 2.0)
Watch out
2× 12VHPWR connectors mean 2× the melting-risk surface. Use a PSU with two native 12V-2x6 cables (1,600 W Platinum, ATX 3.1). Never adapters, never daisy-chained cables.
Evidence
Estimated · last verified June 2026

64 GB GDDR7 (2×32)
3,584 GB/s aggregate
1,150 W GPU TDP
~$8,500 all-in (built)

What fits at this tier

64 GB combined fits 70B dense Q4 (~40 GB) with room and runs 35B-A3B MoE on a single card at 170–197 tok/s. 122B-A10B MoE runs in split mode, but its 4-bit weights (~70 GB) exceed 64 GB on their own, so it needs a sub-4-bit quant or partial CPU offload. Llama 3.3 70B Q4 runs at ~26–27 tok/s via Ollama split across both cards (Databasemart); batched vLLM can push 40–50+ tok/s.
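The fit figures above follow from simple arithmetic: weight size in GB is roughly parameters (in billions) times bits per weight, divided by 8. The sketch below is a minimal check under stated assumptions: the 4.6 and 3.5 bits-per-weight values and the 6 GB allowance for KV cache and runtime buffers are illustrative guesses, and the helper names are hypothetical.

```python
def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB."""
    return params_b * bits_per_weight / 8


def fits(total_vram_gb: float, weights_gb: float, overhead_gb: float = 6.0) -> bool:
    """True if weights plus an assumed KV-cache/buffer allowance fit in VRAM."""
    return weights_gb + overhead_gb <= total_vram_gb


TOTAL_VRAM_GB = 64  # 2 x 32 GB (from the text)

# (label, parameters in billions, ASSUMED effective bits per weight)
candidates = [
    ("70B dense, 4-bit quant", 70, 4.6),
    ("122B-A10B MoE, 4-bit quant", 122, 4.6),
    ("122B-A10B MoE, ~3.5-bit quant", 122, 3.5),
]

for label, params_b, bpw in candidates:
    size = weight_gb(params_b, bpw)
    verdict = "fits" if fits(TOTAL_VRAM_GB, size) else "does not fit"
    print(f"{label}: ~{size:.1f} GB of weights, {verdict} in {TOTAL_VRAM_GB} GB")
```

Under these assumptions the 70B dense model leaves headroom, while the 122B-A10B model only fits fully in VRAM at a tighter quant. Longer contexts grow the KV cache, so the overhead allowance should be raised accordingly.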

CODING
Qwen3-Coder-30B-A3B (MoE, fits 24GB): community daily driver for local coding; the 3B-active MoE delivers 30B quality at 3B-dense speed.
CHAT / GENERAL
Qwen 3.6-27B: April 22, 2026 dense refresh; supersedes Qwen 3.5 27B and claims to beat the prior 397B MoE flagship while staying single-GPU at Q4 (~17 GB).
DOCS & RETRIEVAL
Qwen 3.6-27B: April 22, 2026 dense refresh with 262K native context extensible to 1M, multimodal, single-GPU at Q4. Now the dense long-context top pick.
IMAGE
Z-Image-Turbo (Apache 2.0): community daily driver for realism; 6B, 8-step inference, commercial use OK.
AGENTS
Qwen 3.6-35B-A3B: latest Qwen MoE; strong function calling; realistic on 24GB+ VRAM or a 48GB+ Mac. The local agentic top pick.
VOICE
Qwen3-Omni-30B-A3B-Instruct: Apache 2.0 MoE; audio+video+image+text in, speech+text out; 17GB at Q4. Frontier unified voice.

The call

Buy it if you're running 122B-A10B frontier models or batched vLLM workloads and you need both capacity and bandwidth. The only consumer rig that does both.

Skip it if your workload is under 70B Q4: a single RTX 5090 delivers 60% of the performance at 35% of the total cost. Also skip it if you want quiet, since 1,500 W in a single case means real heat and real fan noise.
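The 60%-for-35% comparison can be turned into a rough value ratio. This is a minimal sketch that treats the dual-card build as the 1.0 baseline for both performance and cost; the function name is hypothetical and the two fractions are taken as stated above.

```python
def relative_value(perf_fraction: float, cost_fraction: float) -> float:
    """Performance per unit of cost, relative to the dual-card baseline (1.0)."""
    return perf_fraction / cost_fraction


SINGLE_PERF = 0.60  # single RTX 5090 performance vs the dual build (from the text)
SINGLE_COST = 0.35  # single RTX 5090 build cost vs the dual build (from the text)

# Value of the single-card build relative to the dual build.
single_value = relative_value(SINGLE_PERF, SINGLE_COST)

# Value of only the extra spend: the remaining 65% of cost buys the remaining 40%.
marginal_value = relative_value(1 - SINGLE_PERF, 1 - SINGLE_COST)

print(f"Single card: {single_value:.2f}x the performance per dollar of the dual build")
print(f"Second-card spend: {marginal_value:.2f}x performance per dollar")
```

The ratio only describes workloads that fit on one card. For models that need more than 32 GB, the single-card build cannot run them fully in VRAM at all, which is the case the dual build exists for.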

Watchouts

  • 2× 12VHPWR connectors mean 2× the melting-risk surface. Use a PSU with two native 12V-2x6 cables (1,600 W Platinum, ATX 3.1). Never adapters, never daisy-chained cables.
  • 1,500 W sustained in a single case produces real heat (~45 dBA at desk, 5 °C above single-card ambient). Not a bedroom rig. Not a closet rig.
  • Consumer AM5/X870E boards split PCIe 5.0 as x8/x8 — fine for inference, suboptimal for training. Full x16/x16 requires TRX50 Threadripper (+$2,000).
  • llama.cpp's layer split mode has a known bug on non-P2P PCIe topologies (issue #20052: split mode produces garbage output at context >2048 on non-P2P boards). Check your runner version and topology before buying.
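On the x8/x8 point in the list above, a rough estimate shows why halved PCIe width is usually fine for layer-split inference: only one token's activations cross between the cards at the split boundary. The sketch below assumes a hidden size of 8192 (that of Llama 3.3 70B) and fp16 activations, and it counts bandwidth only, ignoring link latency and protocol overhead. The helper names are hypothetical.

```python
PCIE5_GT_S_PER_LANE = 32.0  # PCIe 5.0 raw signalling rate per lane
ENCODING = 128 / 130        # 128b/130b line encoding efficiency


def pcie5_gb_s(lanes: int) -> float:
    """Theoretical one-direction PCIe 5.0 bandwidth in GB/s."""
    return lanes * PCIE5_GT_S_PER_LANE * ENCODING / 8


def activation_bytes_per_token(hidden_size: int, bytes_per_elem: int = 2) -> int:
    """Bytes crossing the GPU boundary per token in a two-way layer split."""
    return hidden_size * bytes_per_elem


def transfer_us(nbytes: int, link_gb_s: float) -> float:
    """Bandwidth-only transfer time in microseconds (latency ignored)."""
    return nbytes / (link_gb_s * 1e9) * 1e6


HIDDEN_SIZE = 8192  # ASSUMPTION: Llama 3.3 70B hidden size, fp16 activations

payload = activation_bytes_per_token(HIDDEN_SIZE)
for lanes in (8, 16):
    bw = pcie5_gb_s(lanes)
    print(f"x{lanes}: {bw:.1f} GB/s, {payload} bytes per token "
          f"in about {transfer_us(payload, bw):.2f} microseconds")
```

At tens of milliseconds per generated token, a sub-microsecond transfer is negligible on either link width. Tensor-parallel serving and training exchange far more data per step, which is where the x16/x16 platform starts to matter.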

Local vs cloud at this tier

● LOCAL WINS

122B-A10B frontier MoE at interactive speed — unique in the consumer lineup. Batched serving (vLLM, TensorRT-LLM) for multi-user home inference. Frontier-open-weight capability without a data-center GPU partition.

● CLOUD WINS

Cloud wins at much of the real frontier (Opus 4.8, GPT-5.4). The $8,500 build pays for itself vs $200/mo Claude Max 20x in ~42 months — a long horizon for hardware that's a full platform generation from obsolescence.
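The ~42 month figure is hardware cost divided by the monthly subscription. The sketch below reproduces it and shows how the horizon stretches once electricity is counted. The usage hours and electricity price are illustrative assumptions, not figures from this page, and the function name is hypothetical.

```python
def breakeven_months(hardware_cost: float, monthly_subscription: float,
                     monthly_power_cost: float = 0.0) -> float:
    """Months until the build costs less than the subscription it replaces."""
    net_monthly_saving = monthly_subscription - monthly_power_cost
    if net_monthly_saving <= 0:
        raise ValueError("power cost meets or exceeds the subscription; no break-even")
    return hardware_cost / net_monthly_saving


BUILD_COST = 8500    # USD, all-in build (from the text)
SUBSCRIPTION = 200   # USD per month (from the text)

# ASSUMPTIONS for illustration only: 1.5 kW draw, 4 hours/day, $0.15 per kWh.
power_cost = 1.5 * 4 * 30 * 0.15

print(f"Ignoring electricity: {breakeven_months(BUILD_COST, SUBSCRIPTION):.1f} months")
print(f"With ~${power_cost:.0f}/mo electricity: "
      f"{breakeven_months(BUILD_COST, SUBSCRIPTION, power_cost):.1f} months")
```

Heavier daily use or higher electricity prices push the break-even point further out, while resale value of the cards would pull it back in.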

The honest top-tier consumer local-AI rig in June 2026. Worth it only if your workload genuinely needs 122B-A10B or batched serving — for everything else, a single 5090 or M3 Ultra Studio is the saner pick.
