HARDWARE · FRONTIER CONSUMER · 64 GB (2×32)
Dual RTX 5090
The only consumer rig that runs 122B-A10B frontier MoE at interactive speed.
Two RTX 5090s with 2× 1,792 GB/s bandwidth and 64 GB total VRAM. This is the first consumer configuration that fits 122B-A10B MoE with room AND generates tokens fast enough to use interactively. The tradeoff is 1,500 W sustained draw, dual 12VHPWR connectors, and a case that fits 9-slot cards side-by-side.
The decision in five lines
- The call
- Buy — The only consumer rig that runs 122B-A10B frontier MoE at interactive speed.
- Best for
- Frontier consumer
- Runs well
- Qwen3-Coder-30B-A3B (MoE, fits 24GB) · Qwen 3.6-27B · Z-Image-Turbo (Apache 2.0)
- Watch out
- 2× 12VHPWR connectors means 2× melting-risk surface. Use native dual-cable 12V-2x6 PSU (1,600 W Platinum ATX 3.1). Never adapters, never daisy-chained.
- Evidence
- Estimated
- 64
- GB GDDR7 (2×32)
- 3,584
- GB/S AGGREGATE
- 1,150
- W GPU TDP
- ~$8,500
- ALL-IN (BUILT)
What fits at this tier
64 GB combined fits 70B dense Q4 (~40 GB) with room, 122B-A10B MoE at 4-bit (~70 GB weights) in split mode, and 35B-A3B MoE singly on either card at 170–197 tok/s. Llama 3.3 70B Q4 runs at ~26–27 tok/s via Ollama tensor-parallel (Databasemart); vLLM batched can push 40–50+ tok/s.
The call
Buy it if you're running 122B-A10B frontier models or batched vLLM workloads and you need both capacity and bandwidth. The only consumer rig that does both.
Skip it if your workload is under 70B Q4 — a single RTX 5090 is 60% of the performance at 35% of the total cost. Also skip if you want quiet — 1,500 W in a single case is real heat and real fan noise.
Watchouts
- 2× 12VHPWR connectors means 2× melting-risk surface. Use native dual-cable 12V-2x6 PSU (1,600 W Platinum ATX 3.1). Never adapters, never daisy-chained.
- 1,400 W sustained in a single case produces real heat (~45 dBA at desk, 5 °C above single-card ambient). Not a bedroom rig. Not a closet rig.
- Consumer AM5/X870E boards split PCIe 5.0 as x8/x8 — fine for inference, suboptimal for training. Full x16/x16 requires TRX50 Threadripper (+$2,000).
- llama.cpp split-mode layer has known bugs on non-P2P PCIe topologies (issue #20052 — split-mode produces garbage at context >2048 on non-P2P boards). Check runner version + topology before buying.
Local vs cloud at this tier
● LOCAL WINS
122B-A10B frontier MoE at interactive speed — unique in the consumer lineup. Batched serving (vLLM, TensorRT-LLM) for multi-user home inference. Frontier-open-weight capability without a data-center GPU partition.
● CLOUD WINS
Cloud wins at much of the real frontier (Opus 4.8, GPT-5.4). The $8,500 build pays for itself vs $200/mo Claude Max 20x in ~42 months — a long horizon for hardware that's a full platform generation from obsolescence.
The honest top-tier consumer local-AI rig in June 2026. Worth it only if your workload genuinely needs 122B-A10B or batched serving — for everything else, a single 5090 or M3 Ultra Studio is the saner pick.
Next step
Load this setup into the planner→