Dual RTX 5090

The only consumer rig that runs 122B-A10B frontier MoE at interactive speed.

Two RTX 5090s provide 2 × 1,792 GB/s of memory bandwidth and 64 GB of total VRAM. This is the first consumer configuration that both fits a 122B-A10B MoE with room and generates tokens fast enough to use interactively. The tradeoff is 1,500 W of sustained draw, dual 12VHPWR connectors, and a case that fits 9-slot cards side by side.

The decision in five lines

The call: Buy — The only consumer rig that runs 122B-A10B frontier MoE at interactive speed.
Best for: Frontier consumer
Runs well: Qwen3-Coder-30B-A3B (MoE, fits 24GB) · Qwen 3.6-27B · Z-Image-Turbo (Apache 2.0)
Watch out: Two 12VHPWR connectors mean twice the melting-risk surface. Use a PSU with two native 12V-2x6 cables (1,600 W Platinum, ATX 3.1). Never use adapters or daisy-chained cables.
Evidence: Estimated · last verified July 2026

64 GB GDDR7 (2×32)
3,584 GB/s aggregate bandwidth
1,150 W GPU TDP
~$8,500 all-in (built)
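
The power figures on this page reduce to a simple budget check. The sketch below is only illustrative: it plugs in the 1,150 W GPU TDP, the 1,500 W sustained draw, and the 1,600 W PSU quoted here, and the function and key names are invented for the example rather than taken from any tool.

```python
def power_budget(gpu_tdp_w: float, sustained_w: float, psu_w: float) -> dict:
    """Split a sustained system draw into GPU and non-GPU parts and report PSU load."""
    if not 0 < gpu_tdp_w <= sustained_w <= psu_w:
        raise ValueError("expected 0 < gpu_tdp_w <= sustained_w <= psu_w")
    return {
        "per_gpu_w": gpu_tdp_w / 2,                  # two identical cards
        "rest_of_system_w": sustained_w - gpu_tdp_w,  # CPU, board, drives, fans
        "psu_load_fraction": sustained_w / psu_w,
        "psu_headroom_w": psu_w - sustained_w,
    }


# Figures quoted on this page; treat them as estimates, not measurements.
budget = power_budget(gpu_tdp_w=1150, sustained_w=1500, psu_w=1600)
for key, value in budget.items():
    print(f"{key}: {value}")
```

With these inputs the PSU runs at roughly 94% of its rating and leaves about 100 W of headroom, so the PSU recommendation has little slack for transient spikes.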

What fits at this tier

64 GB combined fits a 70B dense model at Q4 (~40 GB) with room, a 122B-A10B MoE at 4-bit (~70 GB weights) in split mode, and a 35B-A3B MoE on a single card at 170–197 tok/s. Llama 3.3 70B Q4 runs at ~26–27 tok/s via Ollama tensor-parallel (Databasemart); batched vLLM can push 40–50+ tok/s.
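
The sizing above can be sanity-checked with two back-of-the-envelope formulas: weight footprint is parameters times bits per weight divided by eight, and single-stream decode speed is bounded by memory bandwidth divided by the bytes of active weights read per token. The sketch below is a rough model under stated assumptions: the bits-per-weight values (4.0, 4.5, 4.6) are guesses at what a "Q4" quantization works out to, and the function names are invented for the example.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Quantized weight footprint in decimal GB: parameters x bits / 8."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(active_params_billion: float,
                         bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Bandwidth-bound ceiling: every active weight is read once per token.

    Ignores KV-cache reads, compute time, and inter-GPU transfers, so real
    throughput will be lower than this number.
    """
    return bandwidth_gb_s / weight_gb(active_params_billion, bits_per_weight)


ONE_CARD_GB_S = 1792  # per-card bandwidth quoted on this page

# Assumed effective bit widths; actual quantization formats vary.
dense_70b_gb = weight_gb(70, 4.5)        # about 39.4 GB
moe_122b_gb_at_4_0 = weight_gb(122, 4.0)  # 61.0 GB
moe_122b_gb_at_4_6 = weight_gb(122, 4.6)  # about 70.2 GB

# Dense 70B reads all weights per token; the MoE reads only ~10B active.
dense_70b_ceiling = decode_ceiling_tok_s(70, 4.5, ONE_CARD_GB_S)
moe_122b_ceiling = decode_ceiling_tok_s(10, 4.6, ONE_CARD_GB_S)

print(f"70B dense: {dense_70b_gb:.1f} GB, ceiling {dense_70b_ceiling:.1f} tok/s")
print(f"122B MoE: {moe_122b_gb_at_4_0:.1f} to {moe_122b_gb_at_4_6:.1f} GB, "
      f"ceiling {moe_122b_ceiling:.1f} tok/s")
```

Under these assumptions the 70B ceiling on one card's bandwidth is about 45 tok/s, which is consistent with the measured ~26–27 tok/s sitting below it. The 122B footprint is sensitive to the effective bit width: exactly 4.0 bits per weight gives 61 GB, while the ~70 GB quoted above corresponds to roughly 4.6 bits per weight, so check the actual file size of the quantization you plan to run against the 64 GB total.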

CODING

Qwen3-Coder-30B-A3B (MoE, fits 24GB): Community daily driver for local coding; 3B-active MoE delivers 30B quality at 3B-dense speed.

CHAT / GENERAL

Qwen 3.6-27B: Dense refresh released April 22, 2026; supersedes Qwen 3.5 27B and claims to beat the prior 397B MoE flagship while staying single-GPU at Q4 (~17 GB).

DOCS & RETRIEVAL

Qwen 3.6-27B: Dense refresh released April 22, 2026; 262K native context extensible to 1M, multimodal, single-GPU at Q4. Now the top dense long-context pick.

IMAGE

Z-Image-Turbo (Apache 2.0): Community daily driver for realism; 6B parameters, 8-step inference, and an Apache 2.0 license that allows commercial use.

AGENTS

Qwen 3.6-35B-A3B: Latest Qwen MoE; strong function calling; realistic on 24GB+ VRAM or a Mac with 48GB+. The top local agentic pick.

VOICE

Qwen3-Omni-30B-A3B-Instruct: Apache 2.0 MoE; audio, video, image, and text in, speech and text out; 17GB at Q4. Frontier unified voice.

The call

Buy it if you're running 122B-A10B frontier models or batched vLLM workloads and you need both capacity and bandwidth. The only consumer rig that does both.
Skip it if your workload is under 70B Q4: a single RTX 5090 delivers 60% of the performance at 35% of the total cost. Also skip it if you want quiet; 1,500 W in a single case means real heat and real fan noise.
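
The "60% of the performance at 35% of the cost" comparison can be turned into a performance-per-dollar ratio. This is a small sketch using only the fractions and the ~$8,500 build price quoted on this page; the function and variable names are invented for the example.

```python
def perf_per_dollar_ratio(perf_fraction: float, cost_fraction: float) -> float:
    """How much more performance per dollar the cheaper option delivers."""
    if cost_fraction <= 0:
        raise ValueError("cost_fraction must be positive")
    return perf_fraction / cost_fraction


DUAL_BUILD_COST = 8500           # all-in price quoted on this page
SINGLE_PERF_FRACTION = 0.60      # single 5090 vs dual, as quoted
SINGLE_COST_FRACTION = 0.35      # single 5090 vs dual, as quoted

single_build_cost = SINGLE_COST_FRACTION * DUAL_BUILD_COST
value_ratio = perf_per_dollar_ratio(SINGLE_PERF_FRACTION, SINGLE_COST_FRACTION)

print(f"Implied single-card build: ${single_build_cost:,.0f}")
print(f"Single card performance per dollar: {value_ratio:.2f}x the dual rig")
```

On those figures the single card returns about 1.7 times the performance per dollar, which is why the dual rig only makes sense when the workload needs the extra capacity.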

Watchouts

Two 12VHPWR connectors mean twice the melting-risk surface. Use a PSU with two native 12V-2x6 cables (1,600 W Platinum, ATX 3.1). Never use adapters or daisy-chained cables.
1,500 W sustained in a single case produces real heat (~45 dBA at desk, 5 °C above single-card ambient). Not a bedroom rig. Not a closet rig.
Consumer AM5/X870E boards split PCIe 5.0 as x8/x8 — fine for inference, suboptimal for training. Full x16/x16 requires TRX50 Threadripper (+$2,000).
llama.cpp's layer split mode has known bugs on non-P2P PCIe topologies (issue #20052: split mode produces garbage at context >2048 on non-P2P boards). Check your runner version and topology before buying.
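
To put the x8/x8 point above in numbers, the sketch below computes theoretical PCIe 5.0 link bandwidth (32 GT/s per lane with 128b/130b encoding) and the time to move one token's hidden state between cards during layer-split decoding. The hidden size of 8,192 and 2-byte activations are illustrative assumptions, the function names are invented for the example, and the estimate counts bandwidth only, ignoring link latency and protocol overhead.

```python
PCIE5_GT_PER_S = 32.0             # PCIe 5.0 raw rate per lane, gigatransfers/s
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding


def pcie5_gb_s(lanes: int) -> float:
    """Theoretical one-direction PCIe 5.0 bandwidth in GB/s for a given lane count."""
    return PCIE5_GT_PER_S * ENCODING_EFFICIENCY / 8 * lanes


def boundary_transfer_us(hidden_size: int, bytes_per_value: int, lanes: int) -> float:
    """Microseconds to move one token's hidden state across the link (bandwidth only)."""
    payload_bytes = hidden_size * bytes_per_value
    return payload_bytes / (pcie5_gb_s(lanes) * 1e9) * 1e6


x8_gb_s = pcie5_gb_s(8)     # about 31.5 GB/s
x16_gb_s = pcie5_gb_s(16)   # about 63.0 GB/s

# Assumed: hidden size 8192, fp16 activations (2 bytes each).
per_token_us = boundary_transfer_us(hidden_size=8192, bytes_per_value=2, lanes=8)

print(f"x8: {x8_gb_s:.1f} GB/s, x16: {x16_gb_s:.1f} GB/s")
print(f"Per-token hidden-state transfer over x8: {per_token_us:.2f} microseconds")
```

Under these assumptions the per-token transfer is well under a microsecond of link time, so halving the lane count barely registers for single-stream inference. Training exchanges far more data between cards per step, which is where the wider link starts to matter.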

Local vs cloud at this tier

● LOCAL WINS

122B-A10B frontier MoE at interactive speed — unique in the consumer lineup. Batched serving (vLLM, TensorRT-LLM) for multi-user home inference. Frontier-open-weight capability without a data-center GPU partition.

● CLOUD WINS

Cloud wins at much of the real frontier (Opus 4.8, GPT-5.4). The $8,500 build pays for itself vs $200/mo Claude Max 20x in ~42 months — a long horizon for hardware that's a full platform generation from obsolescence.
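
The payback estimate above is a straight division, sketched below. The electricity figure in the second call is a hypothetical placeholder, not a number from this page, and the function name is invented for the example.

```python
import math


def break_even_months(build_cost: float,
                      cloud_monthly: float,
                      local_monthly_cost: float = 0.0) -> float:
    """Months until the build cost is recovered by not paying the subscription."""
    monthly_saving = cloud_monthly - local_monthly_cost
    if monthly_saving <= 0:
        return math.inf  # the local rig never pays for itself
    return build_cost / monthly_saving


# Figures quoted on this page: $8,500 build vs a $200/month subscription.
months_hardware_only = break_even_months(8500, 200)

# Hypothetical: $30/month of electricity for the local rig.
months_with_power = break_even_months(8500, 200, local_monthly_cost=30)

print(f"Hardware only: {months_hardware_only:.1f} months")
print(f"With assumed electricity: {months_with_power:.1f} months")
```

Hardware alone gives 42.5 months; any running cost you add for the local rig pushes the horizon further out.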

The honest top-tier consumer local-AI rig in July 2026. Worth it only if your workload genuinely needs 122B-A10B or batched serving — for everything else, a single 5090 or M3 Ultra Studio is the saner pick.
