NVIDIA RTX 5090

The single-GPU answer for people who refuse to wait.

A 32 GB Blackwell card that runs every modern coding, chat, and agent model at Q4 with headroom, at speeds a used dual-3090 rig can match only with a power bill and a compromise. The brief late-May AIB thaw that pulled the floor to ~$2,910 has reversed — by June the cheapest realistic entry is ~$3,500 used, with new cards at ~$3,800–$4,200.

The decision in five lines

The call: Buy — The single-GPU answer for people who refuse to wait.
Best for: Top tier
Runs well: Qwen3-Coder-30B-A3B (MoE, fits 24GB) · Qwen 3.6-27B · Z-Image-Turbo (Apache 2.0)
Watch out: Street price still sits well above $1,999 MSRP and the late-May thaw reversed. Floor moved $3,000 (April) → $3,799 (early May) → $2,910–$4,300 (May 28, ASUS TUF AIBs at $2,909.99) → $3,500–$4,300 (June: the sub-$3K AIB allocation dried up, used eBay ~$3,500, Amazon new ~$4,199, Zotac Solid OC ~$3,800). DRAM shortage is still squeezing supply industry-wide; IDC pegs it through Q4 2027, and a rumored ~$300 GDDR7-driven MSRP hike has not yet landed.
Evidence: Measured · last verified July 2026

32: GB GDDR7
1,792: GB/S BANDWIDTH
575: W TDP
~$3,500: FROM (STREET)

What fits at this tier

Unlocks the top band across every use case. MoE 30B-A3B and 35B-A3B at Q4 with room for 32K+ context; 32B dense Q4 fits with headroom; FLUX.2 dev fits without quant pain. 70B dense Q4 (~40 GB) does NOT fit 32 GB — needs 48 GB+ (dual-3090 or dual-5090).

CODING

Qwen3-Coder-30B-A3B (MoE, fits 24GB) Community daily driver for local coding; 3B-active MoE delivers 30B quality at 3B-dense speed.

CHAT / GENERAL

Qwen 3.6-27B April 22 2026 dense refresh; supersedes Qwen 3.5 27B and claims to beat the prior 397B MoE flagship while staying single-GPU at Q4 (~17 GB).

DOCS & RETRIEVAL

Qwen 3.6-27B April 22 2026 dense refresh — 262K native context extensible to 1M, multimodal, single-GPU at Q4. Now the dense long-context top pick.

IMAGE

Z-Image-Turbo (Apache 2.0) Community daily driver for realism; 6B, 8-step inference, Apache 2.0 — commercial OK.

AGENTS

Qwen 3.6-35B-A3B Latest Qwen MoE; strong function calling; realistic on 24GB+ VRAM or Mac 48GB+ — the local agentic top pick.

VOICE

Qwen3-Omni-30B-A3B-Instruct Apache 2.0 MoE; audio+video+image+text in, speech+text out; 17GB at Q4. Frontier unified voice.

The call

Buy it if you want one card that runs every modern MoE at Q4 without thinking about quant or context.
Skip it if you can stomach used hardware — a pair of RTX 3090s gives you 48 GB for roughly $1,600 all-in, at the cost of noise, heat, and PCIe splitting.

Watchouts

Street price still sits well above $1,999 MSRP and the late-May thaw reversed. Floor moved $3,000 (April) → $3,799 (early May) → $2,910–$4,300 (May 28, ASUS TUF AIBs at $2,909.99) → $3,500–$4,300 (June: the sub-$3K AIB allocation dried up, used eBay ~$3,500, Amazon new ~$4,199, Zotac Solid OC ~$3,800). DRAM shortage is still squeezing supply industry-wide; IDC pegs it through Q4 2027, and a rumored ~$300 GDDR7-driven MSRP hike has not yet landed.
575 W TDP needs a 1,000 W+ PSU and honest cooling. Check case airflow and PSU rail headroom before buying.
12VHPWR connector — follow Nvidia's re-seat guidance. Melted-cable stories are rare but real at this power draw.
FE cards run hot under sustained load. AIB partner cards (ASUS, MSI, Gigabyte) with larger heatsinks are worth the $100–$200 premium if you'll keep it loaded for hours.

Local vs cloud at this tier

● LOCAL WINS

Unrestricted context, fully offline, no per-token cost for heavy daily use (>50M tok/mo), and you own the hardware for 3–5 years.

● CLOUD WINS

Frontier reasoning (Claude Opus 4.8, GPT-5.4), first-day access to new models, zero upfront cost, and no electricity or cooling to think about.

At this tier, the break-even vs a $100/mo ChatGPT Pro plan is ~24–30 months at regular usage. Heavy API users break even in under a year.

Next step

Load this setup into the planner→