GUIDE · COMPARISON · MAY 2026
RTX 5090 vs RTX 4090 — used 4090 still wins on $/perf for most local-AI buyers.
The 5090’s paper specs are real. So is the 30–67% speed gain on bandwidth-bound workloads. But at ~$3,000 new vs ~$1,800 used, the 4090 is the sharper buy for everyone except four specific cases — and the 12V-2x6 connector risk still hasn’t been fixed at the ecosystem level.
Honest verdict, May 2026. Numbers cross-verified from LocalScore, Hardware Corner, llama-bench community runs, and BestValueGPU price tracking.
The decision in five lines
- The call: Buy used 4090 — unless you hit one of the four 5090 wins below.
- 5090 wins: 32B dense Q4 fits cleanly · 32K+ context prefill ~30% faster · FLUX.2 image gen 30–40% faster · FP4 + bandwidth compound under batched/concurrent inference.
- 4090 wins: 8B–14B dense daily-driver workloads · MoE 30B-A3B at short context · $/perf overall · no 12V-2x6 connector premium baked in.
- Price delta: ~$1,200–$2,000. Used 4090 ~$1,800–$2,200 (eBay) vs new 5090 ~$2,900–$3,900 (AIB), or $1,999 FE if you can catch a restock.
- Evidence: Estimated
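The $/perf call above is simple arithmetic, and worth making explicit. A minimal sketch using this article's own price and throughput figures (taking the low end of the 5090's 180–213 tok/s range and a ~$3,000 5090 street price — both assumptions drawn from the numbers quoted here, not fresh benchmarks):

```python
# Back-of-envelope $/perf using this article's figures for Llama 3.1 8B Q4_K_M.
used_4090 = {"price_usd": 1800, "tok_per_s": 120}
new_5090 = {"price_usd": 3000, "tok_per_s": 180}  # low end of the 180–213 range

def usd_per_tok_s(card):
    """Dollars paid per token/sec of generation throughput (lower is better)."""
    return card["price_usd"] / card["tok_per_s"]

print(f"4090: ${usd_per_tok_s(used_4090):.1f} per tok/s")  # 15.0
print(f"5090: ${usd_per_tok_s(new_5090):.1f} per tok/s")   # 16.7
```

Even granting the 5090 its full speed advantage, the used 4090 still buys each token/sec for less — which is the whole verdict in one division.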
Specs side-by-side

| Spec | RTX 4090 | RTX 5090 |
| --- | --- | --- |
| VRAM | 24 GB GDDR6X, 1,008 GB/s | 32 GB GDDR7, 1,792 GB/s |
| Compute | 16,384 CUDA cores (Ada), 450 W TGP | 21,760 CUDA cores (Blackwell), 575 W TGP |
| Street price (May 2026) | ~$1,800–$2,200 used (eBay) | ~$2,900–$3,900 AIB / $1,999 FE |
LLM throughput — where the gap shows up
Llama 3.1 8B Q4_K_M: 4090 hits ~120 tok/s gen (LocalScore); 5090 lands 180–213 tok/s — +50–67%. This is the Blackwell bandwidth gain delivering as predicted.
Qwen 30B/35B-A3B MoE Q4: 4090 ~196 tok/s TG; 5090 200+ tok/s — only ~5–15% faster on short-context generation because A3B is compute-light. Where the 5090 actually pulls ahead: prefill at 32K context hits ~793 tok/s, where the 4090 sees meaningful slowdown. If your workflow lives at long context, this matters.
32B dense Q4 (~19 GB): fits cleanly on 5090 with context headroom (~61 tok/s); the 4090 can’t fit it with reasonable context. This is the clearest 5090-only win in the consumer-LLM space — Qwen 3.5 27B and Qwen 3.6-27B both live here.
70B Q4 (~40 GB): Neither card fits it. 4090 only runs IQ2/IQ3 (quality-compromised); 5090 fits Q3_K_M (~31 GB) but with no context headroom. If you’re buying for 70B Q4 work, you need dual 3090s, an A6000, or multi-GPU. Single 5090 is not the answer.
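The fit/doesn't-fit claims above follow from a rule of thumb: quantized weight size ≈ params × bits-per-weight ÷ 8, plus KV cache and runtime overhead. A sketch under stated assumptions (the ~4.8 bits/weight figure approximates llama.cpp's Q4_K_M; the flat 5 GB overhead for KV cache at useful context plus CUDA runtime is an illustrative assumption — real KV sizing depends on architecture and context length):

```python
# Rule-of-thumb VRAM fit check for quantized dense models.
def weights_gb(params_b, bits_per_weight):
    """Approximate on-disk/in-VRAM size of the quantized weights."""
    return params_b * bits_per_weight / 8

def fits(params_b, bits, vram_gb, overhead_gb=5.0):
    """overhead_gb = assumed KV cache at useful context + runtime overhead."""
    return weights_gb(params_b, bits) + overhead_gb <= vram_gb

print(round(weights_gb(32, 4.8), 1))  # ~19.2 GB, matching the ~19 GB above
print(fits(32, 4.8, 32))              # 5090 (32 GB): True
print(fits(32, 4.8, 24))              # 4090 (24 GB): False
print(fits(70, 4.8, 32))              # 70B Q4 on a single 5090: False
```

Same arithmetic, all three verdicts: 32B Q4 is a 5090-only win, a squeeze-not-fit on the 4090, and 70B Q4 is out of reach for both.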
Image gen — where the 5090 actually shines
FLUX.2 dev FP8 at 1024² lands in ~6.2–7 s on the 5090 vs ~10 s on the 4090 in ComfyUI — a real 30–40% wall-clock saving. The SDXL gap runs closer to 52% in the 5090’s favor. If you’re generating high volumes of images, this compounds quickly. For occasional image gen alongside primarily-LLM work, the 4090 is still more than fine.
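"Compounds quickly" is easy to quantify. A sketch using the per-image wall-clock times above (taking ~6.6 s as the midpoint of the 5090's 6.2–7 s range — an assumption, and batch size of 1,000 is illustrative):

```python
# How a per-image gap compounds at volume, using the article's FLUX.2 times.
def batch_hours(seconds_per_image, n_images):
    return seconds_per_image * n_images / 3600

n = 1000
t_4090 = batch_hours(10.0, n)  # ~10 s/image on the 4090
t_5090 = batch_hours(6.6, n)   # midpoint of the 5090's 6.2–7 s range
print(f"4090: {t_4090:.2f} h · 5090: {t_5090:.2f} h · saved: {t_4090 - t_5090:.2f} h")
```

Roughly an hour saved per thousand images — meaningful for a production pipeline, irrelevant for a weekend hobbyist, which is exactly the dividing line the verdict draws.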
The 12V-2x6 connector reality
Not fixed at the ecosystem level. 12V-2x6 is itself the revised connector spec; the residual risk is implementation — cable seating, PSU/cable quality, the higher current draw, and how unforgiving the system is to user error. Melted-connector reports continue through 2026, and the 5090 is materially more exposed than the 4090 because TGP jumped from 450 W to 575 W. Vendors are mitigating with per-pin sensing (ASUS Astral, MSI’s yellow-tipped "fuse" cable) rather than the broader cable/PSU ecosystem self-correcting. If you buy a 5090, plan on a 1000 W+ PSU and an ASUS or MSI sensing card — that’s another ~$200–$400 of premium nobody prices into the comparison.
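Where the 1000 W+ floor comes from: sustained draw plus transient headroom. A sketch — the 575 W TGP is from this comparison, but the CPU/platform numbers and the 1.5× GPU transient factor are illustrative assumptions, not a vendor spec:

```python
# Rough PSU sizing for a 5090 build.
GPU_TGP_W = 575     # RTX 5090 TGP (from the comparison above)
CPU_W = 250         # assumption: high-end desktop CPU under load
PLATFORM_W = 100    # assumption: board, RAM, storage, fans

sustained = GPU_TGP_W + CPU_W + PLATFORM_W
with_transients = GPU_TGP_W * 1.5 + CPU_W + PLATFORM_W  # GPU spike budget
print(sustained)        # sustained system draw in watts
print(with_transients)  # draw budget including GPU transient spikes
```

Sustained draw alone lands near 925 W; once GPU transients are budgeted, 1000 W is the floor rather than comfortable headroom — which is why the premium above is hard to avoid.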
Driver maturity — May 2026 update
Blackwell driver issues from launch (Jan 2026) are largely resolved in mature stacks (llama.cpp, vLLM) by Q2 2026. Token-gen scaling now matches bandwidth-predicted gains; the early “Blackwell underperforms” reports closed out. Avoid driver 572.16 specifically. Otherwise the 5090 today behaves the way the spec sheet promises — the question is whether it’s worth the premium, not whether it works.
When the 5090 IS the right buy
- You’re memory-bound on 32B dense Q4 (5090 fits with context; 4090 doesn’t).
- You push 32K+ context daily on MoE models (prefill gap is ~30%).
- You run batched or concurrent inference where FP4 + bandwidth compound.
- You generate high volumes of images (FLUX.2 / SDXL gap is real wall-clock).
For 8B–14B dense daily-driver work and casual MoE on short contexts, a used 4090 at $1,800 delivers ~70–80% of 5090 perf at ~50–60% of cost. Both cards fail at single-card 70B Q4 — that’s a multi-GPU story regardless.
Read the full hardware verdicts
- RTX 4090 — full editorial verdict →
- RTX 5090 — full editorial verdict →