GUIDE · COMPARISON · MAY 2026
RTX 5090 vs RTX 4090 — used 4090 still wins on $/perf for most local-AI buyers.
The 5090’s paper specs are real. So is the 30–67% speed gain on bandwidth-bound workloads. But at ~$3,000 new vs ~$1,800 used, the 4090 is the sharper buy for everyone except four specific cases — and the 12V-2x6 connector risk still hasn’t been fixed at the ecosystem level.
Honest verdict, May 2026. Numbers cross-verified from LocalScore, Hardware Corner, llama-bench community runs, and BestValueGPU price tracking.
The decision in five lines
- The call: Buy used 4090 — unless you hit one of the four 5090 wins below.
- 5090 wins: 32B dense Q4 fits cleanly · 32K+ context prefill ~30% faster · FLUX.2 image gen 30–40% faster · FP4 + bandwidth compound under batched/concurrent inference.
- 4090 wins: 8B–14B dense daily-driver workloads · MoE 30B-A3B at short context · $/perf overall · no 12V-2x6 connector premium baked in.
- Price delta: ~$1,200–$2,000. Used 4090 ~$1,800–$2,200 (eBay) vs new 5090 ~$2,900–$3,900 (AIB), or $1,999 FE if you can catch a restock.
- Evidence: Estimated
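The $/perf call above is simple arithmetic, and worth making explicit. A minimal sketch using this article's own price and throughput figures (taking the low end of the 5090's 180–213 tok/s range and a ~$3,000 5090 street price — both assumptions drawn from the numbers quoted here, not fresh benchmarks):

```python
# Back-of-envelope $/perf using this article's figures for Llama 3.1 8B Q4_K_M.
used_4090 = {"price_usd": 1800, "tok_per_s": 120}
new_5090 = {"price_usd": 3000, "tok_per_s": 180}  # low end of the 180–213 range

def usd_per_tok_s(card):
    """Dollars paid per token/sec of generation throughput (lower is better)."""
    return card["price_usd"] / card["tok_per_s"]

print(f"4090: ${usd_per_tok_s(used_4090):.1f} per tok/s")  # 15.0
print(f"5090: ${usd_per_tok_s(new_5090):.1f} per tok/s")   # 16.7
```

Even granting the 5090 its full speed advantage, the used 4090 still buys each token/sec for less — which is the whole verdict in one division.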
Specs side-by-side

| Spec | RTX 4090 | RTX 5090 |
| --- | --- | --- |
| VRAM | 24 GB GDDR6X, 1,008 GB/s | 32 GB GDDR7, 1,792 GB/s |
| Compute | 16,384 CUDA cores (Ada), 450 W TGP | 21,760 CUDA cores (Blackwell), 575 W TGP |
| Street price (May 2026) | ~$1,800–$2,200 used (eBay) | ~$2,900–$3,900 AIB / $1,999 FE |
LLM throughput — where the gap shows up
Llama 3.1 8B Q4_K_M: 4090 hits ~120 tok/s gen (LocalScore); 5090 lands 180–213 tok/s — +50–67%. This is the Blackwell bandwidth gain delivering as predicted.
Qwen 30B/35B-A3B MoE Q4: 4090 ~196 tok/s TG; 5090 200+ tok/s — only ~5–15% faster on short-context generation because A3B is compute-light. Where the 5090 actually pulls ahead: prefill at 32K context hits ~793 tok/s, where the 4090 sees meaningful slowdown. If your workflow lives at long context, this matters.
32B dense Q4 (~19 GB): fits cleanly on 5090 with context headroom (~61 tok/s); the 4090 can’t fit it with reasonable context. This is the clearest 5090-only win in the consumer-LLM space — Qwen 3.5 27B and Qwen 3.6-27B both live here.
70B Q4 (~40 GB): Neither card fits it. 4090 only runs IQ2/IQ3 (quality-compromised); 5090 fits Q3_K_M (~31 GB) but with no context headroom. If you’re buying for 70B Q4 work, you need dual 3090s, an A6000, or multi-GPU. Single 5090 is not the answer.
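The fit/doesn't-fit claims above follow from a rule of thumb: quantized weight size ≈ params × bits-per-weight ÷ 8, plus KV cache and runtime overhead. A sketch under stated assumptions (the ~4.8 bits/weight figure approximates llama.cpp's Q4_K_M; the flat 5 GB overhead for KV cache at useful context plus CUDA runtime is an illustrative assumption — real KV sizing depends on architecture and context length):

```python
# Rule-of-thumb VRAM fit check for quantized dense models.
def weights_gb(params_b, bits_per_weight):
    """Approximate on-disk/in-VRAM size of the quantized weights."""
    return params_b * bits_per_weight / 8

def fits(params_b, bits, vram_gb, overhead_gb=5.0):
    """overhead_gb = assumed KV cache at useful context + runtime overhead."""
    return weights_gb(params_b, bits) + overhead_gb <= vram_gb

print(round(weights_gb(32, 4.8), 1))  # ~19.2 GB, matching the ~19 GB above
print(fits(32, 4.8, 32))              # 5090 (32 GB): True
print(fits(32, 4.8, 24))              # 4090 (24 GB): False
print(fits(70, 4.8, 32))              # 70B Q4 on a single 5090: False
```

Same arithmetic, all three verdicts: 32B Q4 is a 5090-only win, a squeeze-not-fit on the 4090, and 70B Q4 is out of reach for both.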
Image gen — where the 5090 actually shines
FLUX.2 dev FP8 at 1024² lands in ~6.2–7 s on the 5090 vs ~10 s on the 4090 in ComfyUI — a real 30–40% wall-clock saving. The SDXL gap runs closer to 52% in the 5090’s favor. If you’re generating high volumes of images, this compounds quickly. For occasional image gen alongside primarily-LLM work, the 4090 is still more than fine.
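"Compounds quickly" is easy to quantify. A sketch using the per-image wall-clock times above (taking ~6.6 s as the midpoint of the 5090's 6.2–7 s range — an assumption, and batch size of 1,000 is illustrative):

```python
# How a per-image gap compounds at volume, using the article's FLUX.2 times.
def batch_hours(seconds_per_image, n_images):
    return seconds_per_image * n_images / 3600

n = 1000
t_4090 = batch_hours(10.0, n)  # ~10 s/image on the 4090
t_5090 = batch_hours(6.6, n)   # midpoint of the 5090's 6.2–7 s range
print(f"4090: {t_4090:.2f} h · 5090: {t_5090:.2f} h · saved: {t_4090 - t_5090:.2f} h")
```

Roughly an hour saved per thousand images — meaningful for a production pipeline, irrelevant for a weekend hobbyist, which is exactly the dividing line the verdict draws.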
The 12V-2x6 connector reality
Not fixed at the ecosystem level. 12V-2x6 is itself the revised connector spec; the residual risk is implementation — cable seating, PSU/cable quality, the higher current draw, and how unforgiving the system is to user error. Melted-connector reports continue through 2026, and the 5090 is materially more exposed than the 4090 because TGP jumped from 450 W to 575 W. Vendors are mitigating with per-pin sensing (ASUS Astral, MSI’s yellow-tipped "fuse" cable) rather than the broader cable/PSU ecosystem self-correcting. If you buy a 5090, plan on a 1000 W+ PSU and an ASUS or MSI sensing card — that’s another ~$200–$400 of premium nobody prices into the comparison.
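Where the 1000 W+ floor comes from: sustained draw plus transient headroom. A sketch — the 575 W TGP is from this comparison, but the CPU/platform numbers and the 1.5× GPU transient factor are illustrative assumptions, not a vendor spec:

```python
# Rough PSU sizing for a 5090 build.
GPU_TGP_W = 575     # RTX 5090 TGP (from the comparison above)
CPU_W = 250         # assumption: high-end desktop CPU under load
PLATFORM_W = 100    # assumption: board, RAM, storage, fans

sustained = GPU_TGP_W + CPU_W + PLATFORM_W
with_transients = GPU_TGP_W * 1.5 + CPU_W + PLATFORM_W  # GPU spike budget
print(sustained)        # sustained system draw in watts
print(with_transients)  # draw budget including GPU transient spikes
```

Sustained draw alone lands near 925 W; once GPU transients are budgeted, 1000 W is the floor rather than comfortable headroom — which is why the premium above is hard to avoid.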
Driver maturity — May 2026 update
Blackwell driver issues from launch (Jan 2026) are largely resolved in mature stacks (llama.cpp, vLLM) by Q2 2026. Token-gen scaling now matches bandwidth-predicted gains; the early “Blackwell underperforms” reports closed out. Avoid driver 572.16 specifically. Otherwise the 5090 today behaves the way the spec sheet promises — the question is whether it’s worth the premium, not whether it works.
When the 5090 IS the right buy
- You’re memory-bound on 32B dense Q4 (5090 fits with context; 4090 doesn’t).
- You push 32K+ context daily on MoE models (prefill gap is ~30%).
- You run batched or concurrent inference where FP4 + bandwidth compound.
- You generate high volumes of images (FLUX.2 / SDXL gap is real wall-clock).
For 8B–14B dense daily-driver work and casual MoE on short contexts, a used 4090 at $1,800 delivers ~70–80% of 5090 perf at ~50–60% of cost. Both cards fail at single-card 70B Q4 — that’s a multi-GPU story regardless.
Read the full hardware verdicts
- RTX 4090 — full editorial verdict →
- RTX 5090 — full editorial verdict →