the AI bench
VERIFIED JUNE 2026

HARDWARE · PREVIOUS GEN · 24 GB

NVIDIA RTX 4090

The 24 GB flagship that aged into a smart-money pick.

Same 24 GB VRAM ceiling as the new generation's sweet spot, 1 TB/s bandwidth, a mature CUDA stack, and no 12VHPWR drama if you buy a unit with the updated 12V-2x6 connector. Buy used from a trusted seller; new retail at scalper prices is not the right move.

The decision in five lines

The call
Buy — The 24 GB flagship that aged into a smart-money pick.
Best for
Previous gen
Runs well
Qwen3-Coder-30B-A3B (MoE, fits 24GB) · Qwen 3.6-27B · Z-Image-Turbo (Apache 2.0)
Watch out
Used-market floor lifted ~$1,400 since early 2026. Cheap eBay 4090s ($1,000) are mostly gone; what is left at that price tends to be mining-origin cards or units sold for parts. Insist on original box, serial lookup, and ideally in-person testing.
Evidence
Measured · last verified June 2026

24 GB GDDR6X · 1,008 GB/s bandwidth · 450 W TDP · ~$2,374 used (eBay)

What fits at this tier

Runs what fits in 24 GB — which is most of what matters. MoE 30B-A3B at Q4, 14B dense with room for 64K+ context, 32B dense tight at Q4. 70B only fits at IQ2 (2-bit, quality-compromised) — true 70B Q4 (~40 GB) needs 48 GB+ (dual-3090 or dual-5090).
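The fit claims above follow from simple arithmetic: weight size is roughly parameter count times bits per weight, divided by 8. Here is a minimal sketch of that check. The bits-per-weight values (about 4.5 for a 4-bit quant, about 2.4 for a 2-bit quant) and the 2 GB headroom for KV cache and buffers are assumptions for illustration, not measurements from this page; real quant files vary by scheme.

```python
# Rough weight-only VRAM estimate: parameters x bits per weight / 8.
# ASSUMPTION: ~4.5 bits/weight for a typical 4-bit quant, ~2.4 for a 2-bit
# quant, and 2 GB of headroom for KV cache and buffers. Real files vary.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float = 24.0, headroom_gb: float = 2.0) -> bool:
    """True if the weights plus the assumed headroom fit in vram_gb."""
    return weight_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


if __name__ == "__main__":
    candidates = [
        ("14B dense, 4-bit", 14, 4.5),
        ("30B MoE, 4-bit", 30, 4.5),
        ("32B dense, 4-bit", 32, 4.5),
        ("70B dense, 4-bit", 70, 4.5),
        ("70B dense, 2-bit", 70, 2.4),
    ]
    for name, size_b, bits in candidates:
        print(f"{name}: ~{weight_gb(size_b, bits):.1f} GB weights, "
              f"fits in 24 GB: {fits(size_b, bits)}")
```

Under these assumptions a 70B model at 4-bit lands near 39 GB, in line with the ~40 GB figure above, while 32B at 4-bit leaves only a few GB for context. Note that an MoE model still has to hold all of its weights in memory, so the full 30B counts toward the fit even though only 3B are active per token.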

CODING
Qwen3-Coder-30B-A3B (MoE, fits 24GB): Community daily driver for local coding; 3B-active MoE delivers 30B quality at 3B-dense speed.
CHAT / GENERAL
Qwen 3.6-27B: April 22, 2026 dense refresh; supersedes Qwen 3.5 27B and claims to beat the prior 397B MoE flagship while staying single-GPU at Q4 (~17 GB).
DOCS & RETRIEVAL
Qwen 3.6-27B: April 22, 2026 dense refresh with 262K native context extensible to 1M, multimodal, single-GPU at Q4. Now the dense long-context top pick.
IMAGE
Z-Image-Turbo (Apache 2.0): Community daily driver for realism; 6B, 8-step inference, Apache 2.0 (commercial use OK).
AGENTS
Qwen 3.6-35B-A3B: Latest Qwen MoE; strong function calling; realistic on 24 GB+ VRAM or a Mac with 48 GB+. The local agentic top pick.
VOICE
Qwen3-Omni-30B-A3B-Instruct: Apache 2.0 MoE; audio+video+image+text in, speech+text out; 17 GB at Q4. Frontier unified voice.

The call

Buy it used at $2,400 or under from a trusted seller (original box, serial lookup, in-person test). The used market lifted to a ~$2,374 eBay average as RTX 5090 scarcity pushed buyers down a tier, but a used 4090 still beats a new 5090 ($3,800+) for buyers who don't need the extra 8 GB.

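Skip if you can accommodate a used dual-3090 rig (~$1,800 all-in for 48 GB) or stretch to a Mac Studio M4 Max 64 GB ($3,199). Both give more memory for models than a single 4090: the dual-3090 rig at lower cost, the Mac at a higher price and with unified memory rather than dedicated VRAM.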
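To make the comparison concrete, here is a small sketch that ranks the options by dollars per GB of model memory, using only the prices and capacities quoted on this page. The option labels are illustrative, the $3,800 figure for the 5090 is the page's lower bound, and the dual-3090 number is an all-in rig price, so treat the output as a rough ordering and not a like-for-like benchmark.

```python
# Price per GB of model memory, using the figures quoted on this page.
# ASSUMPTION: prices are the page's snapshot values, not live market data.
options = {
    "RTX 4090, used (24 GB)": (2374, 24),
    "Dual RTX 3090 rig, used (48 GB)": (1800, 48),
    "Mac Studio M4 Max (64 GB unified)": (3199, 64),
    "RTX 5090, new (32 GB)": (3800, 32),
}


def dollars_per_gb(price_usd: float, memory_gb: float) -> float:
    """Cost in USD for each GB of memory available to models."""
    return price_usd / memory_gb


# Cheapest memory first.
ranked = sorted(options.items(), key=lambda item: dollars_per_gb(*item[1]))
for name, (price, gb) in ranked:
    print(f"{name}: ${dollars_per_gb(price, gb):.0f}/GB")
```

Dollars per GB is only one axis: it ignores bandwidth, power draw, multi-GPU setup effort, and software support, which is where a single 4090 tends to earn its price.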

Watchouts

  • Used-market floor lifted ~$1,400 since early 2026. Cheap eBay 4090s ($1,000) are mostly gone; what is left at that price tends to be mining-origin cards or units sold for parts. Insist on original box, serial lookup, and ideally in-person testing.
  • Needs an 850 W+ PSU with a 12VHPWR adapter. The first-gen connector had melting reports; use the updated 12V-2x6 variant or a native ATX 3.1 PSU.
  • Physical size: 3–3.5 slot, 304–336 mm long. Many mid-tower cases won't fit without removing drive cages.
  • New retail at Amazon hits $2,755 — within striking distance of a used unit. If you can't verify a used seller, paying the new premium for warranty is defensible.

Local vs cloud at this tier

● LOCAL WINS

24 GB at used prices is the best $/VRAM entry into the serious tier. It runs 30B-A3B MoE at Q4, 14B dense at ~70 tok/s with 16K context, 8B dense at ~60–80 tok/s, and most modern models that matter. 70B is a dual-card story.
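For context on the tok/s figures, single-stream decoding is usually limited by memory bandwidth: each generated token has to read the active weights once. A rough ceiling is therefore bandwidth divided by the bytes of active weights. The sketch below applies that rule of thumb to the 1,008 GB/s figure on this page. The ~4.5 bits per weight is an assumption, and the result is an upper bound that ignores KV cache reads, compute, and runtime overhead, so measured speeds will be lower.

```python
# Bandwidth-bound ceiling on decode speed (a rule of thumb, not a benchmark).
# ASSUMPTION: every token reads all active weights once; ~4.5 bits/weight.

def decode_ceiling_tok_s(active_params_billion: float,
                         bits_per_weight: float = 4.5,
                         bandwidth_gb_s: float = 1008.0) -> float:
    """Upper bound on tokens/s when decoding is memory-bandwidth bound."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    for name, active_b in [
        ("30B-A3B MoE (3B active)", 3),
        ("8B dense", 8),
        ("14B dense", 14),
    ]:
        print(f"{name}: ceiling ~{decode_ceiling_tok_s(active_b):.0f} tok/s")
```

The same arithmetic shows why a 3B-active MoE decodes so much faster than a dense model of similar total size: far fewer bytes are read per token, even though all 30B still have to sit in VRAM.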

● CLOUD WINS

Frontier-scale models (GLM-5.1 754B, Claude Opus 4.8) are out of reach at 24 GB. Cloud wins on raw capability ceiling.

At the ~$2,374 used price plus ~$25/mo electricity, break-even against a $100/mo ChatGPT Pro plan is roughly 32 months ($2,374 divided by the $75/mo net saving). Heavy API-tier users spending about $225/mo or more break even in under a year.
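The break-even arithmetic is easy to rerun with your own numbers. This sketch divides the hardware price by the net monthly saving (cloud spend minus electricity). The function name and the sample cloud budgets are illustrative assumptions; the $2,374 price and $25/mo electricity are the figures quoted on this page.

```python
# Break-even point for buying a GPU versus paying a monthly cloud bill.
# ASSUMPTION: flat monthly costs, no resale value, no price changes.

def break_even_months(hardware_usd: float,
                      electricity_usd_per_month: float,
                      cloud_usd_per_month: float):
    """Months until the hardware pays for itself, or None if it never does."""
    net_saving = cloud_usd_per_month - electricity_usd_per_month
    if net_saving <= 0:
        return None  # Running locally costs at least as much as the cloud plan.
    return hardware_usd / net_saving


if __name__ == "__main__":
    for cloud_budget in (100, 200, 300):
        months = break_even_months(2374, 25, cloud_budget)
        print(f"${cloud_budget}/mo cloud spend: break-even in ~{months:.0f} months")
```

The result is very sensitive to the price you actually pay and to whether electricity is netted out, so substitute your own purchase price and power cost before drawing conclusions. Resale value, which this sketch ignores, would shorten the payback period.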
