the AI bench
VERIFIED JUNE 2026

HARDWARE · TEAM RED · 24 GB

AMD Radeon RX 7900 XTX

The AMD answer, with an honest asterisk on driver friction.

24 GB at roughly 85–90% of a 4090's throughput under ROCm. The hardware is fine; the software ecosystem is the tax. Plan 5–10 hours for first-time ROCm setup, plus ongoing friction because Ollama is still patchy on AMD. New-market pricing has split sharply from used since the DRAM crunch: used 3090s and used 7900 XTXs now sit in the same ~$760 band.

The decision in five lines

The call
Buy — The AMD answer, with an honest asterisk on driver friction.
Best for
Team red
Runs well
Qwen3-Coder-30B-A3B (MoE, fits 24GB) · Qwen 3.5 35B-A3B (MoE, fits 24GB) · Gemma 4 31B (256K context)
Watch out
AMD has end-of-lifed the 7900 XTX, and supply has tightened sharply through June 2026. New retail has spiked (Amazon shows $2,374 outliers), while the used eBay floor sits at ~$770. Plan to source used or wait for RDNA4 inventory.
Evidence
Measured · last verified June 2026

24 GB GDDR6
960 GB/s bandwidth
355 W TDP
~$770 used / new

What fits at this tier

Runs 24 GB-tier models at Q4 cleanly — MoE 30B-A3B, 14B dense, gpt-oss-20b — via vLLM or llama.cpp (HIP). Some MoE picks had HIP kernel issues through 2025; check llama.cpp release notes before pulling the latest.
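
A rough way to sanity-check what fits: weight memory is approximately parameter count times bits per weight, divided by 8. The sketch below assumes about 4.5 bits per weight for a Q4-style quantization and a flat 3 GB allowance for KV cache and runtime buffers. Both constants are illustrative assumptions, not measurements from this page, and the 70B entry is included only as a counterexample. Note that an MoE model must hold all of its experts in memory, so the total parameter count is what matters for fit.

```python
# Rough VRAM fit check. All constants are illustrative assumptions.
BITS_PER_WEIGHT_Q4 = 4.5   # assumed average for a Q4-style quantization
OVERHEAD_GB = 3.0          # assumed KV cache + runtime buffers


def weight_gb(params_billion: float,
              bits_per_weight: float = BITS_PER_WEIGHT_Q4) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, vram_gb: float = 24.0) -> bool:
    """True if weights plus the assumed overhead fit in vram_gb."""
    return weight_gb(params_billion) + OVERHEAD_GB <= vram_gb


# Total parameter counts in billions; the 70B row is a hypothetical counterexample.
for label, size_b in [("14B dense", 14), ("30B-A3B MoE", 30),
                      ("35B-A3B MoE", 35), ("70B dense", 70)]:
    print(f"{label}: ~{weight_gb(size_b):.1f} GB weights, "
          f"fits 24 GB: {fits(size_b)}")
```

Long contexts grow the KV cache well past a flat allowance, so treat a narrow pass here as a maybe rather than a yes.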

CODING
Qwen3-Coder-30B-A3B (MoE, fits 24GB): 3B-active MoE; benchmark champion for local coding at this tier.
CHAT / GENERAL
Qwen 3.5 35B-A3B (MoE, fits 24GB): 3B-active MoE; 30B-class quality at 3B inference speed.
DOCS & RETRIEVAL
Gemma 4 31B (256K context): 31B dense with 256K context; Gemma commercial-permissive terms; Arena top 5.
IMAGE
HiDream-O1-Image (8B, MIT): Released May 8, 2026. Pixel-space model (no VAE, no disjoint text encoder) that debuted in the top 10 on the Artificial Analysis T2I Arena. MIT-licensed 8B; one model handles T2I, editing, and subject-driven personalization at up to 2,048².
AGENTS
Qwen 3.5 35B-A3B (MoE, fits 24GB): MoE with native tool use; fits 24GB at Q4; Apache 2.0.
VOICE
VoxCPM2 (2B, Apache 2.0): 30 languages, 48 kHz, tokenizer-free diffusion AR; voice design from text. Released April 2026.
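
The "3B inference speed" point for the A3B picks follows from memory bandwidth: during decoding, each generated token has to read roughly the active weights once, so bandwidth divided by active-weight bytes gives a theoretical ceiling on tokens per second. The sketch below uses the 960 GB/s figure from this page and an assumed 4.5 bits per weight. It is an upper bound under idealized assumptions, not a benchmark; real throughput lands well below it.

```python
BANDWIDTH_GB_S = 960.0   # bandwidth figure quoted on this page
BITS_PER_WEIGHT = 4.5    # assumed Q4-style average, not a measured value


def decode_ceiling_tok_s(active_params_billion: float,
                         bandwidth_gb_s: float = BANDWIDTH_GB_S,
                         bits_per_weight: float = BITS_PER_WEIGHT) -> float:
    """Idealized tokens/s ceiling: bandwidth / bytes of active weights per token."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# A 3B-active MoE versus a 31B dense model on the same card.
print(f"3B-active MoE ceiling: {decode_ceiling_tok_s(3):.0f} tok/s")
print(f"31B dense ceiling:     {decode_ceiling_tok_s(31):.0f} tok/s")
```

The ratio between the two ceilings is simply 31/3, about tenfold, which is why a 3B-active MoE feels far faster than a dense model of similar total size.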

The call

Buy it if you already run Linux, want 24 GB without paying NVIDIA's supply-crunch tax, and treat driver tinkering as acceptable friction rather than pain.

Skip it if your time is worth more than ~$50/hr: you will spend a day on ROCm setup, and another day later the first time a new model doesn't JIT cleanly. Also skip it if scarcity premiums make the RX 9070 XT (RDNA4, 16 GB at $649–$779) the cleaner AMD entry. The 9070 XT trades 8 GB of headroom for fresh architecture support and first-class WMMA under ROCm 7+.
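
To put the time-cost argument in numbers, the sketch below adds setup hours, priced at an hourly rate, to the card price. It uses the ~$770 used floor, the 5–10 hour setup range, and the ~$50/hr threshold quoted on this page; the function name is hypothetical and the model ignores any later debugging time.

```python
def effective_cost(card_price: float, setup_hours: float,
                   hourly_rate: float) -> float:
    """Card price plus the value of the time spent on first-time setup."""
    return card_price + setup_hours * hourly_rate


USED_PRICE = 770   # used floor quoted on this page
HOURLY_RATE = 50   # the ~$50/hr threshold from the text

for hours in (5, 10):
    total = effective_cost(USED_PRICE, hours, HOURLY_RATE)
    print(f"{hours} h setup at ${HOURLY_RATE}/hr: effective cost ${total:,.0f}")
```

At that rate the setup alone adds $250–$500, which is the gap to weigh against a card with less software friction.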

Watchouts

  • AMD has end-of-lifed the 7900 XTX, and supply has tightened sharply through June 2026. New retail has spiked (Amazon shows $2,374 outliers), while the used eBay floor sits at ~$770. Plan to source used or wait for RDNA4 inventory.
  • Budget 5–10 hours for first-time ROCm setup. Official AMD packages are the reliable path; distro repos lag by 6–12 months.
  • Ollama on AMD is still patchy. Use vLLM or llama.cpp with HIP for the reliable runner story.
  • MoE on ROCm is patchy: llama.cpp #19880 is closed, but #20024 and #20545 remain open for Qwen 3.5 35B-A3B with ROCm 7.2. The Vulkan backend sidesteps these and outperforms ROCm on the 7900 XTX for MoE anyway.
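
Since the choice between the ROCm and Vulkan paths comes up repeatedly above, a quick preflight check can show which stacks are installed before picking a runner. This is a minimal sketch that only looks for the `rocminfo` and `vulkaninfo` command-line tools on the PATH. Finding a tool does not prove the driver works or that the GPU is detected, and the backend labels are just illustrative names.

```python
import shutil

# Diagnostic tools that normally ship with each stack (labels are illustrative).
TOOLS = {"rocm": "rocminfo", "vulkan": "vulkaninfo"}


def available_backends(tools: dict = TOOLS) -> dict:
    """Map each backend label to True if its diagnostic tool is on the PATH."""
    return {backend: shutil.which(exe) is not None
            for backend, exe in tools.items()}


print(available_backends())
```

If only the Vulkan tool is present, that is still a workable starting point given the note above about Vulkan on MoE models.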

Local vs cloud at this tier

● LOCAL WINS

24 GB for under $1,000, fully offline, no per-token cost. AMD hardware has historically had better longevity than NVIDIA's, with less planned obsolescence.

● CLOUD WINS

No driver maintenance, no kernel compatibility issues, and immediate access to new models the day they're released. For anyone whose time is billable, cloud looks compelling next to the AMD software story.

At ~$750 used with regular usage, break-even vs ChatGPT Plus is ~30–36 months. The real question is whether your hourly rate makes ROCm debugging a net positive.
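
The break-even arithmetic is a single division: hardware cost over net monthly savings. The page does not state which subscription price, taxes, or electricity cost it assumed, so the sketch below works backwards from the quoted 30–36 month range to the monthly spend it implies. All function names and the optional power-cost parameter are hypothetical.

```python
def break_even_months(hardware_cost: float, monthly_cloud_spend: float,
                      monthly_power_cost: float = 0.0) -> float:
    """Months until the hardware pays for itself versus a cloud subscription."""
    net_saving = monthly_cloud_spend - monthly_power_cost
    if net_saving <= 0:
        raise ValueError("cloud spend must exceed local running cost")
    return hardware_cost / net_saving


def implied_monthly_spend(hardware_cost: float, months: float) -> float:
    """Monthly cloud spend implied by a given break-even horizon."""
    return hardware_cost / months


CARD = 750  # used price quoted in the paragraph above
for months in (30, 36):
    spend = implied_monthly_spend(CARD, months)
    print(f"{months} months implies ~${spend:.2f}/month of cloud spend")
```

Any electricity cost for local inference shrinks the net saving and pushes break-even further out, which the optional `monthly_power_cost` argument lets you explore.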
