the AI bench
VERIFIED JUNE 2026

GUIDE · AMD ROCM · JUNE 2026

AMD works for local AI — if you already have the card.

The hardware is fine. The RX 7900 XTX has 24 GB of GDDR6 and 960 GB/s of bandwidth; on paper it competes with a 4090. The problem is the software stack, and in 2026 it’s still the reason most people don’t buy AMD for AI.
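As a rough illustration of why the bandwidth figure matters: single-stream token generation is usually limited by how fast the weights can be read from VRAM, so bandwidth divided by model size gives an upper bound on tokens per second. The sketch below assumes a dense model and roughly 4.5 bits per weight for a Q4-style quantization. Both are illustrative assumptions, the function name is hypothetical, and real throughput lands below this ceiling.

```python
def bandwidth_bound_tok_s(bandwidth_gb_s: float, params_billion: float,
                          bits_per_weight: float) -> float:
    """Upper bound on tokens/s if every weight is read once per generated token."""
    model_bytes = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / model_bytes


# 960 GB/s is the 7900 XTX figure quoted above.
# The 14B model size and 4.5 bits/weight are assumptions for illustration.
ceiling = bandwidth_bound_tok_s(960, params_billion=14, bits_per_weight=4.5)
print(f"theoretical ceiling: {ceiling:.0f} tok/s")
```

This ignores KV-cache reads, kernel overhead, and MoE sparsity, so treat it as a sanity check on benchmark numbers rather than a prediction.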

This piece is the honest friction write-up. If you already own a 7900 XTX, this is how to make it useful. If you’re choosing between AMD and NVIDIA for a new build specifically for AI, buy NVIDIA.


The setup cost nobody mentions

Community reports converge on 5–10 hours of driver and environment work before the first clean ollama run or python inference.py. That is not because ROCm is broken; it is because ROCm is version-matrix-sensitive. The Ubuntu kernel, ROCm release, PyTorch build, and llama.cpp commit all have to line up. Get any one wrong and you spend a Saturday on GitHub issues.
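One way to make the version matrix visible before you start debugging is to dump all four pieces in one place. The sketch below is a minimal helper, not an official tool: the function name is hypothetical, the ROCm version file path is an assumed default that may differ on your install, and the llama.cpp directory is whatever checkout you pass in.

```python
import platform
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def collect_version_matrix(rocm_version_file: str = "/opt/rocm/.info/version",
                           llama_cpp_dir: Optional[str] = None) -> dict:
    """Gather the versions that have to line up; anything not found stays None."""
    matrix = {
        "kernel": platform.release(),
        "rocm": None,
        "torch": None,
        "torch_hip": None,
        "llama_cpp_commit": None,
    }

    # Assumed location of the ROCm version file; adjust for your install.
    path = Path(rocm_version_file)
    if path.is_file():
        matrix["rocm"] = path.read_text().strip()

    try:
        import torch  # optional; skipped if PyTorch is not installed
        matrix["torch"] = torch.__version__
        matrix["torch_hip"] = torch.version.hip  # None on non-ROCm builds
    except ImportError:
        pass

    # Record the checked-out llama.cpp commit if a source directory was given.
    if llama_cpp_dir and shutil.which("git"):
        result = subprocess.run(
            ["git", "-C", llama_cpp_dir, "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            matrix["llama_cpp_commit"] = result.stdout.strip()

    return matrix


if __name__ == "__main__":
    for name, value in collect_version_matrix().items():
        print(f"{name:18} {value}")
```

Pasting this output into a GitHub issue search is usually faster than describing the setup by hand.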

On NVIDIA or a Mac, the same install is 15 minutes. This is the real cost, and it’s not going away — AMD’s software team is small relative to the scope of the problem, and the community does more of the heavy lifting than vendors want to admit.

The surprise: Vulkan often wins

On RDNA3 cards, llama.cpp’s Vulkan backend frequently outperforms ROCm for single-GPU inference, and it installs in minutes, not hours. Our calibration table shows a single RDNA3 card hitting 3,033 tok/s on prompt processing (PP) and 183 tok/s on token generation (TG) with Qwen3 30B-A3B Q4 via Vulkan, within ~10% of equivalent NVIDIA cards.
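To translate those two throughput numbers into wall-clock time, divide the prompt length by the prompt-processing rate and the output length by the generation rate. The sketch below does that arithmetic; the 4,000-token prompt and 500-token reply are made-up example sizes, and the function name is hypothetical.

```python
def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     pp_tok_s: float, tg_tok_s: float) -> float:
    """Rough request time: prompt processing plus token generation."""
    return prompt_tokens / pp_tok_s + output_tokens / tg_tok_s


# 3,033 and 183 tok/s are the Vulkan figures quoted above.
# The prompt and output sizes are illustrative assumptions.
total = estimate_seconds(4000, 500, pp_tok_s=3033, tg_tok_s=183)
print(f"estimated request time: {total:.2f} s")
```

For this example the split is roughly 1.3 s of prompt processing and 2.7 s of generation, so generation speed dominates unless the prompt is very long.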

If you’re on AMD and haven’t tried Vulkan: start there. ROCm is only strictly necessary for multi-GPU tensor-parallel inference (e.g., dual 7900 XTX running Llama 3.1 70B Q4) or for vLLM, which supports AMD only through ROCm.
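The rule of thumb in this section fits in a few lines. The sketch below only encodes this guide's advice as a starting point; the function and parameter names are hypothetical, and it does not probe your hardware.

```python
def pick_backend(gpu_count: int = 1, tensor_parallel: bool = False,
                 needs_vllm: bool = False) -> str:
    """Return the backend to try first on an AMD card, per the guidance above."""
    if needs_vllm:
        # vLLM runs on AMD only through ROCm.
        return "rocm"
    if gpu_count > 1 and tensor_parallel:
        # Multi-GPU tensor-parallel is the other case where ROCm is required.
        return "rocm"
    # Everything else: start with Vulkan.
    return "vulkan"


print(pick_backend())                                  # single card
print(pick_backend(gpu_count=2, tensor_parallel=True)) # dual 7900 XTX
```

If Vulkan turns out to be slower on your specific card and model, that is the point at which the ROCm setup time becomes worth paying.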

MoE on ROCm — patchy through June 2026

MoE picks (Qwen3 30B-A3B, Qwen 3.5 35B-A3B, gpt-oss-20b) had HIP kernel issues through most of 2025. The original blocker (llama.cpp #19880) is closed, but #20024 and #20545 are still open as of late June 2026: Qwen 3.5 35B-A3B on ROCm 7.2 hangs in WSL2 and crashes intermittently on bare metal. The honest play on AMD is the Vulkan backend, which sidesteps these issues entirely and outperforms ROCm on the 7900 XTX for MoE anyway.

Runner recommendations for AMD

  • llama.cpp + Vulkan — the fastest path to “it works.” Install llama.cpp, compile with -DGGML_VULKAN=ON, done.
  • llama.cpp + ROCm — only if Vulkan performance falls short on your specific card + model combination. Expect the driver-setup tax.
  • vLLM with ROCm — if you’re doing multi-GPU tensor-parallel on 2× 7900 XTX or want production-grade concurrent serving. This is where ROCm is genuinely useful.
  • Ollama on AMD — possible, but the ROCm backend has been patchy. If Ollama is a hard requirement, stay on NVIDIA.
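For the first option, the Vulkan build comes down to two CMake invocations. The sketch below wraps them in a small script, assuming you already have a llama.cpp checkout, CMake, a C++ toolchain, and the Vulkan SDK and drivers installed. The helper names and the "llama.cpp" directory name are hypothetical, and it defaults to a dry run that only prints the commands.

```python
import subprocess


def vulkan_build_commands(build_dir: str = "build") -> list[list[str]]:
    """CMake commands for a llama.cpp build with the Vulkan backend enabled."""
    return [
        ["cmake", "-B", build_dir, "-DGGML_VULKAN=ON"],
        ["cmake", "--build", build_dir, "--config", "Release"],
    ]


def build(source_dir: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, run it in source_dir."""
    for cmd in vulkan_build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=source_dir, check=True)


# "llama.cpp" is an assumed checkout directory; set dry_run=False to actually build.
build("llama.cpp", dry_run=True)
```

Keeping the commands in a list makes it easy to swap the backend flag later if you decide to try the ROCm build on the same checkout.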

Should you buy AMD for AI?

New build, AI is the primary use: no. A used RTX 4090 at $1,600–$2,400 runs everything the 7900 XTX runs, in 15 minutes of setup, with fewer gotchas. The AMD price advantage doesn’t compensate for the time tax.

New build, gaming is the primary use + AI is a bonus: sure. A 7900 XTX is a legitimate gaming card; running a 14B model on the same hardware is a reasonable bonus. Use Vulkan.

You already own the card: absolutely. 24 GB of VRAM is 24 GB of VRAM. The guidance above gets you from zero to a working local-AI stack in an afternoon instead of a weekend.

Next step

Read the AMD RX 7900 XTX hardware card