AMD works for local AI — if you already have the card.

The hardware is fine. The RX 7900 XTX has 24 GB of GDDR6 and 960 GB/s of bandwidth; on paper it competes with a 4090. The problem is the software stack, and in 2026 it’s still the reason most people don’t buy AMD for AI.

This piece is the honest friction write-up. If you already own a 7900 XTX, this is how to make it useful. If you’re choosing between AMD and NVIDIA for a new build specifically for AI, buy NVIDIA.

The setup cost nobody mentions

Community reports converge on 5–10 hours of driver and environment work before the first clean ollama run or python inference.py. Not because ROCm is broken, but because it’s version-matrix-sensitive: the Ubuntu kernel, ROCm release, PyTorch build, and llama.cpp commit all have to line up. Get any one wrong and you spend a Saturday on GitHub issues.

On NVIDIA or a Mac, the same install is 15 minutes. This is the real cost, and it’s not going away — AMD’s software team is small relative to the scope of the problem, and the community does more of the heavy lifting than vendors want to admit.
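When the matrix does bite, the first thing a GitHub issue thread will ask for is your exact versions. Here is a minimal sketch for collecting the four pieces in one go. It assumes ROCm records its release in /opt/rocm/.info/version and that llama.cpp is a git checkout in ./llama.cpp; both paths are assumptions, so adjust them to your system.

```python
import platform
import subprocess
from pathlib import Path


def read_rocm_version(info_file="/opt/rocm/.info/version"):
    """Return the ROCm release string, or None if the file is absent.

    The default path is an assumption about a typical ROCm install.
    """
    path = Path(info_file)
    return path.read_text().strip() if path.is_file() else None


def read_torch_versions():
    """Return (PyTorch version, HIP version it was built against).

    Returns (None, None) when PyTorch is not installed; the HIP entry is
    None on builds that were not compiled for ROCm.
    """
    try:
        import torch
    except ImportError:
        return None, None
    return torch.__version__, torch.version.hip


def read_git_commit(repo_dir):
    """Return the short commit hash of a git checkout, or None on failure."""
    try:
        result = subprocess.run(
            ["git", "-C", str(repo_dir), "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def collect_versions(llama_cpp_dir="llama.cpp"):
    """Gather the four version strings that have to line up."""
    torch_version, torch_hip = read_torch_versions()
    return {
        "kernel": platform.release(),
        "rocm": read_rocm_version(),
        "torch": torch_version,
        "torch_hip": torch_hip,
        "llama_cpp_commit": read_git_commit(llama_cpp_dir),
    }


for name, value in collect_versions().items():
    print(f"{name:>18}: {value or 'not found'}")
```

Paste the output into any bug report. It also makes it obvious when PyTorch was built against a different HIP release than the ROCm you have installed.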

The surprise: Vulkan often wins

On RDNA3 cards, llama.cpp’s Vulkan backend frequently outperforms ROCm for single-GPU inference, and it installs in minutes, not hours. Our calibration table shows a single RDNA3 card hitting 3,033 tok/s on prompt processing (PP) and 183 tok/s on token generation (TG) with Qwen3 30B-A3B Q4 via Vulkan, within ~10% of equivalent NVIDIA cards.

If you’re on AMD and haven’t tried Vulkan, start there. ROCm is only strictly necessary for multi-GPU tensor-parallel (e.g., dual 7900 XTX running Llama 3.1 70B Q4) or for vLLM, which only supports ROCm on AMD.
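The reason the 70B example needs two cards is plain arithmetic. Here is a rough sketch, assuming about 4.5 bits per weight for a Q4-class quant and a flat 4 GB allowance for KV cache and runtime buffers. Both figures are illustrative assumptions, not measurements.

```python
import math

VRAM_PER_CARD_GB = 24.0  # RX 7900 XTX


def weights_gb(params_billion, bits_per_weight=4.5):
    """Approximate quantized weight size in GB (1 GB = 1e9 bytes).

    4.5 bits per weight is an assumed average for a Q4-class quant.
    """
    return params_billion * bits_per_weight / 8


def cards_needed(params_billion, bits_per_weight=4.5, overhead_gb=4.0):
    """Smallest number of 24 GB cards whose pooled VRAM holds weights plus overhead.

    overhead_gb is an assumed flat allowance for KV cache and buffers.
    """
    total_gb = weights_gb(params_billion, bits_per_weight) + overhead_gb
    return math.ceil(total_gb / VRAM_PER_CARD_GB)


for name, size in [("14B", 14), ("70B", 70)]:
    print(f"{name}: ~{weights_gb(size):.1f} GB of weights, {cards_needed(size)} card(s)")
```

Under these assumptions a 70B model comes to roughly 39 GB of weights: too big for one card's 24 GB, comfortable in two cards' 48 GB. Longer contexts push the overhead up, so treat the result as a floor.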

MoE on ROCm — the crash era is over

Updated July 2026: this section used to say the opposite. MoE picks (Qwen3moe 30B-A3B, Qwen 3.5 35B-A3B, gpt-oss-20b) genuinely did have HIP kernel problems through 2025 and early 2026, and we told you to avoid ROCm because of them. That is no longer true. All three tracked llama.cpp bugs are closed: #19880 (closed Feb 28 2026); #20024, the Qwen 3.5 35B-A3B memory-access fault, fixed on both ROCm and Vulkan by a March 12 commit and confirmed by the reporter (closed Mar 13 2026); and #20545, the WSL2 hang, closed as not planned on Jun 2 2026 after being pinned to the ROCm stack rather than llama.cpp. ROCm itself is now at 7.14 (July 16 2026), several releases past the 7.2 those reports were filed against.

Vulkan is still usually the better backend for MoE token generation on RDNA3 — community benchmarks on the 7900 XTX put Vulkan well ahead of ROCm on decode — but the tradeoff is real rather than one-sided: the same runs show ROCm ahead on prompt processing. So pick by workload, not by fear. Long prompts, short answers: ROCm. Short prompts, long answers: Vulkan. Either way you are no longer choosing a backend to dodge a crash.
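The "pick by workload" rule falls out of a simple latency estimate: prompt tokens divided by PP speed, plus output tokens divided by TG speed. In the sketch below, the Vulkan figures are the calibration numbers quoted earlier. The ROCm figures are hypothetical placeholders chosen only to show the crossover, so replace both rows with your own benchmark results.

```python
def request_seconds(prompt_tokens, output_tokens, pp_tok_s, tg_tok_s):
    """Estimated time for one request: prompt processing plus token generation."""
    return prompt_tokens / pp_tok_s + output_tokens / tg_tok_s


BACKENDS = {
    # PP / TG tok/s from the article's calibration table (Qwen3 30B-A3B Q4).
    "vulkan": {"pp": 3033.0, "tg": 183.0},
    # HYPOTHETICAL placeholders, not measurements: faster PP, slower TG.
    "rocm": {"pp": 4000.0, "tg": 120.0},
}


def pick_backend(prompt_tokens, output_tokens, backends=BACKENDS):
    """Return the backend name with the lowest estimated request time."""
    return min(
        backends,
        key=lambda name: request_seconds(
            prompt_tokens, output_tokens, backends[name]["pp"], backends[name]["tg"]
        ),
    )


print(pick_backend(prompt_tokens=16000, output_tokens=100))   # long prompt, short answer
print(pick_backend(prompt_tokens=200, output_tokens=2000))    # short prompt, long answer
```

The estimate ignores model load time and batching, which is fine for a single-user setup where you mostly want to know which side of the crossover your typical request sits on.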

Runner recommendations for AMD

llama.cpp + Vulkan — the fastest path to “it works.” Install llama.cpp, compile with -DGGML_VULKAN=ON, done.
llama.cpp + ROCm — only if Vulkan performance falls short on your specific card + model combination. Expect the driver-setup tax.
vLLM with ROCm — if you’re doing multi-GPU tensor-parallel on 2× 7900 XTX or want production-grade concurrent serving. This is where ROCm is genuinely useful.
Ollama on AMD — possible, but the ROCm backend has been patchy. If Ollama is a hard requirement, stay on NVIDIA.
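For the Vulkan route, the build really is two CMake invocations. Here is a small sketch that prints them. It assumes you run them from inside a llama.cpp checkout with CMake, a C++ toolchain, and the Vulkan SDK already installed. The build directory name is an arbitrary choice, and the exact options can shift between commits, so check the build docs for the commit you are on.

```python
import shlex


def vulkan_build_commands(build_dir="build"):
    """Return the configure and build commands for a Vulkan-enabled llama.cpp."""
    configure = ["cmake", "-B", build_dir, "-DGGML_VULKAN=ON"]
    build = ["cmake", "--build", build_dir, "--config", "Release"]
    return [configure, build]


# Print the commands instead of running them, so nothing is built by accident.
for argv in vulkan_build_commands():
    print(shlex.join(argv))
```

Keeping the commands as argument lists means they can be handed to subprocess.run unchanged if you later want the script to do the build itself.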

Should you buy AMD for AI?

New build, AI is the primary use: no. A used RTX 4090 at $1,600–$2,400 runs everything the 7900 XTX runs, in 15 minutes of setup, with fewer gotchas. The AMD price advantage doesn’t compensate for the time tax.

New build, gaming is the primary use + AI is a bonus: sure. A 7900 XTX is a legitimate gaming card; running a 14B model on the same hardware is a reasonable bonus. Use Vulkan.

You already own the card: absolutely. 24 GB of VRAM is 24 GB of VRAM. The guidance above gets you from zero to a working local-AI stack in an afternoon instead of a weekend.

Next step

Read the AMD RX 7900 XTX hardware card →