NVIDIA RTX 5080

Same 16 GB ceiling as the 5060 Ti, twice the bandwidth.

Blackwell architecture + GDDR7 at 960 GB/s buys you ~30–40% more tok/s than the 5060 Ti 16 GB, but the VRAM ceiling is identical. If your work lives in the 8B–14B dense band, this is the honest Blackwell pick; if you need 30B-A3B MoE with headroom, you need more memory.
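
To see why bandwidth is the headline number, here is a rough sketch of the memory-bound decode ceiling: if every generated token has to stream all model weights once, tok/s cannot exceed bandwidth divided by weight size. The 4.5 bits-per-weight figure for Q4 is an assumption, and the function names are illustrative. Treat the result as an upper bound only; the measured figures quoted on this page sit well below it.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate quantized weight size in GB (4.5 bits/weight is an assumed Q4 average)."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode speed if each token streams every weight once."""
    return bandwidth_gb_s / model_gb


BANDWIDTH_GB_S = 960  # RTX 5080 figure quoted on this page

for name, params in [("8B", 8), ("14B", 14)]:
    size = weights_gb(params)
    ceiling = decode_ceiling_tok_s(BANDWIDTH_GB_S, size)
    print(f"{name} Q4: ~{size:.1f} GB weights, ceiling ~{ceiling:.0f} tok/s")
```

The gap between this ceiling and real benchmarks is where driver and runtime maturity show up.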

The decision in five lines

The call: Buy — Same 16 GB ceiling as the 5060 Ti, twice the bandwidth.
Best for: Blackwell mid-high
Runs well: Qwen3-14B · Qwen 3.5 9B · Qwen 3.5 9B + RAG
Watch out: Blackwell driver and llama.cpp maturity has improved through 2026: late-April builds gained 5–8% and reach ~48–54 tok/s on Qwen 3 27B Q4 (the launch-era reference was 44.9 tok/s on 8B), but the 5080 still trails the RTX 3090 in measured tok/s. Check recent llama.cpp build notes before buying for pure inference.
Evidence: Estimated · last verified July 2026

16 GB GDDR7
960 GB/s bandwidth
360 W TDP
~$1,250 street floor (July)

What fits at this tier

Fits 8B and 14B dense at Q4 cleanly with room for 16–32K context. 30B-A3B MoE at Q4 (~17 GB) doesn't fit; Q3 is the workable path at this tier. Measured token generation (TG): late-April 2026 driver builds put the 5080 at ~48–54 tok/s on Qwen 3 27B Q4 in LM Studio benches; the launch-era reference point was 44.9 tok/s on 8B. Driver updates have added 5–8% since February, but the 5080 still trails the RTX 3090 (92.5 tok/s on 8B Q4).
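
The fit claims above can be sanity-checked with a back-of-envelope estimate of weights plus KV cache. This is a sketch under stated assumptions: 4.5 bits per weight for Q4, an fp16 KV cache, 1 GB of runtime overhead, and an illustrative 14B layer shape (40 layers, 8 KV heads, head dimension 128) that is not taken from a vendor spec sheet. All function names are hypothetical.

```python
GB = 1e9  # decimal gigabytes, matching how VRAM is marketed


def quantized_weights_gb(params_billion, bits_per_weight=4.5):
    """Approximate weight size; 4.5 bits/weight is an assumed Q4 average."""
    return params_billion * 1e9 * bits_per_weight / 8 / GB


def kv_cache_gb(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Keys plus values (the factor of 2) for every layer, fp16 by default."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / GB


def fits(total_gb, vram_gb=16.0, overhead_gb=1.0):
    """True if the total plus an assumed runtime overhead stays within VRAM."""
    return total_gb + overhead_gb <= vram_gb


# Illustrative 14B dense shape (assumed): 40 layers, 8 KV heads, head_dim 128.
dense_14b = quantized_weights_gb(14) + kv_cache_gb(40, 8, 128, 32_768)
moe_30b = quantized_weights_gb(30)  # weights alone, before any KV cache

print(f"14B Q4 + 32K context: {dense_14b:.1f} GB, fits: {fits(dense_14b)}")
print(f"30B Q4 weights only:  {moe_30b:.1f} GB, fits: {fits(moe_30b)}")
```

Under these assumptions the 30B weights alone come to about 16.9 GB, in line with the ~17 GB figure, while 14B at Q4 with a 32K cache stays under 16 GB.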

CODING

Qwen3-14B: Sticky 14B workhorse; 128K context; Apache 2.0; broad runner support.

CHAT / GENERAL

Qwen 3.5 9B: 262K context with native multimodal; strong on GPQA, IFEval, LiveCodeBench at the 9B size.

DOCS & RETRIEVAL

Qwen 3.5 9B + RAG: Chunk aggressively, retrieve well; 262K native context handles big retrieval windows comfortably.

IMAGE

FLUX.2 klein 4B (Apache 2.0): BFL's first fully Apache-2.0 model; 4B distilled for fast inference on mid-tier GPUs; commercial OK.

AGENTS

Qwen 3.5 9B: Strong tool-use performance for 9B; supports thinking mode and 201-language coverage.

VOICE

Chatterbox Multilingual (Resemble AI): MIT license; 23 languages; voice cloning + emotion dial; pip release 0.1.7 (March 2026) shows active development.

The call

Buy it if you want Blackwell-generation driver support and the extra bandwidth headroom over the 5060 Ti, and you'll stick to 14B-class models. Also buy if you want the newest GPU in the lineup for game + AI duty.
Skip it if you want 24 GB: at $1,250–$1,400 new, a used RTX 3090 24 GB (~$1,050) costs $200–$350 less, and used eBay 5080s sit at ~$1,100–$1,150 anyway. Either the 3090's extra VRAM or the used-5080 discount gives you more per dollar than a new card here.
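
The per-dollar argument is easy to check with the prices quoted above. The sketch below ranks the three options by dollars per GB of VRAM; the prices are this page's July estimates, and the low end of the used-5080 range is taken as an assumption.

```python
# (price in USD, VRAM in GB); prices are the estimates quoted on this page.
offers = {
    "RTX 5080 new": (1250, 16),
    "RTX 5080 used (eBay, low end)": (1100, 16),
    "RTX 3090 used": (1050, 24),
}


def usd_per_gb(price_usd, vram_gb):
    """Dollars paid per gigabyte of VRAM."""
    return price_usd / vram_gb


# Cheapest memory first.
ranked = sorted(offers.items(), key=lambda item: usd_per_gb(*item[1]))

for name, (price, vram) in ranked:
    print(f"{name}: ${usd_per_gb(price, vram):.2f} per GB")
```

This only measures memory per dollar; it says nothing about warranty, power draw, or gaming performance, which is where the new card earns its premium.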

Watchouts

Blackwell driver and llama.cpp maturity has improved through 2026: late-April builds gained 5–8% and reach ~48–54 tok/s on Qwen 3 27B Q4 (the launch-era reference was 44.9 tok/s on 8B), but the 5080 still trails the RTX 3090 in measured tok/s. Check recent llama.cpp build notes before buying for pure inference.
The $999 MSRP is gone from retail as of July 2026 — the cheapest new card tracks ~$1,250 (Amazon Zotac Solid Core, flat since mid-June) with Newegg at ~$1,379, and Tom's Hardware calls 5080 supply "poor at any reasonable price." Used eBay runs ~$1,100–$1,150. Top-tier OC variants still reach $1,799; none of the premium buys you more AI throughput.
The card's interface is PCIe 5.0 x16, but most boards will drop the slot to x8 if you also install a capture card or second GPU. That is fine for inference, not great for training.
12VHPWR connector: follow the re-seat guidance. Native dual-cable 12V-2x6 PSU is the safe path.
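
On the PCIe watchout above, a quick calculation shows why x8 rarely matters for inference. PCIe 5.0 runs at 32 GT/s per lane with 128b/130b encoding, so the theoretical per-direction rate follows directly; the ~7.9 GB model size is an assumed 14B Q4 footprint, and real transfers will be slower than these link-rate numbers.

```python
GT_PER_LANE = 32.0        # PCIe 5.0 raw signalling rate, GT/s per lane
ENCODING = 128 / 130      # 128b/130b line encoding efficiency


def pcie5_gb_s(lanes):
    """Theoretical one-direction PCIe 5.0 throughput in GB/s, before protocol overhead."""
    return GT_PER_LANE * ENCODING / 8 * lanes


MODEL_GB = 7.9  # assumed 14B Q4 weight size

for lanes in (16, 8):
    rate = pcie5_gb_s(lanes)
    print(f"x{lanes}: ~{rate:.1f} GB/s, model upload ~{MODEL_GB / rate:.2f} s")
```

Weights cross the bus once at load time, so halving the link costs a fraction of a second. Training workloads move data across the bus continuously, which is why the x8 drop matters more there.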

Local vs cloud at this tier

● LOCAL WINS

Modern Blackwell feature set (FP8/FP4 where llama.cpp supports it), fast 8B/14B dense, cleanest new-GPU warranty path in the lineup.

● CLOUD WINS

Cloud wins hard at this tier for MoE 30B-A3B: the 16 GB ceiling locks you out of Q4. It also wins on first-day model access and anything frontier.

This card is a coherent buy only if you specifically want a new Blackwell card for mixed gaming + local AI use, and you'll live within 14B dense. For pure inference under $1,400, a used RTX 3090 or new RTX 4090 is the better memory-per-dollar answer.
