HARDWARE · BUDGET ENTRY · 12 GB
NVIDIA RTX 3060 12 GB
The honest floor for NVIDIA CUDA local AI.
A 5-year-old card that still runs Llama 3.1 8B Q4 at 52 tok/s. Amazon street is ~$354, eBay used floors around $230, and NVIDIA is restarting 8 nm production with Samsung in June 2026 — supply should ease through summer. 12 GB is the minimum VRAM that matters for 8B-class models with any context, and CUDA just works everywhere.
The decision in five lines
- The call
- Consider — The honest floor for NVIDIA CUDA local AI.
- Best for
- Budget entry
- Runs well
- Qwen 3.5 4B · Qwen 3.5 4B + tight RAG · SANA-0.6B (non-commercial)
- Watch out
- NVIDIA is restarting RTX 3060 12 GB production in June 2026 (Samsung 8 nm wafers; AIB mass production July). Street prices may soften through Q3 — if you don't need the card today, $280 new feels like the floor, not the ceiling.
- Evidence
- Estimated
- 12
- GB GDDR6
- 360
- GB/S BANDWIDTH
- 170
- W TDP
- ~$354
- NEW (AMAZON)
What fits at this tier
Fits 8B dense at Q4 with room for 8–16K context (LocalScore 446, TG 52 tok/s). 14B Q4 technically fits (~8 GB weights) but leaves only ~3 GB for KV cache — short-context work only. 4B-class models fly at 100+ tok/s. No 30B-A3B or 27B dense at this tier.
The call
Buy it if you want a working CUDA inference node under $400 and your use case is 8B chat, document Q&A, or learning. Drops into any PC built in the last decade.
Skip the 8 GB variant entirely — it's a different card masquerading under the same name. Also skip if you'll regret not being able to run MoE 30B-A3B — the $550 RTX 5060 Ti 16 GB is where that unlock starts.
Watchouts
- NVIDIA is restarting RTX 3060 12 GB production in June 2026 (Samsung 8 nm wafers; AIB mass production July). Street prices may soften through Q3 — if you don't need the card today, $280 new feels like the floor, not the ceiling.
- 8 GB variant of the "RTX 3060" exists and is a trap for local AI. Always confirm the 12 GB SKU before buying.
- 360 GB/s is the bandwidth ceiling — don't expect 3060 to scale past 14B Q4 meaningfully even when weights technically fit.
- Used cards from mining rigs often have cosmetic wear and fan wear. Prefer 1-owner gaming pulls; test the card's memory under load on arrival (GPU-Z + a 30-minute stability test).
Local vs cloud at this tier
● LOCAL WINS
Entry-level privacy + local dev loops at 8B scale. Genuinely useful for document summarization, embeddings, local RAG, and chat at the 8B Q4 quality level.
● CLOUD WINS
Cloud wins on quality for anything above 8B. At this tier you're running 2024-era open models; Claude Pro at $20/mo delivers frontier quality at 50+ tok/s with zero hardware risk.
Genuinely worth it as a learning node or a privacy-first 8B daily driver. Break-even vs Claude Pro is ~12 months on cost alone — but the value at this tier is local-itself, not raw quality.
Next step
Load this setup into the planner→