NVIDIA RTX 3060 12 GB

The honest floor for NVIDIA CUDA local AI.

A 5-year-old card that still runs Llama 3.1 8B Q4 at 52 tok/s. Amazon street is ~$354, eBay used floors around $230, and NVIDIA is restarting 8 nm production with Samsung in June 2026 — supply should ease through summer. 12 GB is the minimum VRAM that matters for 8B-class models with any context, and CUDA just works everywhere.

The decision in five lines

The call: Consider — The honest floor for NVIDIA CUDA local AI.
Best for: Budget entry
Runs well: Qwen 3.5 4B · Qwen 3.5 4B + tight RAG · SANA-0.6B (non-commercial)
Watch out: NVIDIA is restarting RTX 3060 12 GB production in June 2026 (Samsung 8 nm wafers; AIB mass production July). Street prices may soften through Q3 — if you don't need the card today, $280 new feels like the floor, not the ceiling.
Evidence: Estimated · last verified July 2026

12 GB GDDR6
360 GB/s bandwidth
170 W TDP
~$354 new (Amazon)

What fits at this tier

Fits 8B dense at Q4 with room for 8–16K context (LocalScore 446, TG 52 tok/s). 14B Q4 technically fits (~8 GB weights) but leaves only ~3 GB for KV cache — short-context work only. 4B-class models fly at 100+ tok/s. No 30B-A3B or 27B dense at this tier.
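The fit claims above come down to simple arithmetic: weights plus KV cache plus runtime overhead must stay under 12 GB. The sketch below is a rough illustration, not a measurement. It assumes an fp16 KV cache, the published Llama 3.1 8B shape (32 layers, 8 KV heads, head dimension 128), and a hypothetical ~1 GB of runtime overhead, chosen because it reproduces the "~3 GB left" figure for a 14B Q4 model. The function names are illustrative, and GB and GiB are treated as interchangeable at this level of precision.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_elem=2):
    """Approximate KV cache size for one sequence, in bytes."""
    # Factor of 2: one tensor for keys and one for values per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_tokens


def kv_budget_gb(vram_gb, weights_gb, overhead_gb=1.0):
    """VRAM left for the KV cache after weights and an assumed runtime overhead."""
    return vram_gb - weights_gb - overhead_gb


# Llama 3.1 8B shape (assumption: fp16 cache, no cache quantization).
kv_16k_gib = kv_cache_bytes(32, 8, 128, 16 * 1024) / GIB
print(f"8B KV cache at 16K context: {kv_16k_gib:.1f} GiB")

# 14B at Q4 with ~8 GB of weights, per the text above.
print(f"14B Q4 KV budget on 12 GB: ~{kv_budget_gb(12, 8.0):.0f} GB")
```

Swapping in a different layer count or context length shows how quickly long contexts eat the headroom that a 12 GB card has left.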

CODING

Qwen 3.5 4B: 4B dense with 262K context; surprisingly coherent for its size.

CHAT / GENERAL

Qwen 3.5 4B: 4B dense with 262K context and native multimodal.

DOCS & RETRIEVAL

Qwen 3.5 4B + tight RAG: 4B plus tight chunking; keep context windows small.

IMAGE

SANA-0.6B (non-commercial): 0.6B params; <1 s per 1024² image on a 16 GB laptop GPU; weights are under NVIDIA NSCL v2 (non-commercial).

AGENTS

Ministral 3 3B: smallest Ministral with reasoning + tool use.

VOICE

Kokoro-82M (Apache 2.0): community daily driver for English TTS; CPU-real-time at 82M params; v1.0 ships with 8 languages and 54 voices. No voice cloning.

The call

Buy it if you want a working CUDA inference node under $400 and your use case is 8B chat, document Q&A, or learning. Drops into any PC built in the last decade.
Skip the 8 GB variant entirely — it's a different card masquerading under the same name. Also skip if you'll regret not being able to run MoE 30B-A3B — the $550 RTX 5060 Ti 16 GB is where that unlock starts.

Watchouts

NVIDIA is restarting RTX 3060 12 GB production in June 2026 (Samsung 8 nm wafers; AIB mass production July). Street prices may soften through Q3 — if you don't need the card today, $280 new feels like the floor, not the ceiling.
An 8 GB variant of the "RTX 3060" exists and is a trap for local AI. Always confirm the 12 GB SKU before buying.
360 GB/s is the bandwidth ceiling. Token generation has to stream the weights from VRAM for every token, so don't expect the 3060 to scale meaningfully past 14B Q4 even when the weights technically fit.
Used cards from mining rigs often have cosmetic wear and fan wear. Prefer 1-owner gaming pulls; test the card's memory under load on arrival (GPU-Z + a 30-minute stability test).
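The bandwidth watchout can be made concrete with a back-of-envelope bound: during decoding, each generated token reads roughly the full set of weights from VRAM, so tokens per second cannot exceed bandwidth divided by weight size. The sketch below is a simplification under stated assumptions: it ignores KV cache reads and compute, the ~4.9 GB figure for an 8B Q4 file is an assumption rather than a number from this page, and the function name is illustrative.

```python
def decode_ceiling_tok_s(bandwidth_gb_s, weights_gb):
    """Upper bound on decode speed if every token streams all weights once."""
    return bandwidth_gb_s / weights_gb


BANDWIDTH_GB_S = 360  # RTX 3060 12 GB memory bandwidth, from the spec tiles

# Assumption: an 8B Q4 model occupies roughly 4.9 GB.
ceiling_8b = decode_ceiling_tok_s(BANDWIDTH_GB_S, 4.9)
# ~8 GB of weights for 14B Q4, as quoted earlier on this page.
ceiling_14b = decode_ceiling_tok_s(BANDWIDTH_GB_S, 8.0)

# Compare the quoted 52 tok/s against the theoretical ceiling.
efficiency_8b = 52 / ceiling_8b

print(f"8B Q4 ceiling:  {ceiling_8b:.0f} tok/s")
print(f"14B Q4 ceiling: {ceiling_14b:.0f} tok/s")
print(f"52 tok/s is about {efficiency_8b:.0%} of the 8B ceiling")
```

If real-world efficiency on 14B is similar to 8B, practical speeds land well below the 45 tok/s bound, which is why larger models feel slow on this card even when they load.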

Local vs cloud at this tier

● LOCAL WINS

Entry-level privacy + local dev loops at 8B scale. Genuinely useful for document summarization, embeddings, local RAG, and chat at the 8B Q4 quality level.

● CLOUD WINS

Cloud wins on quality for anything above 8B. At this tier you're running 2024-era open models; Claude Pro at $20/mo delivers frontier quality at 50+ tok/s with zero hardware risk.

Genuinely worth it as a learning node or a privacy-first 8B daily driver. Break-even vs Claude Pro is ~12 months on cost alone at the ~$230 used price (closer to 18 months at ~$354 new), but the value at this tier is running locally at all, not raw quality.
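The break-even figure is just purchase price divided by the monthly subscription saved. A minimal sketch, assuming the prices quoted on this page and treating electricity as an optional input that defaults to zero (the function and parameter names are illustrative):

```python
import math


def break_even_months(hardware_usd, subscription_usd_per_month, power_usd_per_month=0.0):
    """Months until the card costs less than the subscription it replaces."""
    monthly_saving = subscription_usd_per_month - power_usd_per_month
    if monthly_saving <= 0:
        # Running costs cancel out the subscription saving: never breaks even.
        return math.inf
    return hardware_usd / monthly_saving


CLAUDE_PRO_USD = 20  # per month, as quoted above

for label, price in [("used (eBay floor)", 230), ("new (Amazon)", 354)]:
    months = break_even_months(price, CLAUDE_PRO_USD)
    print(f"{label}: {months:.1f} months")
```

Passing a non-zero `power_usd_per_month` stretches the timeline, which is one reason the cost argument alone is weaker than the privacy argument at this tier.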
