HARDWARE · PROSUMER SINGLE-CARD · 48 GB ECC
NVIDIA RTX A6000 (48 GB, used)
Single-card 70B Q4 at 14.5 tok/s without the dual-GPU space heater.
The only consumer-reachable single-card path to 48 GB VRAM under $5,000. Ampere-generation workstation silicon with ECC memory, 768 GB/s bandwidth, and a dual-slot blower that tolerates sustained load. Used-market prices make it an Ada-tax-avoidance play.
The decision in five lines
- The call: Buy. Single-card 70B Q4 at 14.5 tok/s without the dual-GPU space heater.
- Best for: Prosumer single-card
- Runs well: Qwen3-Coder-30B-A3B (MoE, fits in 24 GB) · Qwen 3.6-27B · Z-Image-Turbo (Apache 2.0)
- Watch out: Blower fan gets loud under sustained inference: qualitatively "quieter than a 3090" but still noticeable. Server-aesthetic, not bedroom-aesthetic.
- Evidence: Estimated
- 48 GB GDDR6 ECC
- 768 GB/s bandwidth
- 300 W TDP
- ~$3,500 used (eBay)
What fits at this tier
48 GB fits Llama 3.3 70B Q4 (~40 GB) cleanly with room for context: ~14.5 tok/s token generation via Ollama (Databasemart). Qwen 3.5 27B dense Q4 lands at ~50–65 tok/s and the 35B-A3B MoE at ~75–95 tok/s (both estimated). 122B-A10B at 4-bit does not fit on this card: weights plus KV cache total ~70 GB, well over 48 GB, so part of the model would have to be offloaded to system RAM. ECC memory is a real win for scientific ML workloads.
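The sizing above can be sanity-checked with a back-of-envelope calculation. The sketch below is illustrative only: the parameter count (~70.6 billion) and the ~4.5 effective bits per weight for a Q4-style quant are assumptions, not figures from this page, and the decode ceiling uses the common rule of thumb that each generated token reads the full weight set from VRAM once. The function names are hypothetical.

```python
# Back-of-envelope sizing for a 48 GB, 768 GB/s card.
# ASSUMPTIONS (not from the article): ~70.6e9 parameters for Llama 3.3 70B
# and ~4.5 effective bits per weight for a Q4-style quantization.

GB = 1e9  # decimal gigabytes, matching how VRAM and bandwidth are quoted

VRAM_GB = 48.0          # from the spec list
BANDWIDTH_GB_S = 768.0  # from the spec list
MEASURED_TOK_S = 14.5   # figure cited in the text


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB."""
    return params * bits_per_weight / 8 / GB


def fits(model_gb: float, vram_gb: float) -> bool:
    """True if the weights alone fit; KV cache needs extra room on top."""
    return model_gb < vram_gb


def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s if every token reads all weights once."""
    return bandwidth_gb_s / model_gb


llama70b_gb = weight_gb(70.6e9, 4.5)
ceiling_tok_s = decode_ceiling_tok_s(BANDWIDTH_GB_S, llama70b_gb)

print(f"weights:       {llama70b_gb:.1f} GB")
print(f"fits in 48 GB: {fits(llama70b_gb, VRAM_GB)}")
print(f"headroom:      {VRAM_GB - llama70b_gb:.1f} GB for KV cache and buffers")
print(f"decode ceiling: {ceiling_tok_s:.1f} tok/s")
print(f"measured / ceiling: {MEASURED_TOK_S / ceiling_tok_s:.0%}")
```

Under these assumptions the weights come to just under 40 GB and the bandwidth ceiling to roughly 19 tok/s, so the cited 14.5 tok/s sits at about three quarters of the theoretical limit. Changing the parameter count in `weight_gb` gives the same quick check for other models.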
The call
Buy it if you want 70B Q4 on a single card in a workstation form factor (dual-slot blower, PCIe 4.0 x16, server-ready) and you can't stomach the thermals and PCIe lane split of a dual-3090 build. It is also the default pick when ECC matters, as it does in some scientific workflows.
Skip it if you don't need 48 GB specifically: a single RTX 5090 32 GB is newer, faster for everything that fits in 32 GB, and has FP8/FP4 support that Ampere lacks. Also skip it if you want a manufacturer warranty: used A6000s are mostly sold as-is, with at best a 90-day reseller warranty.
Watchouts
- Blower fan gets loud under sustained inference: qualitatively "quieter than a 3090" but still noticeable. Server-aesthetic, not bedroom-aesthetic.
- Ampere architecture is nearly 6 years old in June 2026. No native FP8/FP4 (Blackwell has this), no flash-attention-3 speedups. 1.5× the VRAM of a 5090 (48 GB vs 32 GB) at roughly 43% of its bandwidth (768 GB/s vs 1,792 GB/s).
- Used market is opaque. PCSP + ITCreations offer 90-day warranties; eBay is as-is. Budget for a memory stress test on arrival; cards sold by ex-mining operations need thorough vetting.
- CUDA 13+ features bias toward Ada/Blackwell. Some llama.cpp performance optimizations (FA-3) won't light up on Ampere. RTX 6000 Ada ($7,000+ new) is the forward-looking successor if budget allows.
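As a first-pass arrival check for a used card, the driver's ECC error counters can be read before running a proper memory stress test. The sketch below is a minimal example, not a substitute for the stress test: it assumes `nvidia-smi` is on the PATH and that the listed query fields exist on the installed driver (confirm with `nvidia-smi --help-query-gpu`). The helper names are hypothetical, and the counters read N/A when ECC is disabled.

```python
import subprocess

# Query fields passed to nvidia-smi. ASSUMPTION: these names are available
# on the installed driver; check `nvidia-smi --help-query-gpu` to confirm.
FIELDS = [
    "name",
    "memory.total",
    "ecc.mode.current",
    "ecc.errors.corrected.aggregate.total",
    "ecc.errors.uncorrected.aggregate.total",
]


def parse_report(csv_line: str) -> dict:
    """Split one CSV line of nvidia-smi output into a field -> value dict."""
    values = [v.strip() for v in csv_line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def to_count(value: str):
    """Return an integer error count, or None if the driver reports N/A."""
    try:
        return int(value)
    except ValueError:
        return None


def looks_suspect(report: dict) -> bool:
    """Flag a card whose lifetime uncorrected ECC error count is non-zero."""
    uncorrected = to_count(report["ecc.errors.uncorrected.aggregate.total"])
    return uncorrected is not None and uncorrected > 0


def read_first_gpu() -> dict:
    """Run nvidia-smi and parse the first GPU's line (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_report(out.splitlines()[0])


# Usage on a machine with the card installed:
# report = read_first_gpu()
# print(report, "SUSPECT" if looks_suspect(report) else "no uncorrected errors logged")
```

A clean counter does not prove the memory is healthy, especially if ECC was switched off for most of the card's life, so it complements a sustained load test and does not replace one.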
Local vs cloud at this tier
● LOCAL WINS
Single-card 70B Q4 at 14.5 tok/s: the quietest single-card 70B path in the lineup. ECC memory for scientific ML. Workstation-grade drivers for CUDA / ML pipelines.
● CLOUD WINS
Cloud wins on frontier reasoning and zero hardware risk. An A6000 is a multi-year bet on Ampere remaining driver-supported; NVIDIA's Production Branch 570.xx still supports it, but that won't last forever.
The right pick for a quiet, single-GPU 70B workstation in June 2026 — if ECC matters, and if the used-market price holds at $3,500–$4,500. Above $5,000 the RTX 6000 Ada new-card path starts to make sense.