Intel Arc B580 12 GB

The honest 12 GB-under-$300 option, with software-stack asterisks.

At $249 MSRP, the B580 is the cheapest new discrete GPU with enough VRAM (12 GB GDDR6) for 8B-class local AI. The catch is the software: Intel's IPEX-LLM, the main path for Ollama on Arc, was archived on January 28, 2026. It still works and still runs 8B models at 28–62 tok/s, but you're betting on a project Intel no longer actively maintains. Worse, Intel canceled the Arc B770 in mid-2026 and re-routed the BMG-G31 die to a Pro workstation card, so the B580 is the terminal Battlemage consumer SKU.

The decision in five lines

The call: Consider — The honest 12 GB-under-$300 option, with software-stack asterisks.
Best for: Team blue · cautionary
Runs well: Qwen 3.5 4B · Qwen 3.5 4B + tight RAG · SANA-0.6B (non-commercial)
Watch out: intel/ipex-llm GitHub repo archived January 28, 2026 — read-only since. Existing builds still work but Intel's future LLM tooling strategy is unclear.
Evidence: Estimated · last verified July 2026

12 GB GDDR6
456 GB/s bandwidth
190 W TDP
$249 MSRP (launch)

What fits at this tier

Runs models up to the 8B class (Llama 3.1 8B, Qwen 3.5 4B, Phi-4 Mini) cleanly at Q4 via IPEX-LLM + llama.cpp, at 28–62 tok/s on 8B Q4 depending on runner path. A 13B dense model at Q4 does not fit with headroom: it spills to system RAM and throughput collapses.
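The fit claim above can be checked with rough arithmetic: quantized weights plus the KV cache plus some runtime overhead must stay under 12 GB. The sketch below is an estimate under stated assumptions, not a measurement. It assumes about 4.5 bits per weight for a Q4-style quant, an FP16 KV cache, and 1 GB of runtime overhead, and it uses Llama 2 13B as a stand-in for "13B dense", which the text does not specify.

```python
from dataclasses import dataclass

GB = 1e9  # decimal gigabytes, matching how VRAM is marketed


@dataclass
class ModelShape:
    name: str
    params: float   # total parameter count
    layers: int     # transformer blocks
    kv_heads: int   # key/value heads (fewer than query heads under GQA)
    head_dim: int   # dimension per attention head


def weights_gb(m, bits_per_weight=4.5):
    """Quantized weight size; 4.5 bits/weight is an assumed Q4-style average."""
    return m.params * bits_per_weight / 8 / GB


def kv_cache_gb(m, context, bytes_per_value=2):
    """FP16 KV cache: one key and one value vector per layer per token."""
    return 2 * m.layers * m.kv_heads * m.head_dim * bytes_per_value * context / GB


def fits(m, context, vram_gb=12.0, overhead_gb=1.0):
    """True if weights + KV cache + an assumed runtime overhead fit in VRAM."""
    return weights_gb(m) + kv_cache_gb(m, context) + overhead_gb <= vram_gb


# Published shapes; choosing Llama 2 13B for "13B dense" is an assumption.
llama31_8b = ModelShape("Llama 3.1 8B", 8.03e9, 32, 8, 128)
llama2_13b = ModelShape("Llama 2 13B", 13.0e9, 40, 40, 128)

for m in (llama31_8b, llama2_13b):
    for ctx in (4096, 8192):
        total = weights_gb(m) + kv_cache_gb(m, ctx)
        print(f"{m.name} @ {ctx} ctx: {total:.1f} GB before overhead, fits={fits(m, ctx)}")
```

Under these assumptions the 8B model leaves several gigabytes spare at either context length, while the 13B stand-in squeezes in only at the shorter context and overflows at 8K. Real runners differ in overhead and KV-cache precision, so treat the numbers as a sanity check.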

CODING

Qwen 3.5 4B: 4B dense with 262K context; surprisingly coherent for its size.

CHAT / GENERAL

Qwen 3.5 4B: 4B dense with 262K context and native multimodal.

DOCS & RETRIEVAL

Qwen 3.5 4B + tight RAG: the 4B model with tight chunking; keep context windows small.

IMAGE

SANA-0.6B (non-commercial): 0.6B params; <1 s per 1024² image on a 16 GB laptop GPU; weights are NVIDIA NSCL v2 (non-commercial).

AGENTS

Ministral 3 3B: smallest Ministral with reasoning and tool use.

VOICE

Kokoro-82M (Apache 2.0): community daily driver for English TTS; CPU-real-time at 82M params; v1.0 with 8 languages and 54 voices. No voice cloning.

The call

Buy it if you already own an Intel CPU, enjoy tinkering with IPEX-LLM / oneAPI, and want the cheapest 12 GB path into local AI. Gaming-first + LLM-second is a reasonable frame.
Skip it if LLM inference is your primary use: spend roughly $300 more on an RTX 5060 Ti 16 GB (about $550) and avoid the software-stack fragility entirely. The 4 GB VRAM gap and the IPEX-LLM archive signal together make this a weak default pick.

Watchouts

intel/ipex-llm GitHub repo archived January 28, 2026 — read-only since. Existing builds still work but Intel's future LLM tooling strategy is unclear.
Ollama on Arc requires the IPEX-LLM wrapper or the Portable Zip build. Mainline Ollama does not natively detect Arc GPUs. Expect 2–4 hours of setup vs ~15 min on NVIDIA.
12 GB caps you at 8B-class models with reasonable context. 13B dense at Q4 will not fit with headroom.
Known issues history: SYCL errors in Docker/Podman, "cannot find preferred GPU platform" errors, models loading into RAM rather than VRAM. Most resolved by early 2026 but setup friction remains material.
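One of the failure modes listed above, a model loading into RAM rather than VRAM, is easy to miss because generation still works, only slowly. The sketch below shows one way to check. It assumes the Arc build exposes the same `/api/ps` endpoint as mainline Ollama, with `size` and `size_vram` fields per loaded model; that is an assumption about the IPEX-LLM build, and the sample payload is made up for illustration.

```python
import json
import urllib.request


def offload_report(ps_payload):
    """Return (model name, fraction of the model resident in VRAM) pairs."""
    report = []
    for model in ps_payload.get("models", []):
        size = model.get("size", 0)
        in_vram = model.get("size_vram", 0)
        fraction = in_vram / size if size else 0.0
        report.append((model.get("name", "?"), fraction))
    return report


def fetch_ps(base_url="http://localhost:11434"):
    """Query a running Ollama server for its loaded models (assumed endpoint)."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp)


# Illustrative payload with made-up numbers: a model sitting entirely in system RAM.
sample = {"models": [{"name": "example-8b", "size": 5_000_000_000, "size_vram": 0}]}

for name, fraction in offload_report(sample):
    status = "fully on GPU" if fraction >= 0.99 else "partly or wholly in system RAM"
    print(f"{name}: {fraction:.0%} in VRAM ({status})")
```

Against a live server, `offload_report(fetch_ps())` gives the same report after a model has been loaded. A fraction well below 100% on an 8B Q4 model suggests the GPU was not picked up.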

Local vs cloud at this tier

● LOCAL WINS

12 GB under $300 new, with enough compute for 8B models (through SYCL/oneAPI rather than CUDA). Privacy, plus unlimited chat and coding at the 8B tier.

● CLOUD WINS

GPT-5 API at $0.625/$5 per 1M input/output tokens gives you frontier quality with zero setup. At this tier, a heavy user hits break-even vs a $270 B580 in 18–24 months, so cloud wins outright on quality-per-dollar for anyone who values their time.
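The break-even figure is simple division: card price over monthly API spend. The sketch below uses the prices quoted above. The monthly token volumes are hypothetical, chosen only to show what kind of usage lands in the 18–24 month range, and the calculation ignores electricity and the quality gap between an 8B local model and a frontier API.

```python
INPUT_PRICE = 0.625   # USD per 1M input tokens, from the text
OUTPUT_PRICE = 5.0    # USD per 1M output tokens, from the text


def monthly_api_cost(input_tokens, output_tokens):
    """API spend in USD for one month of usage."""
    return input_tokens / 1e6 * INPUT_PRICE + output_tokens / 1e6 * OUTPUT_PRICE


def breakeven_months(card_price, input_tokens, output_tokens):
    """Months of API spend needed to equal the card's purchase price."""
    return card_price / monthly_api_cost(input_tokens, output_tokens)


# Hypothetical monthly volumes (input tokens, output tokens) for a heavy user.
scenarios = {
    "heavier": (8_000_000, 2_000_000),
    "lighter": (6_000_000, 1_500_000),
}

for label, (tok_in, tok_out) in scenarios.items():
    cost = monthly_api_cost(tok_in, tok_out)
    months = breakeven_months(270, tok_in, tok_out)
    print(f"{label}: ${cost:.2f}/month, break-even after {months:.0f} months")
```

At these volumes the API bill is roughly $11 to $15 a month. Anyone using less than that takes longer than two years to recoup the card.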

The honest editorial framing: if you already have the Intel hardware and the tinkering appetite, it's the cheapest 12 GB entry. For everyone else, either the RTX 5060 Ti 16 GB at $550 or a used RTX 3060 12 GB at $200–$260 is the saner pick.
