What can local AI actually do in July 2026?

One landing page per use case. Each one answers the same three questions — what's the honest tier-by-tier answer, what's the right hardware to run it well, and when does cloud still beat local. No generic "best LLM" lists; just the editorial call for the specific thing you want to do.

Pick a use case

CODING

Local AI that codes with you.

At 24 GB VRAM and above, local coding models are genuinely competitive with cloud for most real work. Below that, they handle autocomplete and simple refactors but stop being a Claude-replacement. Here's the honest tier-by-tier read.

Comfortable on 24GB+, workable below

CHAT

Local AI for everyday conversation.

The 16GB sweet spot is now genuinely good. Gemma 4 26B A4B, with 3.8B active parameters, reaches its first token in under a second on consumer GPUs, at quality that beats the GPT-3.5 of two years ago. The honest case for local chat is privacy, offline reliability, and zero per-token cost, not raw quality vs Opus.

Genuinely good at 16GB+, frontier-class at 64GB+
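To see why a 26B model can land in the 16GB tier, here is a rough back-of-envelope sketch. It assumes 4-bit quantization and assumes all 26B parameters must sit in memory even though only 3.8B are active per token, as is typical for mixture-of-experts models. The `overhead_gb` allowance for KV cache and runtime buffers is a hypothetical placeholder, not a measured figure, and real quantization formats usually spend slightly more than 4 bits per weight.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives GB directly.
    """
    return total_params_billion * bits_per_weight / 8


def fits_in_vram(
    total_params_billion: float,
    bits_per_weight: float,
    vram_gb: float,
    overhead_gb: float = 2.0,  # hypothetical allowance for KV cache and buffers
) -> bool:
    """Return True if weights plus the assumed overhead fit in the given VRAM."""
    needed = weight_memory_gb(total_params_billion, bits_per_weight) + overhead_gb
    return needed <= vram_gb


if __name__ == "__main__":
    # 26B total parameters at an assumed 4 bits per weight
    print(weight_memory_gb(26, 4))
    print(fits_in_vram(26, 4, vram_gb=16))
    # The same model at 8 bits per weight no longer fits the 16GB tier
    print(fits_in_vram(26, 8, vram_gb=16))
```

Treat the result as a first filter only. Context length, runner, and quantization format all shift the real number.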

DOCS · LONG-CONTEXT · RAG

Local AI for documents, retrieval, and long-context work.

In July 2026, the context window you can actually rely on is still 32–64K tokens for most local rigs. Models advertised at 200K, 262K, or 1M lose quality past their training cap. On hardware below 96GB, plan for RAG. Above it, DeepSeek V4-Flash (April 2026) is the first model that makes 1M genuinely useful.

Workable with RAG; 1M-context era is frontier-only
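The "plan for RAG" call can be sketched as a simple budget check. This is a minimal illustration, not the planner's logic: it takes the low end of the 32–64K reliable window, assumes roughly 4 characters per token (a common rule of thumb for English text, not an exact tokenizer count), and reserves a hypothetical 4K tokens for the prompt and the answer.

```python
import math


def plan_context(
    doc_chars: int,
    reliable_window_tokens: int = 32_000,  # low end of the 32-64K reliable range
    reserved_tokens: int = 4_000,          # hypothetical room for prompt and answer
    chars_per_token: float = 4.0,          # rough heuristic, varies by tokenizer
) -> str:
    """Suggest 'full-context' if the document likely fits, otherwise 'rag'."""
    estimated_tokens = math.ceil(doc_chars / chars_per_token)
    budget = reliable_window_tokens - reserved_tokens
    return "full-context" if estimated_tokens <= budget else "rag"


if __name__ == "__main__":
    # About 25K estimated tokens: fits inside the 28K budget
    print(plan_context(100_000))
    # About 100K estimated tokens: exceeds the budget, so retrieve chunks instead
    print(plan_context(400_000))
```

For a real decision, count tokens with the model's own tokenizer instead of the character heuristic.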

IMAGE

Local image generation.

HiDream-O1-Image (May 8 2026, 8B, MIT license, pixel-space architecture) is the new #1 open-weight model on Artificial Analysis T2I Arena (Elo 1184). It beats FLUX.2 dev (32B) and Qwen-Image-2512 (20B) with 2.5–4× fewer parameters. Choose by license (FLUX.2 dev is non-commercial) and by what you generate (text-rendering, photorealism, speed).

Strong open-weight options at every tier above 8GB

AGENTS · TOOL USE

Local AI for agentic loops and tool use.

Agents are where local AI struggles most against cloud. Combining frontier reasoning, reliable tool use, and long-horizon coherence is genuinely hard at open-weight scale below 96GB. Above that, you can run real agent loops locally, but cloud frontier models still lead.

Workable at 24GB+; competitive with cloud only at frontier

VOICE · TTS · STT

Local voice — narration, cloning, transcription.

TTS Arena went multi-polar in March 2026 — Fish Audio S2 Pro now leads at Elo 1128 but is non-commercial; Kokoro-82M (Apache 2.0) dropped from #1 to mid-pack on quality but remains the practical choice for English narration on CPU. Voice runs differently from text — Ollama has no native TTS or STT, so you'll route through Open-WebUI plus dedicated servers.

Strong at every tier: choose by license and use case, not just quality

How these pages work

Every use-case page pulls its picks live from the planner, the same data the recommendation engine uses. The editorial framing on top answers the "should I bother running this locally" question that the planner alone doesn't surface. Each pick is verified against live community signal (HuggingFace trending, Artificial Analysis arenas, MTEB, Aider leaderboard) at refresh time, not just against training data.

When a page recommends a model, click through to /models/ for the detail page (license, runner notes, hardware that fits). When it recommends hardware, click through to /hardware/ for the full editorial verdict. The pages are cross-linked so the picks stay honest across surfaces.