Run a real LLM right here, in your browser.

WebGPU · MLC WebLLM · your GPU does the work, and nothing leaves your machine.

Local AI usually means installing Ollama or LM Studio. This page skips the install: on your first visit it downloads the model weights, caches them in your browser’s IndexedDB, and runs inference on your GPU via WebGPU. It is useful as a “what does this feel like” demo before committing to a real runner, and as a reality check on what a 3–9B model actually delivers.
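The load-then-chat flow described above can be sketched roughly as follows. This is an illustrative sketch, not this page's actual code: `askLocalModel`, `EngineFactory`, and the small interfaces are hypothetical names, and the engine shape is an assumption modeled on the OpenAI-style chat interface that WebLLM exposes. The factory is passed in so the sketch stays self-contained.

```typescript
// Minimal structural types for the parts of the engine this sketch touches.
// Assumption: they mirror the OpenAI-style chat surface that WebLLM exposes.
interface ProgressReport { progress: number; text: string }
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }
interface ChatEngine {
  chat: {
    completions: {
      create(req: { messages: ChatMessage[] }): Promise<{
        choices: { message: { content: string | null } }[];
      }>;
    };
  };
}
type EngineFactory = (
  modelId: string,
  config: { initProgressCallback?: (report: ProgressReport) => void },
) => Promise<ChatEngine>;

// Create the engine (the first call triggers the weight download, later calls
// can reuse the browser cache), then send a single user message.
async function askLocalModel(
  createEngine: EngineFactory,
  modelId: string,
  prompt: string,
  onProgress: (percent: number) => void = () => {},
): Promise<string> {
  const engine = await createEngine(modelId, {
    // progress is reported as a fraction between 0 and 1
    initProgressCallback: (report) => onProgress(Math.round(report.progress * 100)),
  });
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  return reply.choices[0]?.message.content ?? "";
}
```

In a real page, the factory would presumably be a thin wrapper around `CreateMLCEngine` from the `@mlc-ai/web-llm` package, and `modelId` would be one of the prebuilt model identifiers that WebLLM ships.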

Try in your browser

Run a model locally, right here.

WebGPU · nothing leaves your machine · cached after first load

Pick a size — smaller is faster, larger is smarter.

First load downloads the model. After that it loads from your browser’s cache, so there is no second download.

Want 14B, 27B, or bigger? This demo caps at what a browser tab can serve. For larger models, use the install walkthrough — two Ollama commands, no download limit, runs on the same GPU.
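For a rough sense of why size matters in a browser tab, here is a back-of-the-envelope estimate of weight size. It assumes 4-bit quantized weights, which is an assumption and not something this page states. It counts weights only, so real downloads and GPU memory use will be somewhat higher. `estimateWeightGB` is a hypothetical helper.

```typescript
// Weights-only size estimate in decimal gigabytes.
// Assumption: every parameter is stored at `bitsPerWeight` bits (default 4).
function estimateWeightGB(paramsBillions: number, bitsPerWeight: number = 4): number {
  const bytes = (paramsBillions * 1e9 * bitsPerWeight) / 8;
  return bytes / 1e9;
}

for (const size of [3, 9, 14, 27]) {
  console.log(`${size}B at 4-bit: about ${estimateWeightGB(size).toFixed(1)} GB of weights`);
}
```

Under that assumption, a 9B model is about 4.5 GB of weights and a 27B model about 13.5 GB.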

Honest framing

This is a toy, not the recommended path. WebLLM’s prebuilt models top out at 9B (Gemma 2). Anything 14B or larger needs self-compiled MLC weights, which is not worth the effort. For real work, use Ollama or LM Studio; our hardware picks and model pages cover both.

In browsers without WebGPU (older Safari, Chrome before version 113), the embed falls back to a polite message with pointers to Ollama and LM Studio.
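A fallback check of this kind could look roughly like the sketch below. It relies on the standard `navigator.gpu.requestAdapter()` entry point; `detectWebGpu`, `fallbackMessage`, the `NavigatorLike` type, and the message wording are hypothetical and not taken from this page's code.

```typescript
type WebGpuStatus = "ready" | "no-webgpu" | "no-adapter";

// Only the slice of `navigator` this check needs, so it can run outside a browser.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// "no-webgpu": the browser does not expose the API at all.
// "no-adapter": the API exists but no usable GPU adapter was returned.
async function detectWebGpu(nav: NavigatorLike): Promise<WebGpuStatus> {
  if (!nav.gpu) return "no-webgpu";
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "ready" : "no-adapter";
  } catch {
    return "no-adapter";
  }
}

// Hypothetical wording; returns null when the demo can run.
function fallbackMessage(status: WebGpuStatus): string | null {
  if (status === "ready") return null;
  return "This browser cannot run the demo. Try Ollama or LM Studio instead.";
}
```

In a page, the call would be `detectWebGpu(navigator)`, with the embed rendered only when the result is `"ready"`.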

Next step

Back to the planner →