DeepSeek V4-Flash

The smaller half of the V4 family — 284B MoE with 13B active per token. Same 1M context, same MIT license, same architectural KV-cache improvements as V4-Pro. The honest local pick of the V4 line: still frontier-class on most benchmarks, but realistically deployable only on M3 Ultra 192GB unified or dual 80GB server cards.

License: MIT · Context: 1M tokens (384K max output) · Released: April 24, 2026 (preview)

The decision in five lines

The call: Hosted only
Best for: coding
Runs on: Hosted or workstation-class only · ~158 GB (Unsloth Q4_K_M; needs M3 Ultra 192GB+ unified or dual 80GB server cards; not single-card consumer)
Watch out: At ~158 GB Q4 this exceeds every pick in the planner's 22-card library — workstation tier, not consumer.
Evidence: Estimated · last verified July 2026

284B total: PARAMETERS
MOE: TYPE
1M: CONTEXT
~158 GB (Unsloth Q4_K_M; needs M3 Ultra 192GB+ unified or dual 80GB server cards; not single-card consumer): VRAM AT Q4

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

CODING · TOP

DeepSeek V4-Flash (284B, multi-GPU)April 24 2026; 13B active, MIT, 1M context. Frontier-class at multi-GPU local (~158 GB Q4) or DeepSeek API at meaningfully lower price than GPT-5.5.

The call

The smaller half of the V4 family — 284B MoE with 13B active per token. Same 1M context, same MIT license, same architectural KV-cache improvements as V4-Pro. The honest local pick of the V4 line: still frontier-class on most benchmarks, but realistically deployable only on M3 Ultra 192GB unified or dual 80GB server cards.
When not to use: Single-card consumer hardware. At ~158 GB Q4 this exceeds every pick in the planner's 22-card library — workstation tier, not consumer. Use V4-Pro via API for outright frontier; use Qwen3-Coder-30B-A3B locally if you need single-card.

Runner notes

Unsloth dynamic GGUFs at `unsloth/DeepSeek-V4-Flash` (early community quants mishandled the MoE router — use Unsloth's). Ollama tag `deepseek-v4-flash` exists but practical only on workstation rigs. vLLM + multi-GPU is the cleanest production path. antirez maintains an experimental llama.cpp fork. DeepSeek API also exposes V4-Flash separately, often at half V4-Pro's rate.

License: MIT
Released: April 24, 2026 (preview)
Maker: DeepSeek
Model card: huggingface.co/deepseek-ai/DeepSeek-V4-Flash →

Next step

Find-by-model — see what hardware runs this→