MODEL · DEEPSEEK · 284B TOTAL / 13B ACTIVE (MOE)
DeepSeek V4-Flash
The smaller half of the V4 family: a 284B MoE with 13B parameters active per token. Same 1M context, same MIT license, and same architectural KV-cache improvements as V4-Pro. The honest local pick of the V4 line: still frontier-class on most benchmarks, but realistically deployable only on an M3 Ultra with 192GB of unified memory or on dual 80GB server cards.
License: MIT · Context: 1M tokens (384K max output) · Released: April 24, 2026 (preview)
The decision in five lines
- The call: Hosted only
- Best for: coding
- Runs on: Hosted or workstation-class only · ~158 GB (Unsloth Q4_K_M; needs M3 Ultra 192GB+ unified or dual 80GB server cards; not single-card consumer)
- Watch out: At ~158 GB Q4 this exceeds every pick in the planner's 22-card library; workstation tier, not consumer.
- Evidence: Estimated
- Parameters: 284B total
- Type: MoE
- Context: 1M
- VRAM at Q4: ~158 GB (Unsloth Q4_K_M; needs M3 Ultra 192GB+ unified or dual 80GB server cards; not single-card consumer)
Where we recommend this
Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.
The call
The smaller half of the V4 family: a 284B MoE with 13B parameters active per token. Same 1M context, same MIT license, and same architectural KV-cache improvements as V4-Pro. The honest local pick of the V4 line: still frontier-class on most benchmarks, but realistically deployable only on an M3 Ultra with 192GB of unified memory or on dual 80GB server cards.
When not to use: Single-card consumer hardware. At ~158 GB for the Q4 quant, the model needs more memory than any card in the planner's 22-card library offers; this is workstation tier, not consumer. Use V4-Pro via API for outright frontier performance; use Qwen3-Coder-30B-A3B locally if you need a single-card setup.
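The sizing argument above is simple arithmetic, and a rough sketch makes the margins visible. The ~158 GB, 284B, 192 GB and dual 80 GB figures come from this page; the 24 GB consumer card is an assumed example, capacities are nominal rather than usable, and `headroom_gb` and `implied_bits_per_weight` are hypothetical helper names.

```python
# Rough fit check for the ~158 GB Q4_K_M quant. A sketch, not a planner.
MODEL_Q4_GB = 158        # from this page (Unsloth Q4_K_M, approximate)
TOTAL_PARAMS_B = 284     # from this page (total parameters, in billions)

# Nominal capacities in GB. The 24 GB entry is an assumed consumer example;
# real usable memory is lower once the OS and runtime take their share.
RIGS_GB = {
    "single 24 GB consumer card (assumed example)": 24,
    "dual 80 GB server cards": 2 * 80,
    "192 GB unified memory": 192,
}


def headroom_gb(capacity_gb, model_gb=MODEL_Q4_GB):
    """GB left for KV cache and runtime overhead; negative means no fit."""
    return capacity_gb - model_gb


def implied_bits_per_weight(model_gb=MODEL_Q4_GB, params_b=TOTAL_PARAMS_B):
    """Average bits per weight implied by the file size (decimal GB assumed)."""
    return model_gb * 8 / params_b


if __name__ == "__main__":
    for name, capacity in RIGS_GB.items():
        spare = headroom_gb(capacity)
        verdict = "fits" if spare >= 0 else "does not fit"
        print(f"{name}: {verdict}, {spare:+d} GB headroom")
    print(f"implied average: {implied_bits_per_weight():.2f} bits per weight")
```

On these nominal numbers the dual 80 GB configuration holds the weights with only about 2 GB to spare, so context length there is likely to be constrained well below 1M unless the KV cache is offloaded or a smaller quant is used.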
Runner notes
Unsloth dynamic GGUFs are at `unsloth/DeepSeek-V4-Flash`; use these, because early community quants mishandled the MoE router. The Ollama tag `deepseek-v4-flash` exists but is practical only on workstation rigs. vLLM with multi-GPU is the cleanest production path. antirez maintains an experimental llama.cpp fork. The DeepSeek API also exposes V4-Flash separately, often at half V4-Pro's rate.
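For the Ollama route, a minimal sketch of a single non-streaming request against a local Ollama server follows. It assumes Ollama is running on its default port (11434) and that the `deepseek-v4-flash` tag named above has already been pulled on a machine with enough memory; `build_request` and `generate` are hypothetical helper names, not part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "deepseek-v4-flash"                      # tag named on this page


def build_request(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Build a POST request for one non-streaming completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(generate("Write a Python function that reverses a string."))
```

Only the standard library is used here, so the same script can be pointed at a remote workstation by changing `OLLAMA_URL`.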