MODEL · DEEPSEEK · 1.6T TOTAL / 49B ACTIVE (MOE)
DeepSeek V4-Pro
DeepSeek's frontier-class V4 flagship — 1.6T MoE that matches GPT-5.4 and Sonnet 4.6 on most benchmarks at meaningfully lower hosted price. The 1M-context default uses ~27% of V3.2's single-token FLOPs and ~10% of its KV cache thanks to architecture changes. MIT-licensed, but not realistically a local pick at this size.
License: MIT · Context: 1M tokens (384K max output) · Released: April 24, 2026 (preview)
The decision in five lines
- The call
- Hosted only
- Best for
- Hosted reference and benchmarks
- Runs on
- Hosted or workstation-class only · ~800 GB+ (not consumer-local; hosted only realistic)
- Watch out
- Use the DeepSeek API or a hosted route — Together, OpenRouter, or DeepSeek's own endpoints — and let the smaller V4-Flash sibling fill multi-GPU local roles.
- Evidence
- Estimated
- 1.6T total
- PARAMETERS
- MOE
- TYPE
- 1M
- CONTEXT
- ~800 GB+ (not consumer-local; hosted only realistic)
- VRAM AT Q4
Where we recommend this
This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.
The call
DeepSeek's frontier-class V4 flagship — 1.6T MoE that matches GPT-5.4 and Sonnet 4.6 on most benchmarks at meaningfully lower hosted price. The 1M-context default uses ~27% of V3.2's single-token FLOPs and ~10% of its KV cache thanks to architecture changes. MIT-licensed, but not realistically a local pick at this size.
When not to use: Local hardware budgets under 8× H100 (~$200K). At Q4 the weights alone exceed 800 GB. Use the DeepSeek API or a hosted route — Together, OpenRouter, or DeepSeek's own endpoints — and let the smaller V4-Flash sibling fill multi-GPU local roles.
Runner notes
Hosted via DeepSeek API at materially lower prices than GPT-5.4 / Sonnet 4.6. Unsloth dynamic GGUFs exist (`unsloth/DeepSeek-V4-Pro`) but the practical home is API. 1M context is the default across all official services.
Next step
Find-by-model — see what hardware runs this→