DeepSeek V4-Pro

DeepSeek's frontier-class V4 flagship — 1.6T MoE that matches GPT-5.4 and Sonnet 4.6 on most benchmarks at meaningfully lower hosted price. The 1M-context default uses ~27% of V3.2's single-token FLOPs and ~10% of its KV cache thanks to architecture changes. MIT-licensed, but not realistically a local pick at this size.

License: MIT · Context: 1M tokens (384K max output) · Released: April 24, 2026 (preview)

The decision in five lines

The call: Hosted only
Best for: Hosted reference and benchmarks
Runs on: Hosted or workstation-class only · ~800 GB+ (not consumer-local; hosted only realistic)
Watch out: Use the DeepSeek API or a hosted route — Together, OpenRouter, or DeepSeek's own endpoints — and let the smaller V4-Flash sibling fill multi-GPU local roles.
Evidence: Estimated · last verified July 2026

1.6T total: PARAMETERS
MOE: TYPE
1M: CONTEXT
~800 GB+ (not consumer-local; hosted only realistic): VRAM AT Q4

Where we recommend this

This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.

The call

DeepSeek's frontier-class V4 flagship — 1.6T MoE that matches GPT-5.4 and Sonnet 4.6 on most benchmarks at meaningfully lower hosted price. The 1M-context default uses ~27% of V3.2's single-token FLOPs and ~10% of its KV cache thanks to architecture changes. MIT-licensed, but not realistically a local pick at this size.
When not to use: Local hardware budgets under 8× H100 (~$200K). At Q4 the weights alone exceed 800 GB. Use the DeepSeek API or a hosted route — Together, OpenRouter, or DeepSeek's own endpoints — and let the smaller V4-Flash sibling fill multi-GPU local roles.

Runner notes

Hosted via DeepSeek API at materially lower prices than GPT-5.4 / Sonnet 4.6. Unsloth dynamic GGUFs exist (`unsloth/DeepSeek-V4-Pro`) but the practical home is API. 1M context is the default across all official services.

License: MIT
Released: April 24, 2026 (preview)
Maker: DeepSeek
Model card: huggingface.co/deepseek-ai/DeepSeek-V4-Pro →

Next step

Find-by-model — see what hardware runs this→