MODEL · BAIDU · COMPACT OCR VISION-LANGUAGE MODEL
Baidu Unlimited-OCR
Baidu's open-weight OCR / document-understanding VLM, positioned as a step beyond DeepSeek-OCR — high-fidelity text + layout extraction from images and documents. MIT-licensed, with vLLM support (community-added), an arXiv paper (2606.23050), and a live Hugging Face Space demo. Strong early traction (400K+ downloads within days).
License: MIT · Context: Document-page scale · Released: June 22, 2026
The decision in five lines
- The call
- Skip for local
- Best for
- Local evaluation and family reference
- Runs on
- No planner hardware fits at default quant — see model card
- Watch out
- General chat or reasoning — this is a document/OCR specialist, not a generalist VLM.
- Evidence
- Editorial
- Compact OCR vision-language model
- PARAMETERS
- OCR / DOCUMENT VISION-LANGUAGE MODEL
- TYPE
- Document-page
- CONTEXT
- Runs on a single consumer GPU (compact VLM)
- VRAM AT Q4
Where we recommend this
This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.
The call
Baidu's open-weight OCR / document-understanding VLM, positioned as a step beyond DeepSeek-OCR — high-fidelity text + layout extraction from images and documents. MIT-licensed, with vLLM support (community-added), an arXiv paper (2606.23050), and a live Hugging Face Space demo. Strong early traction (400K+ downloads within days).
When not to use: General chat or reasoning — this is a document/OCR specialist, not a generalist VLM. For broad vision-language work use MiniCPM-V-4.6 or MiniCPM-o 4.5.
Runner notes
vLLM inference supported (community PRs); also on ModelScope. GitHub `baidu/Unlimited-OCR` for the pipeline. MIT — clean for commercial document-processing use.
Next step
Find-by-model — see what hardware runs this→