MODEL · OPENMOSS / MOSI.AI · 11B
MOSS-VL-0408 (Base + Instruct)
OpenMOSS's vision-language entry, built for long-video understanding: up to 256 video frames per inference, a ~201M pixel budget per image or per video, 16×16 patch size, and interleaved image+text+video sequences. SFT-tuned from `MOSS-VL-Base-0408`. Native BF16. Distinct from the Qwen3-VL line and parallel to MiniCPM-V. Pick this when you need long-video context, not single-frame OCR.
License: Apache 2.0 · Context: inherits 11B backbone · Released: April 22, 2026
The decision in five lines
- The call: Consider (runnable locally, family reference)
- Best for: Local evaluation and family reference
- Runs on: 20 hardware picks fit (cheapest: Minisforum UM890 Pro · $463)
- Watch out: Single-image OCR / lightweight visual reasoning. MiniCPM-V-4.6 at 1B is ~10× smaller and competitive on still-image tasks.
- Evidence: Estimated
- Parameters: 11B
- Type: Vision-language with video
- Context: inherits
- VRAM at Q4: ~14–18 GB (Q4 est.) / ~22 GB (BF16)
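As a rough sanity check on the BF16 figure, weight memory can be estimated from parameter count and bits per weight. This is a weights-only sketch: it ignores activations, KV cache, and vision-encoder overhead, and the function name `weight_gb` is made up for illustration. The page does not say how its Q4 estimate was derived, and that estimate is well above the weights-only 4-bit figure this arithmetic produces.

```python
PARAMS = 11_000_000_000  # 11B parameters, from the stats above


def weight_gb(params: int, bits_per_weight: int) -> float:
    """Weights-only memory in decimal GB (no activations, cache, or runtime overhead)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    print(f"BF16 (16-bit) weights: {weight_gb(PARAMS, 16):.1f} GB")
    print(f"4-bit weights only:    {weight_gb(PARAMS, 4):.1f} GB")
```

The 16-bit result lines up with the ~22 GB BF16 figure quoted above. Treat the 4-bit number as a lower bound on weights alone, not as a replacement for the page's estimate.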
Where we recommend this
This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.
The call
When not to use: single-image OCR or lightweight visual reasoning, where MiniCPM-V-4.6 at 1B is ~10× smaller and competitive on still-image tasks; and real-time streaming video, since MOSS-VL is offline-batch by design (`offline_image_generate` / `offline_video_generate` / `offline_batch_generate` APIs).
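To get a feel for the quoted limits, the arithmetic below turns the 16×16 patch size and the ~201M pixel budget into patch and frame counts. It is a sketch under assumptions: `201_000_000` is a rounded stand-in for "~201M", the budget is treated as shared across all frames of a video, and the counts are raw patches before any merging or resizing the model may apply. None of those details are confirmed on this page.

```python
import math

PATCH_SIZE = 16              # 16×16 pixels per patch, as quoted above
PIXEL_BUDGET = 201_000_000   # rounded stand-in for "~201M" (assumption)


def patches_for_image(width: int, height: int, patch: int = PATCH_SIZE) -> int:
    """Patch cells needed to cover a width×height image, rounding partial cells up."""
    return math.ceil(width / patch) * math.ceil(height / patch)


def max_patches(pixel_budget: int = PIXEL_BUDGET, patch: int = PATCH_SIZE) -> int:
    """Upper bound on raw patches that fit inside the pixel budget."""
    return pixel_budget // (patch * patch)


def frames_within_budget(width: int, height: int,
                         pixel_budget: int = PIXEL_BUDGET) -> int:
    """Full-resolution frames that fit if the budget is shared across a video."""
    return pixel_budget // (width * height)


if __name__ == "__main__":
    print("max raw patches:", max_patches())
    print("patches in one 1920x1080 frame:", patches_for_image(1920, 1080))
    print("1280x720 frames within budget:", frames_within_budget(1280, 720))
```

Under these assumptions, a 720p clip runs out of pixel budget before it reaches the 256-frame cap, so long or high-resolution videos would presumably need downscaling or sparser sampling.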
Runner notes
Requires Python 3.12 and Flash Attention 2. Install dependencies with `pip install -i https://pypi.org/simple --no-build-isolation -r requirements.txt`. There is no GGUF / Ollama path yet. For video inference, call `model.offline_video_generate(processor, prompt, video_path, max_frames=256, video_fps=1.0)`.
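A minimal sketch of wrapping the video call quoted above. It assumes `model` and `processor` have already been loaded by following the model repository's own instructions (loading is not covered on this page). The helper names `describe_video` and `planned_frames` are hypothetical, and the idea that longer videos are simply capped at `max_frames` is an assumption about behavior, not something documented here.

```python
from typing import Any


def planned_frames(duration_s: float, video_fps: float = 1.0,
                   max_frames: int = 256) -> int:
    """Frames sampled at video_fps, capped at max_frames (assumed capping behavior)."""
    if duration_s <= 0 or video_fps <= 0:
        return 0
    return min(max_frames, max(1, int(duration_s * video_fps)))


def describe_video(model: Any, processor: Any, video_path: str, prompt: str,
                   max_frames: int = 256, video_fps: float = 1.0) -> Any:
    """Forward to the offline video API using the call shape quoted in the runner notes.

    The return type is not stated on this page, so the result is passed through as-is.
    """
    return model.offline_video_generate(
        processor, prompt, video_path,
        max_frames=max_frames, video_fps=video_fps,
    )


if __name__ == "__main__":
    # At the default 1.0 fps, any clip longer than 256 s hits the frame cap.
    for seconds in (120, 256, 600):
        print(seconds, "s ->", planned_frames(seconds), "frames")
```

Keeping the call behind one small function makes it easy to change `max_frames` or `video_fps` in a single place when trading temporal coverage against memory.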
Hardware that fits
Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first, so the top row is your best-value fit.
- Minisforum UM890 Pro · Perfect · 1.5× · 32 GB DDR5 (shared) · $463–$580 all-in
- RTX 5060 Ti 16 GB · Tight · 1.0× · 16 GB · $560–$610
- AMD Radeon RX 9070 XT · Tight · 1.0× · 16 GB · $649–$849
- AMD Radeon RX 7900 XTX · Perfect · 1.5× · 24 GB · $770–$1,400
- NVIDIA RTX 3090 (used, single) · Perfect · 1.5× · 24 GB · $800–$1,000
- NVIDIA RTX 5070 Ti · Tight · 1.0× · 16 GB · $980–$1,300
- NVIDIA RTX 5080 · Tight · 1.0× · 16 GB · $999–$1,250
- MacBook Air M5 24 GB · Tight · 1.0× · 24 GB unified · $1,299–$1,699
- Mac Mini M4 Pro 24 GB · Tight · 1.0× · 24 GB unified · $1,399
- Dual RTX 3090 (used) · Perfect · 3.1× · 48 GB · $1,800–$2,500 all-in
- Framework Desktop (Ryzen AI Max+ 395) · Perfect · 5.5× · 128 GB unified · $1,999–$2,851
- NVIDIA RTX 4090 · Perfect · 1.5× · 24 GB · $2,200–$2,800
- M5 Pro MacBook Pro 48 GB · Perfect · 2.1× · 48 GB unified · $2,599–$3,099
- Mac Studio M4 Max 64 GB · Perfect · 2.8× · 64 GB unified · $3,199
- NVIDIA RTX A6000 (48 GB, used) · Perfect · 3.1× · 48 GB ECC · $3,500–$4,500
- NVIDIA RTX 5090 · Perfect · 2.1× · 32 GB · $3,800–$4,100
- Mac Studio M3 Ultra 96 GB · Perfect · 4.1× · 96 GB unified · $3,999
- M5 Max MacBook Pro 64 GB · Perfect · 2.8× · 64 GB unified · $4,499
- NVIDIA DGX Spark · Perfect · 5.5× · 128 GB unified · $4,699
- Dual RTX 5090 · Perfect · 4.1× · 64 GB (2×32) · $8,500–$10,500