MOSS-VL-0408 (Base + Instruct)

OpenMOSS's vision-language entry, sized for serious video understanding: up to 256 video frames per inference, a ~201M-pixel budget per image or per video, 16×16 patch size, and interleaved image+text+video sequences. The Instruct variant is SFT-tuned from `MOSS-VL-Base-0408`. Native BF16. Distinct from the Qwen3-VL line and parallel to MiniCPM-V; pick this when you need long-video context, not single-frame OCR.
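To make the headline numbers concrete, here is a rough arithmetic sketch in Python. It uses only the figures quoted above (16×16 patches, a ~201M-pixel budget, 256 frames). Two things are assumptions rather than documented behavior: that partial patches at the image edge are rounded up, and that a video's pixel budget is split evenly across its sampled frames. The function names are illustrative and not part of any MOSS-VL API.

```python
import math

PATCH = 16                    # patch edge in pixels, from the summary above
PIXEL_BUDGET = 201_000_000    # "~201M", rounded; the exact figure is not given here
MAX_FRAMES = 256              # frame cap, from the summary above


def patches_for(width: int, height: int, patch: int = PATCH) -> int:
    """Count patch x patch tiles covering an image (ASSUMPTION: partial tiles round up)."""
    return math.ceil(width / patch) * math.ceil(height / patch)


def per_frame_pixels(frames: int, budget: int = PIXEL_BUDGET) -> int:
    """Pixels available per frame (ASSUMPTION: the budget is split evenly)."""
    if not 1 <= frames <= MAX_FRAMES:
        raise ValueError(f"frames must be between 1 and {MAX_FRAMES}")
    return budget // frames


if __name__ == "__main__":
    print(patches_for(1024, 768))        # patches in one 1024x768 frame
    print(per_frame_pixels(MAX_FRAMES))  # per-frame pixels at the full frame cap
```

Under these assumptions, a full 256-frame clip leaves a little under 0.8 megapixels per frame, roughly a 1024×768 image. How the model actually resizes or allocates pixels is not stated on this page.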

License: Apache 2.0 · Context: inherited from the 11B backbone · Released: April 22, 2026

The decision in five lines

The call: Consider — runnable locally, family reference
Best for: Local evaluation and family reference
Runs on: 20 hardware picks fit (cheapest: Minisforum UM890 Pro · $463)
Watch out: For single-image OCR or lightweight visual reasoning, MiniCPM-V-4.6 at 1B is ~10× smaller and competitive on still-image tasks.
Evidence: Estimated · last verified June 2026

Parameters: 11B
Type: Vision-language with video
Context: inherited from the backbone
VRAM: ~14–18 GB (Q4 est.) / ~22 GB (BF16)
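As a sanity check on the memory figures, the weights-only footprint follows from parameter count times bits per weight. The sketch below is plain arithmetic in Python. The helper name is illustrative, and it deliberately ignores everything except the weights (activations, KV cache, and any components kept at higher precision).

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal GB: billions of params x bytes per param."""
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    print(weight_gb(11, 16))  # BF16: 2 bytes per parameter
    print(weight_gb(11, 4))   # 4-bit: half a byte per parameter
```

At 11B parameters this gives 22 GB for BF16, which matches the figure above. Four-bit weights alone come to 5.5 GB, well under the ~14–18 GB Q4 estimate, so that estimate presumably budgets for more than weights. The page does not say what it includes.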

Where we recommend this

This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.

The call

When not to use: (1) Single-image OCR or lightweight visual reasoning: MiniCPM-V-4.6 at 1B is ~10× smaller and competitive on still-image tasks. (2) Real-time streaming video: MOSS-VL is offline-batch by design (`offline_image_generate` / `offline_video_generate` / `offline_batch_generate` APIs).

Runner notes

Requires Python 3.12 and Flash Attention 2. Install dependencies with `pip install -i https://pypi.org/simple --no-build-isolation -r requirements.txt`. No GGUF / Ollama path yet. For video inference, call `model.offline_video_generate(processor, prompt, video_path, max_frames=256, video_fps=1.0)`.
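A thin wrapper around the video call quoted in these notes might look like the Python sketch below. It assumes `model` and `processor` have already been loaded as the model card describes; loading is out of scope here. `describe_video` and `sampled_frames` are hypothetical helper names, and the sampling rule in `sampled_frames` (one frame every `1 / video_fps` seconds, capped at `max_frames`) is an assumption about how the two parameters interact, not documented behavior.

```python
MAX_FRAMES = 256  # frame cap quoted on this page


def sampled_frames(duration_s: float, video_fps: float = 1.0,
                   max_frames: int = MAX_FRAMES) -> int:
    """Frames used for a clip under the ASSUMED rule: duration x fps, capped."""
    return min(max_frames, max(1, int(duration_s * video_fps)))


def describe_video(model, processor, video_path: str, prompt: str,
                   max_frames: int = MAX_FRAMES, video_fps: float = 1.0):
    """Check the arguments, then forward to the call quoted in the runner notes."""
    if not 1 <= max_frames <= MAX_FRAMES:
        raise ValueError(f"max_frames must be between 1 and {MAX_FRAMES}")
    if video_fps <= 0:
        raise ValueError("video_fps must be positive")
    # Argument order and keyword names follow the call as quoted on this page.
    return model.offline_video_generate(
        processor, prompt, video_path,
        max_frames=max_frames, video_fps=video_fps,
    )
```

If the assumed sampling rule holds, a clip longer than 256 seconds at `video_fps=1.0` hits the frame cap, so lowering `video_fps` would be the way to spread the same 256 frames across a longer video.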

License: Apache 2.0
Released: April 22, 2026
Maker: OpenMOSS / MOSI.AI
Model card: huggingface.co/OpenMOSS-Team/MOSS-VL-Instruct-0408

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend, sorted cheapest-first; the top row is your best-value fit.
