the AI bench
VERIFIED JUNE 2026
All models

MODEL · ALIBABA · 4B DENSE

Qwen 3.5 4B

The low-tier sweet spot — fits 6 GB VRAM at Q4, strong multimodal and agentic tool-use for its size, supports both thinking and non-thinking modes.

License: Apache 2.0 · Context: 262K native · Released: March 2, 2026

The decision in five lines

The call
Skip for local — for coding
Best for
coding · chat · docs
Runs on
23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out
Long-context retrieval or agentic coding against real codebases — it handles tokens fine but judgment degrades at 4B.
Evidence
Estimated · last verified April 2026

4B dense
PARAMETERS
DENSE
TYPE
262K
CONTEXT
~2.5 GB
VRAM AT Q4

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

CODING · LOW
Qwen 3.5 4B4B dense with 262K context; surprisingly coherent for its size.
CHAT · LOW
Qwen 3.5 4B4B dense with 262K context and native multimodal.
DOCS · LOW
Qwen 3.5 4B + tight RAG4B plus tight chunking; keep context windows small.

The call

The low-tier sweet spot — fits 6 GB VRAM at Q4, strong multimodal and agentic tool-use for its size, supports both thinking and non-thinking modes.

When not to use: Long-context retrieval or agentic coding against real codebases — it handles tokens fine but judgment degrades at 4B. Step to 9B or 14B.

Runner notes

Ollama tag `qwen3.5:4b` (Q4_K_M default). Works cleanly on CPU (AVX-512 preferred), MLX, and ROCm. No HIP issues at dense sizes.

License
Apache 2.0
Released
March 2, 2026
Maker
Alibaba

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.

Next step

Find-by-model — see what hardware runs this