Qwen 3.5 4B

The low-tier sweet spot — fits 6 GB VRAM at Q4, strong multimodal and agentic tool-use for its size, supports both thinking and non-thinking modes.

License: Apache 2.0 · Context: 262K native · Released: March 2, 2026

The decision in five lines

The call: Skip for local — for coding
Best for: coding · chat · docs
Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out: Long-context retrieval or agentic coding against real codebases — it handles tokens fine but judgment degrades at 4B.
Evidence: Estimated · last verified July 2026

4B dense: PARAMETERS
DENSE: TYPE
262K: CONTEXT
~2.5 GB: VRAM AT Q4

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

CODING · LOW

Qwen 3.5 4B4B dense with 262K context; surprisingly coherent for its size.

CHAT · LOW

Qwen 3.5 4B4B dense with 262K context and native multimodal.

DOCS · LOW

Qwen 3.5 4B + tight RAG4B plus tight chunking; keep context windows small.

The call

The low-tier sweet spot — fits 6 GB VRAM at Q4, strong multimodal and agentic tool-use for its size, supports both thinking and non-thinking modes.
When not to use: Long-context retrieval or agentic coding against real codebases — it handles tokens fine but judgment degrades at 4B. Step to 9B or 14B.

Runner notes

Ollama tag `qwen3.5:4b` (Q4_K_M default). Works cleanly on CPU (AVX-512 preferred), MLX, and ROCm. No HIP issues at dense sizes.

License: Apache 2.0
Released: March 2, 2026
Maker: Alibaba
Model card: huggingface.co/Qwen/Qwen3.5-4B →

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.

Next step

Find-by-model — see what hardware runs this→