Qwen 3.5 35B-A3B

The 24 GB-VRAM unlock — dense-27B quality at 3B-active speed, and the community workhorse for mixed coding / chat / docs where breadth matters. 256 experts with 8 routed + 1 shared per token.

License: Apache 2.0 · Context: 262K native, extendable to ~1M via YaRN · Released: February 16, 2026
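
As a rough illustration of the context figures above, the YaRN scaling factor needed to stretch the native window to about 1M tokens is plain arithmetic. The sketch below assumes "262K" means 262,144 tokens. It also assumes the `rope_scaling` key names used by Hugging Face configs for earlier Qwen releases; whether this model's config uses the same keys should be checked against the model card.

```python
import math

NATIVE_CONTEXT = 262_144    # assumption: "262K" read as 256 * 1024 tokens
TARGET_CONTEXT = 1_000_000  # "~1M" from the spec line above


def yarn_factor(native: int, target: int) -> float:
    """Smallest whole-number scaling factor whose extended window covers target."""
    return float(math.ceil(target / native))


factor = yarn_factor(NATIVE_CONTEXT, TARGET_CONTEXT)

# Assumed config fragment; key names follow earlier Qwen configs, not a
# confirmed schema for this model.
rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

extended_context = int(NATIVE_CONTEXT * factor)
print(factor, extended_context)
```

Under these assumptions the factor comes out to 4, which stretches the window to 1,048,576 tokens, consistent with the "~1M" figure.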

The decision in five lines

The call: Buy — for coding
Best for: coding · chat · docs · agents
Runs on: 16 hardware picks fit (cheapest: Minisforum UM890 Pro · $463)
Watch out: AMD ROCm on older llama.cpp builds (MoE HIP kernels had stability issues through 2025; check release notes before pulling).
Evidence: Measured · last verified July 2026

Parameters: 35B total
Type: MoE
Context: 262K
VRAM at Q4: ~17 GB
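
The Q4 memory figure can be sanity-checked with a back-of-envelope calculation. This is a minimal sketch assuming a nominal 4 bits per weight and decimal gigabytes. Real Q4 formats store per-block scales and may keep some tensors at higher precision, and the KV cache and runtime buffers come on top, so treat the result as an idealized estimate rather than a measured footprint. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 35e9   # "35B total" from the spec above
ACTIVE_PARAMS = 3e9   # "A3B": roughly 3B parameters active per token
CARD_VRAM_GB = 24     # the single-card target this page is built around

# Assumption: nominal 4 bits per weight, ignoring quantization overhead.
resident_gb = weight_memory_gb(TOTAL_PARAMS, 4.0)       # must all sit in memory
touched_per_token_gb = weight_memory_gb(ACTIVE_PARAMS, 4.0)  # read per token
headroom_gb = CARD_VRAM_GB - resident_gb

print(f"resident weights: {resident_gb:.1f} GB")
print(f"weights touched per token: {touched_per_token_gb:.1f} GB")
print(f"headroom on a 24 GB card: {headroom_gb:.1f} GB")
```

The split between the two numbers is the point of the MoE design: memory is sized by the 35B total, while per-token work scales with the roughly 3B active parameters.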

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

CODING · TOP

Qwen 3.5 35B-A3B (generalist MoE)
Often beats the Coder variant on mixed real-world codebases per community testing; Apache 2.0.

CODING · HIGH

Qwen 3.5 35B-A3B (generalist MoE)
Often wins real mixed-codebase work over the Coder variant; Apache 2.0.

CHAT · HIGH

Qwen 3.5 35B-A3B (MoE, fits 24GB)
3B-active MoE: 30B-class quality at 3B inference speed.

DOCS · TOP

Qwen 3.5 35B-A3B + RAG
MoE plus proper RAG beats brute-force long context for most real docs work.

DOCS · HIGH

Qwen 3.5 35B-A3B + RAG
The MoE + RAG combo fits 24GB and handles chunked retrieval well.

AGENTS · TOP

Qwen 3.5 35B-A3B (generalist MoE)
Apache 2.0 generalist with native tool use; 262K context for long agentic loops; 3B-active speed at 35B-class quality.

AGENTS · HIGH

Qwen 3.5 35B-A3B (MoE, fits 24GB)
MoE with native tool use; fits 24GB at Q4; Apache 2.0.

The call

The 24 GB-VRAM unlock — dense-27B quality at 3B-active speed, and the community workhorse for mixed coding / chat / docs where breadth matters. 256 experts with 8 routed + 1 shared per token.
When not to use: AMD ROCm on older llama.cpp builds (MoE HIP kernels had stability issues through 2025; check release notes before pulling). Also skip it for pure coding work, where Qwen3-Coder-30B-A3B is sharper.
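
To make the "8 routed + 1 shared" layout concrete, here is a minimal sketch of top-k expert routing for a single token. It is a generic illustration, not the model's actual router: the choice to apply softmax over only the selected logits, the random stand-in logits, and the name `route_token` are all assumptions.

```python
import math
import random

NUM_EXPERTS = 256  # routed expert pool, from the text above
TOP_K = 8          # routed experts chosen per token, from the text above
NUM_SHARED = 1     # shared expert that runs for every token, from the text above


def route_token(router_logits):
    """Return {expert_index: weight} for the TOP_K highest-scoring experts.

    Assumption: weights are a softmax over the selected logits only,
    so they sum to 1. Real routers may normalize differently.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:TOP_K]
    peak = max(router_logits[i] for i in chosen)  # subtract for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Stand-in router scores; a real model computes these from the token's hidden state.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

weights = route_token(logits)
experts_run = len(weights) + NUM_SHARED
print(f"{experts_run} of {NUM_EXPERTS + NUM_SHARED} experts run for this token")
```

Only the selected experts plus the shared one do any work for a given token, which is why inference cost tracks the active parameter count rather than the total.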

Runner notes

Ollama tag: `qwen3.5:35b` (the bare 35b tag is the A3B MoE; `:35b-a3b` does not resolve). Q4 fits a single 24 GB card with context headroom. MoE shines on vLLM/SGLang; Ollama works but is slower at MoE routing than those dedicated engines.
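
For a quick smoke test once the model is pulled, the tag above can be passed to Ollama's local REST API. This is a sketch assuming a default Ollama install listening on `localhost:11434`; the helper names `build_request` and `generate` are hypothetical, and the timeout is an arbitrary choice.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes Ollama's default port
MODEL_TAG = "qwen3.5:35b"  # tag from the runner notes above


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the completion text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Summarize this function in one sentence.")` returns the completion as a string. A failed connection usually means Ollama is not running, and an error response usually means the model has not been pulled yet.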

License: Apache 2.0
Released: February 16, 2026
Maker: Alibaba
Model card: huggingface.co/Qwen/Qwen3.5-35B-A3B

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.
