the AI bench
VERIFIED JUNE 2026

MODEL · ALIBABA · 35B TOTAL / 3B ACTIVE

Qwen 3.5 35B-A3B

The 24 GB-VRAM unlock: dense-27B quality at 3B-active speed, and the community workhorse for mixed coding, chat, and docs work where breadth matters. It has 256 experts, with 8 routed experts plus 1 shared expert active per token.
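
To make the routing numbers concrete, here is a minimal sketch of per-token top-k expert selection using the counts stated above (256 experts, 8 routed, 1 shared). The router logits are random stand-ins, and renormalizing the gates with a softmax over only the selected experts is an assumption for illustration. This is not Qwen's actual router code.

```python
import math
import random

NUM_EXPERTS = 256  # expert pool size, from the model card
TOP_K = 8          # routed experts selected per token, from the model card


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and renormalize their gate weights.

    Assumption: gates are a softmax over the selected experts' logits only.
    Returns (expert_index, gate_weight) pairs, highest weight first.
    """
    if len(router_logits) != NUM_EXPERTS:
        raise ValueError("expected one logit per expert")
    # Indices of the top_k largest logits, largest first.
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax over just the selected logits.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def experts_run_per_token(top_k=TOP_K, shared=1):
    """Experts whose weights are used for each token: routed picks plus the shared expert."""
    return top_k + shared


if __name__ == "__main__":
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    picks = route_token(logits)
    print([i for i, _ in picks])               # indices of the 8 routed experts
    print(round(sum(w for _, w in picks), 6))  # gate weights sum to 1
    print(experts_run_per_token())             # routed + shared experts per token
```

Only the routed experts plus the shared one do work for a given token, which is why per-token compute tracks the ~3B active figure while all 35B weights still have to sit in memory.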

License: Apache 2.0 · Context: 262K native, extendable to ~1M via YaRN · Released: February 16, 2026
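
The "~1M via YaRN" figure is roughly a 4× stretch of the 262K native window. The sketch below computes that factor and builds a rope_scaling dictionary in the shape Hugging Face transformers uses for YaRN on earlier Qwen releases. Treat the key names, and the reading of "262K" as 262,144 tokens, as assumptions to check against this model's own config.json.

```python
import math

NATIVE_CONTEXT = 262_144    # assumption: "262K native" read as 2**18 tokens
TARGET_CONTEXT = 1_048_576  # assumption: "~1M" read as 2**20 tokens


def yarn_rope_scaling(native=NATIVE_CONTEXT, target=TARGET_CONTEXT):
    """Return a rope_scaling dict that stretches `native` to at least `target` tokens.

    Key names follow the transformers-style YaRN config used by earlier Qwen
    models; verify them against this model's config.json.
    """
    if target <= native:
        raise ValueError("target must exceed the native window")
    factor = float(math.ceil(target / native))  # whole-number factor, rounded up
    return {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": native,
    }


cfg = yarn_rope_scaling()
print(cfg["factor"])  # 4.0
print(int(cfg["factor"] * cfg["original_max_position_embeddings"]))  # 1048576
```

Static YaRN scales every request the same way, and Qwen's documentation for earlier releases warns this can cost some quality on short prompts, so enable it only when you actually need more than the native window.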

The decision in five lines

The call: Buy, for coding
Best for: coding · chat · docs · agents
Runs on: 16 hardware picks fit (cheapest: Minisforum UM890 Pro · $463)
Watch out: AMD ROCm on older llama.cpp builds (MoE HIP kernels had stability issues through 2025; check release notes before pulling).
Evidence: Measured · last verified April 2026

Parameters: 35B total · Type: MoE · Context: 262K · VRAM at Q4: ~17 GB
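
The ~17 GB figure is roughly what the weights alone take at 4 bits per parameter: 35 billion × 4 bits / 8 = 17.5 GB. A back-of-envelope helper, assuming decimal gigabytes and ignoring KV cache, activations, and runtime buffers (real Q4 formats also store per-block scales, so they land somewhat above 4.0 bits per weight):

```python
def weight_memory_gb(total_params, bits_per_weight):
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes).

    Counts weights only; KV cache, activations and runtime buffers are extra.
    For an MoE model every expert must be resident, so pass TOTAL parameters,
    not active ones.
    """
    return total_params * bits_per_weight / 8 / 1e9


def headroom_gb(card_gb, total_params, bits_per_weight):
    """Memory left on a card after loading the weights (what remains for KV cache etc.)."""
    return card_gb - weight_memory_gb(total_params, bits_per_weight)


TOTAL_PARAMS = 35e9  # 35B total, per the model card

print(weight_memory_gb(TOTAL_PARAMS, 4.0))  # 17.5
print(headroom_gb(24, TOTAL_PARAMS, 4.0))   # 6.5
```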

Where we recommend this

Every planner tier slot where this model is a top or alternate pick, pulled live from planner.js so the table stays current whenever the planner refreshes.

CODING · TOP: Qwen 3.5 35B-A3B (generalist MoE). Often beats the Coder variant on mixed real-world codebases, per community testing; Apache 2.0.
CODING · HIGH: Qwen 3.5 35B-A3B (generalist MoE). Often wins over the Coder variant on real mixed-codebase work; Apache 2.0.
CHAT · HIGH: Qwen 3.5 35B-A3B (MoE, fits 24GB). 3B-active MoE: 30B-class quality at 3B-class inference speed.
DOCS · TOP: Qwen 3.5 35B-A3B + RAG. MoE plus proper RAG beats brute-force long context for most real docs work.
DOCS · HIGH: Qwen 3.5 35B-A3B + RAG. The MoE + RAG combo fits 24GB and handles chunked retrieval well.
AGENTS · TOP: Qwen 3.5 35B-A3B (generalist MoE). Apache 2.0 generalist with native tool use; 262K context for long agentic loops; 3B-active speed at 35B-class quality.
AGENTS · HIGH: Qwen 3.5 35B-A3B (MoE, fits 24GB). MoE with native tool use; fits 24GB at Q4; Apache 2.0.
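
The DOCS rows lean on retrieval rather than stuffing a whole corpus into the context window. As a minimal sketch of that pattern, the helpers below split text into overlapping word windows and pick the chunks that share the most terms with a question. The word-overlap scoring is a deliberately simple stand-in for the embedding search a real RAG stack would use, and the default chunk sizes are arbitrary illustrative values.

```python
import re


def chunk_words(text, size=200, overlap=40):
    """Split text into windows of `size` words, each overlapping the previous by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def _terms(s):
    """Lowercased alphanumeric tokens, used for crude overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(chunks, query, k=3):
    """Return up to k chunks ranked by how many query terms they contain.

    Stand-in scorer: real pipelines would rank by embedding similarity.
    """
    q = _terms(query)
    return sorted(chunks, key=lambda c: len(q & _terms(c)), reverse=True)[:k]


def build_prompt(question, chunks, k=3):
    """Assemble a grounded prompt from the top-k retrieved chunks."""
    context = "\n\n".join(retrieve(chunks, question, k))
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"
```

Only the few retrieved chunks reach the model, so the prompt stays short regardless of corpus size, which keeps KV-cache use within a 24 GB budget.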

The call

The 24 GB-VRAM unlock — dense-27B quality at 3B-active speed, and the community workhorse for mixed coding / chat / docs where breadth matters. 256 experts with 8 routed + 1 shared per token.

When not to use: on AMD ROCm with older llama.cpp builds (MoE HIP kernels had stability issues through 2025; check release notes before pulling). Also skip it for pure coding work, where Qwen3-Coder-30B-A3B is sharper.

Runner notes

Ollama tag: `qwen3.5:35b` (the bare 35b tag is the A3B MoE; `:35b-a3b` does not resolve). Q4 fits a single 24 GB card with headroom left for context. MoE shines on vLLM/SGLang; Ollama works, but its MoE routing is slower than on those dedicated engines.
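
For a quick local test through Ollama, the sketch below builds a non-streaming request to Ollama's `/api/generate` endpoint using the tag from the note above. It assumes Ollama is running at its default address (http://localhost:11434) and that the model has already been pulled; nothing is sent until you call `send()`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "qwen3.5:35b"  # tag from the runner notes; the bare 35b tag is the A3B MoE


def build_generate_request(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Build (but do not send) a non-streaming POST to Ollama's generate API."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=300):
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Usage (run `ollama pull qwen3.5:35b` first):
# print(send(build_generate_request("Explain MoE routing in two sentences.")))
```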

License: Apache 2.0 · Released: February 16, 2026 · Maker: Alibaba

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend, sorted cheapest-first: the top row is your best-value fit.
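
The selection rule described here (keep picks whose memory covers the model at the recommended quant, then sort by price) is simple enough to sketch. The `HardwarePick` type, its field names, and the catalog entries below are hypothetical placeholders, not the site's data or code; the 17 GB requirement comes from the VRAM-at-Q4 figure above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwarePick:
    """Hypothetical record for one hardware pick."""
    name: str
    price_usd: float
    memory_gb: float  # VRAM, or unified memory usable by the model


def picks_that_fit(picks, required_gb):
    """Keep picks with enough memory for the model, cheapest first."""
    fitting = [p for p in picks if p.memory_gb >= required_gb]
    return sorted(fitting, key=lambda p: p.price_usd)


# Hypothetical entries: placeholder names and numbers, not real listings.
catalog = [
    HardwarePick("Box A", 900.0, 24.0),
    HardwarePick("Box B", 500.0, 16.0),
    HardwarePick("Box C", 650.0, 32.0),
]
for pick in picks_that_fit(catalog, required_gb=17.0):
    print(pick.name, pick.price_usd)
```

A stricter version would add KV-cache headroom to `required_gb` instead of matching the weight footprint alone.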
