Qwen3-Embedding (0.6B / 4B / 8B)

Qwen's embedding family — the 8B ranks #1 overall on MTEB Multilingual as of 2026, making the line the current best-quality open retrieval pick and displacing BGE-M3 at the top. Apache 2.0, three sizes so you can trade quality for footprint, with a matching Qwen3-Reranker family for two-stage retrieval.

License: Apache 2.0 · Context: 32K · Released: June 2025

The decision in five lines

The call: Consider — for docs
Best for: docs
Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out: Not the pick for ultra-cheap or CPU-only retrieval at scale, or for maximum language breadth per byte; BGE-M3 (568M, 100+ languages) and nomic-embed are lighter.
Evidence: Estimated · last verified July 2026

PARAMETERS: 0.6B / 4B / 8B
TYPE: Embedding
CONTEXT: 32K
VRAM (ESTIMATED): ~1 GB (0.6B) / ~4 GB (4B) / ~8 GB (8B)
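
As a rough cross-check on the memory figures, weight size is just parameter count times bytes per parameter. The sketch below assumes nominal parameter counts (0.6, 4, and 8 billion) and counts weights only, so it ignores activations, long-context buffers, and runtime overhead; the helper names are illustrative, not part of any library. By this arithmetic, ~4 GB and ~8 GB for the 4B and 8B models correspond to about one byte per parameter (8-bit), while fp16 would need about twice that.

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Nominal sizes taken from the model names; real parameter counts differ slightly.
SIZES = {"0.6B": 0.6, "4B": 4.0, "8B": 8.0}

# Idealized bit widths; practical 4-bit formats usually spend a little more per weight.
PRECISIONS = {"fp16": 16, "int8": 8, "4-bit": 4}


def memory_table() -> dict:
    """Return {model size: {precision: GB rounded to one decimal}}."""
    return {
        name: {p: round(weight_memory_gb(n, bits), 1) for p, bits in PRECISIONS.items()}
        for name, n in SIZES.items()
    }


for name, row in memory_table().items():
    print(name, row)
```

Treat the output as a lower bound when checking whether a card fits: leave headroom above the weight size for batch and sequence-length buffers.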

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

DOCS · MID

Qwen3-Embedding-8B (Apache 2.0): #1 overall on MTEB Multilingual as of 2026; the current best-quality open retrieval pick. Use BGE-M3 (568M) instead when you want cheap, broad multilingual coverage or CPU-only inference.

The call

Pick Qwen3-Embedding when retrieval quality is the priority: the 8B ranks #1 overall on MTEB Multilingual as of 2026, the 4B and 0.6B trade some quality for a smaller footprint, and the matching Qwen3-Reranker family adds a second stage for two-stage retrieval. All sizes are Apache 2.0.
When not to use: ultra-cheap or CPU-only retrieval at scale, or when you need maximum language breadth per byte; BGE-M3 (568M, 100+ languages) and nomic-embed are lighter. Choose by whether you need top MTEB quality (Qwen3-Embedding) or minimum footprint (BGE-M3).
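
Two-stage retrieval can be sketched in a few lines: a cheap vector search narrows the corpus to a shortlist, then a slower, more accurate scorer reorders only that shortlist. The example below is a minimal illustration in plain Python. The vectors are toy values rather than real embeddings, and `rerank_score` is a hypothetical stand-in for whatever a Qwen3-Reranker model would return for a query-document pair.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_stage_search(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    rerank_score: Callable[[int], float],  # hypothetical: doc index -> relevance
    top_k: int = 3,
    final_k: int = 2,
) -> List[int]:
    """Return document indices after embedding recall, then reranking."""
    # Stage 1: keep the top_k documents closest to the query in embedding space.
    by_similarity = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:top_k]
    # Stage 2: reorder only the shortlisted candidates with the reranker score.
    return sorted(candidates, key=rerank_score, reverse=True)[:final_k]
```

The design point is that the reranker never sees documents the embedding stage dropped, so `top_k` sets the ceiling on recall while the reranker only improves ordering within it.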

Runner notes

Runs on sentence-transformers, FlagEmbedding, or vLLM. Use `Qwen/Qwen3-Embedding-8B` for top quality, or the `-4B` / `-0.6B` variants on lighter rigs. Pair with `Qwen/Qwen3-Reranker-*` for a rerank stage. The models are instruction-aware: prefix queries (not documents) with a task instruction for best results.
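
A minimal sketch of the instruction prefix with sentence-transformers follows. It assumes the `Instruct: ...\nQuery:...` layout and the default web-search task string as shown on the model card, so check both against the card before relying on them. `format_query` and `rank_documents` are illustrative helper names, and the 0.6B checkpoint is chosen only because it is the smallest download.

```python
DEFAULT_TASK = (
    "Given a web search query, retrieve relevant passages that answer the query"
)


def format_query(query: str, task: str = DEFAULT_TASK) -> str:
    """Prefix a query with a task instruction; documents are left unprefixed."""
    return f"Instruct: {task}\nQuery:{query}"


def rank_documents(queries, documents, model_name="Qwen/Qwen3-Embedding-0.6B"):
    """Embed queries and documents, then return the query-by-document similarity matrix."""
    # Imported lazily so format_query works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # downloads weights on first use
    query_emb = model.encode([format_query(q) for q in queries])
    doc_emb = model.encode(documents)  # no instruction on the document side
    return model.similarity(query_emb, doc_emb)
```

Calling `rank_documents(["What is the capital of China?"], ["The capital of China is Beijing.", "Gravity pulls objects together."])` returns one row of scores per query. Swap in a task string that describes your own retrieval job where the default does not fit.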

License: Apache 2.0
Released: June 2025
Maker: Alibaba (Qwen)
Model card: huggingface.co/Qwen/Qwen3-Embedding-8B

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first, so the top row is your best-value fit.
