MODEL · ALIBABA (QWEN) · 0.6B / 4B / 8B
Qwen3-Embedding (0.6B / 4B / 8B)
Qwen's embedding family — the 8B ranks #1 overall on MTEB Multilingual as of 2026, making the line the current best-quality open retrieval pick and displacing BGE-M3 at the top. Apache 2.0, three sizes so you can trade quality for footprint, with a matching Qwen3-Reranker family for two-stage retrieval.
License: Apache 2.0 · Context: 32K · Released: June 2025
The decision in five lines
- The call: Consider (for docs)
- Best for: docs
- Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
- Watch out: Ultra-cheap or CPU-only retrieval at scale, or maximum language breadth per byte; BGE-M3 (568M, 170+ languages) or nomic-embed stay lighter.
- Evidence: Estimated
- Parameters: 0.6B
- Type: Embedding
- Context: 32K
- VRAM at Q4: ~1 GB (0.6B) / ~4 GB (4B) / ~8 GB (8B) at fp16
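As a rough cross-check on the memory figures above, weight memory can be estimated as parameter count times bytes per weight. The sketch below is a back-of-the-envelope estimate under stated assumptions, not the planner's own method: it uses the nominal sizes (0.6B / 4B / 8B), treats Q4 as about half a byte per weight with no quantization overhead, and ignores activations and runtime buffers. The helper name `weight_gb` and the `BYTES_PER_PARAM` table are illustrative.

```python
# Approximate bytes per weight for common precisions.
# Assumption: Q4 is taken as ~0.5 byte per weight, ignoring per-block
# scale/zero-point overhead that real quantized formats add.
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight-only memory in GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 weights) * bytes_per_weight / 1e9 bytes per GB
    return params_billion * BYTES_PER_PARAM[precision]


# Nominal family sizes from the page; real checkpoints differ slightly.
for size in (0.6, 4.0, 8.0):
    row = ", ".join(
        f"{name}: ~{weight_gb(size, name):.1f} GB" for name in BYTES_PER_PARAM
    )
    print(f"{size}B -> {row}")
```

By this estimate the figures listed above work out to about one byte per parameter. fp16 would be roughly double that and 4-bit roughly half, so it is worth confirming which precision a given number refers to before sizing hardware.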
Where we recommend this
Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.
The call
Qwen's embedding family — the 8B ranks #1 overall on MTEB Multilingual as of 2026, making the line the current best-quality open retrieval pick and displacing BGE-M3 at the top. Apache 2.0, three sizes so you can trade quality for footprint, with a matching Qwen3-Reranker family for two-stage retrieval.
When not to use: Ultra-cheap or CPU-only retrieval at scale, or maximum language breadth per byte — BGE-M3 (568M, 170+ languages) or nomic-embed stay lighter. Choose by whether you need top MTEB quality (Qwen3-Embedding) or minimum footprint (BGE-M3).
Runner notes
Runs under sentence-transformers, FlagEmbedding, or vLLM. Use `Qwen/Qwen3-Embedding-8B` for top quality, or the `-4B` / `-0.6B` checkpoints for lighter rigs. Pair with `Qwen/Qwen3-Reranker-*` for a rerank stage. The models are instruction-aware: prefix queries with a task instruction for best results.
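The instruction-prefix pattern can be sketched with sentence-transformers as follows. This is a minimal illustration, not a reference implementation. The `Instruct: ...` / `Query:...` template and the example task string are assumptions based on the format published with the Qwen3-Embedding models, so check the model card before relying on them. The helper names `format_query` and `rank_documents` are hypothetical. Only queries get the prefix; documents are embedded as-is.

```python
from typing import List

# Assumed example task description; replace with one that matches your use case.
DEFAULT_TASK = (
    "Given a web search query, retrieve relevant passages that answer the query"
)


def format_query(task: str, query: str) -> str:
    """Prefix a query with a task instruction (template is an assumption)."""
    return f"Instruct: {task}\nQuery:{query}"


def rank_documents(
    queries: List[str],
    documents: List[str],
    model_name: str = "Qwen/Qwen3-Embedding-0.6B",
    task: str = DEFAULT_TASK,
):
    """Return a (num_queries, num_documents) matrix of cosine similarities."""
    # Imported lazily so format_query can be used without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # Queries carry the instruction prefix; documents do not.
    query_emb = model.encode(
        [format_query(task, q) for q in queries], normalize_embeddings=True
    )
    doc_emb = model.encode(documents, normalize_embeddings=True)
    # With unit-length vectors, the dot product equals cosine similarity.
    return query_emb @ doc_emb.T


# Example call (downloads the model on first use):
# scores = rank_documents(
#     ["What is the capital of China?"],
#     ["The capital of China is Beijing.", "Gravity pulls objects together."],
# )
# print(scores)
```

Swapping `model_name` for the `-4B` or `-8B` checkpoint trades memory for quality. The top-scoring documents from this stage are what a reranker would then rescore.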
Hardware that fits
Every hardware pick whose memory fits this model at the quant we recommend, sorted cheapest-first, so the top row is your best-value fit.
- Intel Arc B580 12 GB · Perfect · 5.9× 12 GB · $249–$299
- NVIDIA RTX 3060 12 GB · Perfect · 5.9× 12 GB · $280–$400
- Minisforum UM890 Pro · Perfect · 11.8× 32 GB DDR5 (shared) · $463–$580 all-in
- RTX 5060 Ti 16 GB · Perfect · 7.9× 16 GB · $560–$610
- AMD Radeon RX 9070 XT · Perfect · 7.9× 16 GB · $649–$779
- Mac Mini M4 16 GB · Perfect · 5.3× 16 GB unified · $799 (new floor) / $499–$599 (eBay/residuals)
- AMD Radeon RX 7900 XTX · Perfect · 11.8× 24 GB · $810 used / ~$1,340 new
- NVIDIA RTX 3090 (used, single) · Perfect · 11.8× 24 GB · $950–$1,200
- NVIDIA RTX 5070 Ti · Perfect · 7.9× 16 GB · $980–$1,300
- NVIDIA RTX 5080 · Perfect · 7.9× 16 GB · $999–$1,400
- MacBook Air M5 24 GB · Perfect · 7.9× 24 GB unified · $1,499–$1,899
- Mac Mini M4 Pro 24 GB · Perfect · 7.9× 24 GB unified · $1,599
- Dual RTX 3090 (used) · Perfect · 23.6× 48 GB · $1,800–$2,500 all-in
- Framework Desktop (Ryzen AI Max+ 395) · Perfect · 42.1× 128 GB unified · $1,999–$2,851
- NVIDIA RTX 4090 · Perfect · 11.8× 24 GB · $2,200–$2,800
- M5 Pro MacBook Pro 48 GB · Perfect · 15.8× 48 GB unified · $2,999–$3,599
- NVIDIA RTX 5090 · Perfect · 15.7× 32 GB · $3,500–$4,300
- NVIDIA RTX A6000 (48 GB, used) · Perfect · 23.6× 48 GB ECC · $3,500–$4,500
- Mac Studio M4 Max 64 GB · Perfect · 21.0× 64 GB unified · $3,799
- NVIDIA DGX Spark · Perfect · 42.1× 128 GB unified · $4,699
- M5 Max MacBook Pro 64 GB · Perfect · 21.0× 64 GB unified · ~$5,199 (est.; June 25 2026 increase)
- Mac Studio M3 Ultra 96 GB · Perfect · 31.6× 96 GB unified · $5,299
- Dual RTX 5090 · Perfect · 31.4× 64 GB (2×32) · $8,500–$10,500