VERIFIED JULY 2026

MODELS · 66 CURATED OPEN-WEIGHT PICKS

Every model we recommend.

Dated, opinionated, license-audited. Machine-readable by design.

One canonical detail page per base model: which tier slots it fills, license gotchas, runner friction, and which hardware actually runs it. Everything is pulled live from the planner, so this index stays current when picks shift.

Alibaba Qwen — the most-used open family in 2026

ALIBABA · 35B total / 3B active

Qwen 3.6-35B-A3B

Alibaba's post-3.5 refresh specifically targeting agentic coding — claims to beat dense Qwen 3.5 27B and Gemma 4 31B on coding + reasoning at the same active-param budget. Fully multimodal.

chat · agents · Estimated

ALIBABA · 27B dense

Qwen 3.6-27B

The April 2026 dense refresh that supersedes Qwen 3.5 27B — claims to beat the prior 397B MoE flagship on coding benchmarks while staying single-GPU deployable at Q4. The current dense top pick for 24 GB rigs and Mac 32+ GB.

chat · docs · Estimated
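The "single-GPU deployable at Q4" framing is mostly arithmetic on weight size. A rough sketch, assuming about 4.5 effective bits per weight for a typical Q4 quantization (an assumption; real formats vary) and ignoring KV cache and runtime overhead:

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# Assumed effective bit widths; real quantization formats differ.
Q4_BITS = 4.5
BF16_BITS = 16.0

for name, params in [("27B dense", 27.0), ("35B-A3B MoE", 35.0)]:
    q4 = weight_gib(params, Q4_BITS)
    bf16 = weight_gib(params, BF16_BITS)
    print(f"{name}: ~{q4:.1f} GiB at Q4, ~{bf16:.1f} GiB at BF16")
```

Under these assumptions a 27B dense model needs roughly 14 GiB for weights at Q4, which leaves room for context on a 24 GB card. For an MoE, the total parameter count sets the memory footprint; the active count mainly affects speed.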

ALIBABA · 35B total / 3B active

Qwen 3.5 35B-A3B

The 24 GB-VRAM unlock — dense-27B quality at 3B-active speed, and the community workhorse for mixed coding / chat / docs where breadth matters. 256 experts with 8 routed + 1 shared per token.

coding · chat · docs · agents · Measured
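To make "8 routed + 1 shared per token" concrete, here is a minimal top-k routing sketch. The random router logits and the softmax over only the selected experts are illustrative assumptions, not Qwen's published router.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts, per the entry above
TOP_K = 8          # routed experts chosen per token

def route(router_logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in for a learned router
chosen = route(logits)

# The shared expert runs for every token, so 8 routed + 1 shared = 9 expert MLPs.
print(len(chosen) + 1, "expert MLPs run for this token")
```

Only the chosen experts' weights are read for a given token, which is why a 35B-total model can decode at roughly the speed of a 3B dense one.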

ALIBABA · 27B dense

Qwen 3.5 27B

A dense, natively multimodal (text + image + video input) mid-large generalist — the biggest non-MoE in the Qwen 3.5 medium line. Best realistic pick for long-context docs on 24 GB VRAM or Mac 48 GB+.

Reference entry · Estimated

ALIBABA · 9B dense

Qwen 3.5 9B

Strongest "runs on a mid-tier GPU" model in the Qwen 3.5 small line — supports thinking mode and 201-language coverage. Fits 8 GB VRAM with headroom.

coding · chat · docs · agents · Estimated

ALIBABA · 4B dense

Qwen 3.5 4B

The low-tier sweet spot — fits 6 GB VRAM at Q4, strong multimodal and agentic tool-use for its size, supports both thinking and non-thinking modes.

coding · chat · docs · Estimated

ALIBABA · 2B dense

Qwen 3.5 2B

Phone-class multimodal model built on the same Qwen 3.5 foundation as the medium tier. Non-thinking by default. Runs anywhere, including CPU-only setups.

coding · Estimated

ALIBABA · 30.5B total / 3.3B active

Qwen3-Coder-30B-A3B

The community daily-driver coding MoE for 24 GB-class hardware — purpose-trained for agentic coding + browser-use. Delivers 30B-dense quality at 3B-dense throughput.

coding · agents · Measured

ALIBABA · 480B total / 35B active

Qwen3-Coder-480B-A35B

Frontier open-weight agentic coding model — claimed on par with Claude Sonnet on agentic benchmarks. Alibaba's most powerful coder.

Reference entry · Estimated

ALIBABA · 30B total / 3B active

Qwen3-Omni-30B-A3B-Instruct

The only locally-runnable open-weight model that does real-time streaming speech-out natively. 119 input languages, 10 speech-output languages (two voices: Chelsie, Ethan).

voice · Estimated

ALIBABA · 14B dense

Qwen3-14B

Last-generation Qwen3 14B dense with thinking mode enabled by default and strong tool-calling. Still a solid 16 GB-VRAM pick when you want dense behaviour over MoE. Qwen 3.5 skipped the 14B slot.

coding · Measured

ALIBABA · 20B MMDiT (original Qwen-Image lineage)

Qwen-Image-2512 (20B) + Edit-2511

Strongest open-weight image model for text rendering — Arabic, Chinese, English all sharp. Qwen-Image-2512 (Dec 31 2025) is the latest released generation model and claims #1 open-source on AI Arena; pair with Qwen-Image-Edit-2511 (Dec 23 2025) for editing workflows. A unified 7B "Qwen-Image 2.0" was announced Feb 10 2026 (arxiv 2605.10730 tech report) but the weights are not yet open-sourced — Qwen-Image-2512 remains the runnable flagship as of July 2026.

image · Estimated

Other frontier and mid-tier text

COHERE · 218B total / 25B active (128 experts, 8 active + 1 shared per token)

Command A+ (218B-A25B)

Cohere's frontier-class MoE: 218B params with 25B active per token, hybrid sliding-window + global attention, native vision + 48-language coverage. The first Apache-2.0 frontier MoE you can actually serve on 2× H100: the same hardware class as DeepSeek V4-Pro and Kimi K2.6, but under Apache 2.0.

docs · agents · Estimated

COHERE / COHERE LABS · 30B total / 3B active (sparse MoE, 128 experts / 8 active per token)

North Mini Code (30B-A3B)

Cohere's open-weight 30B-A3B coder, tuned for code generation, agentic software engineering, and terminal tasks. Same 3B-active MoE shape and 24 GB-tier fit as Qwen3-Coder-30B-A3B, but from a Western lab under a clean Apache 2.0 license, with a larger 256K context. Cohere reports strong SWE-Bench Verified/Pro and Terminal-Bench v2 numbers — vendor figures, not yet independently reproduced.

coding · Estimated

POOLSIDE · 33B total / ~3B active (sparse MoE), trained from scratch on 30T tokens

Laguna XS 2.1 (33B-A3B)

The first open-weight release from Poolside, a coding-first lab that trains from scratch rather than fine-tuning another base. A 33B-total / ~3B-active MoE aimed at agentic software engineering — the same big-but-sparse shape that made Qwen3-Coder-30B-A3B the community daily driver. Ships with official BF16 / FP8 / INT4 / NVFP4 checkpoints, an official GGUF repo, and DFlash speculator models that Poolside claims roughly double local decode speed.

Reference entry · Estimated

INTERNSCIENCE (SHANGHAI AI LAB) · 35B total / 3B active (MoE flagship) · 4.5B dense (Agents-A1-4B)

Agents-A1 (35B-A3B + 4B)

An agent-specialist family post-trained on Qwen3.5 bases for long-horizon search, engineering, scientific research, instruction following, and tool calling. The pitch is agent-horizon scaling rather than parameter scaling: InternScience reports the 35B-A3B matching much larger models on agentic benchmarks, and the 4B — released July 14 — posting large gains over its own Qwen3.5-4B base (BrowseComp 66.8 vs 47.2, MatTools 49.3 vs 10.9). Unusually for a lab release, the evaluation framework is open-sourced so the numbers are reproducible.

Reference entry · Estimated

DEEPREINFORCE AI · 9B dense / 31B dense / 35B-A3B MoE / 397B MoE (post-trained on Gemma 4 + Qwen 3.5 bases)

Ornith-1.0 (9B / 31B / 35B-MoE / 397B-MoE)

A self-improving open-source family of agentic coding models, post-trained on Gemma 4 and Qwen 3.5 bases. DeepReinforce reports SOTA among comparable open models on Terminal-Bench 2.1, SWE-Bench (Verified/Pro/Multilingual), NL2Repo, and OpenClaw. The training idea: RL that jointly optimizes both the solution rollout and the scaffold that drives it. MIT, no regional limits. The 9B / 31B / 35B are single-GPU-deployable; the 397B MoE is hosted / big-iron.

Reference entry · Estimated

WEIBO AI · 3B dense

VibeThinker-3B

A 3B reasoning model that punches far above its size on verifiable reasoning: math, competitive coding, STEM. Weibo AI reports it reaching the range of much larger models (DeepSeek V3.2, GLM-5, Kimi K2.5) on IMO-AnswerBench, scoring 76.4 (80.6 with a test-time verification strategy) despite only 3B params, via their Spectrum-to-Signal post-training. The thesis: verifiable reasoning is a parameter-dense, compressible capability where small models can get near the frontier.

Reference entry · Estimated

GOOGLE DEEPMIND · 26B total / ~4B active (8 of 128 experts)

DiffusionGemma 26B-A4B

Google DeepMind's discrete-diffusion take on Gemma 4 — instead of token-by-token autoregression, it denoises blocks of tokens ("canvases") in parallel for higher tokens/sec, on the 26B-A4B MoE Gemma 4 foundation. Multimodal input (text + image + video), a thinking mode, and an encoder-decoder design (cached AR prompt encoder + bidirectional diffusion decoder). Apache 2.0, engineered for low-latency single-accelerator local inference. A genuinely different generation architecture, not a version bump.

Reference entry · Estimated

Z.AI (FORMERLY ZHIPU AI) · 744B total / 40B active

GLM-5.1

Long-horizon agentic coding flagship from the GLM-5 line. It scores 58.4 on SWE-Bench Pro, narrowly beating GPT-5.4 and Claude Opus 4.6 on that benchmark, under a clean MIT license. Superseded June 16, 2026 by GLM-5.2 (62.1 SWE-Bench Pro, a solid 1M context, same MIT license); see the GLM-5.2 fast take. Both stay hosted / big-iron, so neither is a local pick.

agents · Estimated

MOONSHOT AI · 1T total / 32B active (384 experts; 8 routed + 1 shared per token)

Kimi K2.6

Moonshot's 1T MoE agentic coder — keeps the K2 architecture (61 layers, 64 attention heads, MLA) and extends context to 256K. Tops SWE-Bench Pro at 58.6 (vs GPT-5.4 xhigh 57.7, Opus 4.6 max 53.4) and lands #4 on the Artificial Analysis Intelligence Index. The real differentiator is Agent Swarm — 300 sub-agents over 4,000 coordinated steps, a different shape of capability than single-step quality.

coding · agents · Estimated

DEEPSEEK · 1.6T total / 49B active (MoE)

DeepSeek V4-Pro

DeepSeek's frontier-class V4 flagship — 1.6T MoE that matches GPT-5.4 and Sonnet 4.6 on most benchmarks at meaningfully lower hosted price. The 1M-context default uses ~27% of V3.2's single-token FLOPs and ~10% of its KV cache thanks to architecture changes. MIT-licensed, but not realistically a local pick at this size.

Reference entry · Estimated

DEEPSEEK · 284B total / 13B active (MoE)

DeepSeek V4-Flash

The smaller half of the V4 family — 284B MoE with 13B active per token. Same 1M context, same MIT license, same architectural KV-cache improvements as V4-Pro. The honest local pick of the V4 line: still frontier-class on most benchmarks, but realistically deployable only on M3 Ultra 192GB unified or dual 80GB server cards.

coding · Estimated

GOOGLE · 31B dense / 26B total + 3.8B active (MoE) / 12B dense

Gemma 4 (31B dense + 26B A4B MoE + 12B multimodal)

Google's April 2026 refresh — Arena top 5 in its first week, 256K context native, vision + audio multimodal, and the move to Apache 2.0 from the custom Gemma Terms. On June 3, 2026 Google added Gemma 4 12B: a ~12B dense, encoder-free unified multimodal variant (text + image + audio + video in) that runs locally on 16 GB of VRAM or unified memory while nearing the 26B MoE on benchmarks. The 12B is the laptop-tier multimodal pick; the 31B dense is the current Apache-2.0 "best dense under 70B".

chat · docs · Measured

GOOGLE · 4B / 12B / 27B dense

Gemma 3 (4B / 12B / 27B)

Previous-generation Gemma line. The 4B is still a useful ultra-compact agent model with native vision. Larger sizes are superseded by Gemma 4.

agents · Estimated

MISTRAL AI · 128B dense (folds Magistral reasoning + Devstral 2 coding into one weight set)

Mistral Medium 3.5 128B

Mistral's flagship 128B dense model. It replaces Medium 3.1 and folds the dedicated Magistral (reasoning) and Devstral 2 (coding) specialist models into one weight set with a per-request `reasoning_effort` toggle. 77.6% on SWE-Bench Verified, native multimodal vision encoder trained from scratch, 256K context. The first serious Mistral release since Ministral 3 (Dec 2025).

Reference entry · Estimated

MISTRAL AI · 3B / 8B / 14B dense (all with image understanding)

Ministral 3 family (3B / 8B / 14B)

Mistral's clean Apache-2.0 edge family with Base / Instruct / Reasoning splits per size. The "no-license-drama" alternative to Qwen or Gemma when lawyers are involved.

chat · docs · agents · Estimated

OPENAI · 21B total / 3.6B active

gpt-oss-20b

OpenAI's open-weights MoE. Matches o3-mini on common benchmarks, post-trained with MXFP4 quantization so it lands in 16 GB VRAM — a near-frontier reasoner you can actually run on a 5060 Ti.

coding · chat · agents · Measured

IBM RESEARCH · 3B / 8B / 30B dense (instruct + base each)

IBM Granite 4.1

IBM's refreshed open-weights enterprise family — three dense decoder-only sizes, Apache 2.0, trained on ~15T tokens with progressive annealing toward technical/scientific/mathematical data plus instruction-following. The 8B instruct claims to match the prior Granite 4.0 32B-A9B MoE flagship on IBM's own benchmarks; cross-vendor comparison (vs Qwen/Gemma/Mistral) is unverified at time of publication.

Reference entry · Estimated

IBM GRANITE · 8B base + 12 embedded LoRA adapters (~10B total)

Granite-Switch 4.1 8B Preview (12 task LoRAs)

IBM Granite 4.1 8B with 12 task-specialized LoRA adapters embedded in a single checkpoint, activated per-token via control tokens in the chat template. Three libraries: **Core** (3 adapters — requirement check, context attribution, uncertainty), **RAG** (5 — query rewrite, query clarification, answerability, hallucination detection, citation generation), **Guardian** (4 — safety detection, factuality detection + correction, policy guardrails). A lightweight switch layer detects control tokens and produces per-position adapter indices applied across all decoder layers; KV-cache normalization keeps adapters independent. Novel deployment pattern for production RAG / agent stacks — one checkpoint, multiple specialized behaviors. 12 languages: EN, DE, ES, FR, JA, PT, AR, CS, IT, KO, NL, ZH.

Reference entry · Estimated
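The "per-position adapter indices" idea can be sketched in a few lines. The control-token strings below are hypothetical (the real chat template defines its own), and the rule that a control token stays in effect until the next one is an assumption made for illustration.

```python
BASE = 0  # index 0 means "no adapter, base weights only"

# Hypothetical control tokens; not the real Granite-Switch vocabulary.
CONTROL_TOKENS = {
    "<|adapter:answerability|>": 1,
    "<|adapter:citations|>": 2,
    "<|adapter:base|>": BASE,
}

def adapter_indices(tokens: list[str]) -> list[int]:
    """Return one adapter index per position, switching at each control token."""
    active = BASE
    indices = []
    for tok in tokens:
        if tok in CONTROL_TOKENS:
            active = CONTROL_TOKENS[tok]
        indices.append(active)
    return indices

tokens = ["Is", "this", "answerable", "?", "<|adapter:answerability|>", "yes",
          "<|adapter:citations|>", "[", "doc", "1", "]"]
print(adapter_indices(tokens))  # [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
```

In a real model each decoder layer would use this index list to select which LoRA delta, if any, to add at each position.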

MICROSOFT · 3.8B dense

Phi-4 Mini

Microsoft's dense 3.8B instruct model built on synthetic + filtered web data. Punches above its weight on reasoning-heavy prompts in the <5B bracket. MIT license is unusually clean for commercial redistribution.

coding · chat · docs · agents · Estimated

MINIMAX · ~229B total / ~10B active (MoE, interleaved thinking)

MiniMax M2.5 / M2.7

MiniMax's open-weights agentic-workflow family — strong on coding + tool-use Arena. M2.7 is the first model that "participates in its own evolution" via self-iterated RL. Frontier-class, not a local workhorse.

Reference entry · Estimated

OPENBMB · 0.5B / 1B / 3B / 8B (all natively trained in 1.58-bit ternary; not post-hoc quantized)

BitCPM4-CANN family (0.5B / 1B / 3B / 8B, native 1.58-bit)

First publicly reported end-to-end 1.58-bit (ternary {-1, 0, 1}) training stack at 8B scale. Trained natively at 1.58-bit via Quantization-Aware Training + Straight-Through Estimator on Huawei Ascend NPU — not a post-hoc PTQ pass over a BF16 model. The 8B model retains 95.7% of full-precision MiniCPM4 performance at ~6× memory reduction; 0.5B retains 90.1%. The new low-VRAM tier ceiling.

Reference entry · Estimated
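The "1.58-bit" figure is the information content of a three-valued weight, log2(3). A minimal sketch of the forward-pass ternarization, assuming the absmean scaling popularized by BitNet b1.58 (an assumption; BitCPM4's exact quantizer may differ):

```python
import math

def ternarize(weights: list[float]) -> tuple[list[int], float]:
    """Absmean ternarization: scale by mean |w|, round, clip to {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes: list[int], scale: float) -> list[float]:
    """Map ternary codes back to approximate real-valued weights."""
    return [c * scale for c in codes]

weights = [0.9, -0.05, -1.3, 0.4, 0.02, -0.6]
codes, scale = ternarize(weights)
print(codes)                                    # [1, 0, -1, 1, 0, -1]
print(f"{math.log2(3):.3f} bits per weight")    # 1.585 bits per weight

# During quantization-aware training, the straight-through estimator treats
# round() as the identity in the backward pass, so gradients still reach the
# latent full-precision weights even though the forward pass sees only codes.
```

Against 16-bit weights, 1.58 bits is about a 10× reduction in theory. The ~6× figure quoted above would be consistent with practical packing and with some tensors staying at higher precision, though the entry does not break that down.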

OPENBMB · 1.08B total / 679M non-embedding (LlamaForCausalLM)

MiniCPM5-1B (Apache 2.0, OPD-trained)

OpenBMB's claimed 1B-class open-source SOTA — but the training-method story matters more than the size. The post-training pipeline runs SFT → RL → On-Policy Distillation (OPD): RL teachers are trained per domain (math, code, closed-book QA, writing) and then distilled back into one release model. RL + OPD lifts the SFT-only checkpoint by +16pt average on math / code / instruction-following and drops max-token-truncated responses by 29 percentage points. Hybrid `<think>` reasoning toggle (switch via `enable_thinking`) and native XML-style tool calling. English + Chinese.

Reference entry · Estimated

Image generation

BLACK FOREST LABS · 32B

FLUX.2 [dev]

BFL's frontier open-weights T2I — best-in-class prompt adherence and text rendering for any license-flexible open model in April 2026.

image · Estimated

BLACK FOREST LABS · 4B (step-distilled, ~4 inference steps) · 9B (parent flow model)

FLUX.2 [klein] (4B + 9B)

FLUX distilled for fast inference. The 4B variant is Apache 2.0 — the first FLUX-quality image model you can actually ship in a commercial product.

image · Estimated

HIDREAM · 8B dense (pixel-space; no VAE, no disjoint text encoder)

HiDream-O1-Image (8B)

HiDream's next-generation image foundation model. The architectural story: pixel-space generation without an external VAE or disjoint text encoder — one Pixel-level Unified Transformer handles text-to-image, image editing, subject-driven personalization, and storyboarding in a single weight set. Debuted at #8 on Artificial Analysis T2I Arena at launch. Supersedes HiDream-I1 for the MIT-license open-weight slot.

image · Estimated

MICROSOFT · 3.8B (MMDiT 48-block + FLUX.2 semantic VAE + multi-layer GPT-OSS text features)

Microsoft Lens (3.8B, MIT)

Microsoft's first foundational text-to-image model. Three-step ladder: `Lens-Base` (50-step supervised), `Lens` (20-step RL-tuned default), `Lens-Turbo` (4-step distilled). Architecture is novel: an MMDiT trunk paired with FLUX.2's semantic VAE and multi-layer features from a frozen GPT-OSS text model — Microsoft's public framing claims competitive quality at "substantially less training compute than larger T2I models." Cleanest MIT-licensed T2I at this param count.

image · Estimated

HIDREAM · 17B (same across all three; Dev and Fast are step-distilled, not pruned)

HiDream-I1 (Full / Dev / Fast)

17B MMDiT open-weights image foundation, MIT-licensed, SOTA at release. The Full / Dev / Fast ladder keeps the same 17B weights and distills down the step count. The MIT license is a big unlock for production commercial pipelines vs FLUX dev.

image · Estimated

TONGYI-MAI (ALIBABA) · 6B (single-stream DiT, Decoupled-DMD distillation)

Z-Image-Turbo

Alibaba Tongyi's distilled 6B T2I that matches FLUX.2-dev quality in 8 steps. Bilingual English + Chinese text rendering. The community daily driver for Apache-2.0 image gen in 2026.

image · Estimated

NVLABS + MIT HAN LAB · 0.6B / 1.6B (linear-attention DiT)

SANA (0.6B / 1.6B)

NVLabs + MIT Han Lab's linear-attention diffusion transformer. Fastest image generation at any given quality tier in its class — 23–39× faster than FLUX-dev on the same hardware.

image · Estimated
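Much of the speed of a linear-attention transformer comes from reordering the attention product so cost grows linearly with token count. A pure-Python sketch, assuming a ReLU feature map (an assumption for illustration, not a transcription of SANA's code), with a quadratic reference to show both orderings give the same result:

```python
def relu(x: float) -> float:
    return max(0.0, x)

def linear_attention(Q, K, V, eps=1e-6):
    """O(N): build the d x d_v summary sum_j phi(k_j) v_j^T once, reuse it per query."""
    d, dv = len(K[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]
    ksum = [0.0] * d
    for k, v in zip(K, V):
        fk = [relu(x) for x in k]
        for a in range(d):
            ksum[a] += fk[a]
            for b in range(dv):
                kv[a][b] += fk[a] * v[b]
    out = []
    for q in Q:
        fq = [relu(x) for x in q]
        denom = sum(fq[a] * ksum[a] for a in range(d)) + eps
        out.append([sum(fq[a] * kv[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out

def quadratic_reference(Q, K, V, eps=1e-6):
    """O(N^2): score every query against every key, then average the values."""
    out = []
    for q in Q:
        fq = [relu(x) for x in q]
        scores = [sum(a * relu(b) for a, b in zip(fq, k)) for k in K]
        denom = sum(scores) + eps
        out.append([sum(s * v[b] for s, v in zip(scores, V)) / denom
                    for b in range(len(V[0]))])
    return out

Q = [[1.0, 0.5], [0.2, 0.8], [-1.0, 2.0]]
K = [[0.5, 1.0], [1.0, -1.0], [0.3, 0.3]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fast, slow = linear_attention(Q, K, V), quadratic_reference(Q, K, V)
print(all(abs(x - y) < 1e-9 for r1, r2 in zip(fast, slow) for x, y in zip(r1, r2)))
```

The summary matrix has a fixed size regardless of how many tokens there are, which matters most at high resolutions where image token counts get large.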

STABILITY AI · 2.5B (MMDiT-X)

Stable Diffusion 3.5 Medium

Stability's consumer-friendly MMDiT-X text-to-image. Designed to run on consumer GPUs, mature ecosystem with thousands of LoRAs and ControlNets. The community "safe SD default."

image · Estimated

Voice — TTS, STT, and multimodal

HEXGRAD · 82M (StyleTTS2-based)

Kokoro-82M

An 82M TTS model that punches absurdly above its weight — ranked #1 on TTS Arena against 7B+ models, runs fine on CPU. The honest default for narration, read-aloud, voice-over.

voice · Estimated

SUPERTONE · Lightweight (on-device)

Supertonic 3

A lightweight on-device TTS that runs entirely locally through ONNX Runtime — no cloud call for synthesis. Supertonic 3 expands the open-weight release from 5 to 31 languages, improves reading stability, and cuts repeat/skip failures. `pip install supertonic` and synthesize immediately with selectable voice styles.

Reference entry · Editorial

OPENMOSS / MOSI.AI · 100M (0.1B)

MOSS-TTS-Nano (100M)

A 100M streaming-TTS that closes the multilingual gap Kokoro doesn't cover — 20 languages including English, Japanese, Korean, Spanish, French, Arabic, Mandarin, plus voice cloning from a short audio reference. 48 kHz stereo output, neural-audio-tokenizer + autoregressive LLM pipeline, runs real-time on 4 CPU cores. The ONNX build drops PyTorch entirely and gets ~2× the inference efficiency of the original.

Reference entry · Estimated

RESEMBLE AI · Turbo: 350M · Multilingual: 0.5B Llama backbone

Chatterbox (Turbo + Multilingual v3)

SOTA open-source voice cloning — 5-second reference audio, paralinguistic tags `[laugh]` `[sigh]` `[cough]` native in Turbo, <150 ms latency, ethical PerTh watermarking on by default. The June 10, 2026 Multilingual v3 release keeps the same 0.5B Llama backbone and MIT license while extending coverage to 25 total languages (incl. 4 dialects + 6 tuned Language Packs).

voice · Estimated

OPENBMB · 2B (diffusion-autoregressive, tokenizer-free; MiniCPM-4 backbone)

VoxCPM2 (2B)

Apache 2.0 TTS with 48 kHz output, short-clip zero-shot voice cloning, and natural-language "voice design" (describe a voice, get one — no reference audio required) across 30 languages.

voice · Estimated

STEPFUN · 8B (LALM)

Step-Audio 2 mini

StepFun's 8B speech-to-speech LALM trained on 8M+ hours of audio. Competitive with GPT-4o-audio on speech recognition + S2S translation benchmarks, fully open-source weights.

voice · Estimated

CANOPY LABS · 3B (Llama-backbone)

Orpheus-TTS 3B

Llama-backbone TTS tuned for naturalness and emotion. Multilingual FTs (Spanish / Italian / French / Hindi) released as research artifacts.

voice · Estimated

Parakeet-TDT 0.6B v3

NVIDIA's high-throughput multilingual ASR — 25 European languages with auto language detection, handles 24-minute audio at full attention (3 h with local attention). Built for production batch transcription.

voice · Estimated

NVIDIA NEMO + M-BAIN (PIPELINE) · 2.5B (Canary) + 1.5B (WhisperX / Whisper-large-v3)

Canary-Qwen 2.5B + WhisperX

Canary-Qwen is an English-only ASR that doubles as a 2.5B LLM over its own transcripts — transcribe, then summarize/Q&A. WhisperX adds word-level timestamps + diarization. The near-frontier English-first pipeline.

voice · Estimated

M-BAIN (WHISPERX) · PYANNOTE (PIPELINE) · ~1.5B (Whisper large-v3)

WhisperX + pyannote 3.1

The canonical open-source pipeline for diarized transcription — wraps faster-whisper for ASR, wav2vec2 for alignment, pyannote for speaker segmentation.

voice · Estimated
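The last step of a diarized-transcription pipeline is joining word timestamps from alignment with speaker turns from diarization. The sketch below shows that join as plain interval overlap. The `Word` and `Turn` types and the `assign_speakers` helper are hypothetical names for illustration, not the WhisperX or pyannote API.

```python
from typing import NamedTuple

class Word(NamedTuple):
    text: str
    start: float  # seconds
    end: float

class Turn(NamedTuple):
    speaker: str
    start: float
    end: float

def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(words: list[Word], turns: list[Turn]) -> list[tuple[str, str]]:
    """Label each word with the speaker whose turn overlaps it the most."""
    labeled = []
    for w in words:
        best = max(turns, key=lambda t: overlap(w.start, w.end, t.start, t.end))
        hit = overlap(w.start, w.end, best.start, best.end) > 0
        labeled.append((w.text, best.speaker if hit else "UNKNOWN"))
    return labeled

words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.1, 1.3)]
turns = [Turn("SPEAKER_00", 0.0, 1.0), Turn("SPEAKER_01", 1.0, 2.0)]
print(assign_speakers(words, turns))
# [('hello', 'SPEAKER_00'), ('there', 'SPEAKER_00'), ('hi', 'SPEAKER_01')]
```

Word-level alignment is what makes this join reliable: segment-level timestamps alone often straddle a speaker change.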

SYSTRAN · ~809M (Whisper large-v3-turbo distilled)

faster-whisper large-v3-turbo

CTranslate2 reimplementation of OpenAI Whisper: up to 4× faster than the reference implementation at matching accuracy, with int8 quantization available to cut memory further. The practical STT default.

voice · Estimated

QWEN (ALIBABA) · 1.7B and 0.6B (built on the Qwen3-Omni audio stack)

Qwen3-ASR (1.7B / 0.6B)

Qwen’s first dedicated open-weight ASR family — language identification plus speech recognition across 52 languages and dialects (30 languages + 22 Chinese dialects), built on the Qwen3-Omni audio foundation. Qwen claims the 1.7B is state-of-the-art among open-source ASR and competitive with the strongest proprietary commercial APIs. Apache 2.0, transformers-native, and small enough to run on CPU or any consumer GPU.

Reference entry · Estimated

OPENBMB · ~9B (built on Qwen3-8B + vision/audio encoders)

MiniCPM-o 4.5

OpenBMB's current omnimodal flagship, superseding MiniCPM-o 2.6 — vision, speech-in, speech-out in one ~9B model now built on the Qwen3-8B backbone. Adds full-duplex live streaming (input and output don't block each other) and proactive interaction, and OpenBMB reports it matching Gemini 2.5 Flash on vision/speech. Apache 2.0.

voice · Estimated

OPENMOSS / MOSI.AI · ~9B total (8B LLM + audio encoder)

MOSS-Music-8B (Instruct + Thinking)

The first open-weight music-understanding LLM worth flagging — does lyrics ASR with time-aligned transcription, musical captioning, key/tempo/chord reasoning, structural analysis (intro/verse/chorus/bridge/outro), instrument + voice recognition, and music QA. Audio encoder runs at 12.5 Hz temporal resolution. 80.38% avg accuracy across 8 music-QA benchmarks; 15.88% avg WER/CER on lyrics; 4.36/5.0 MusicCaps captioning. Thinking variant adds chain-of-thought reasoning over audio.

Reference entry · Estimated

Vision-language and video understanding

BAIDU · Compact OCR vision-language model

Baidu Unlimited-OCR

Baidu's open-weight OCR / document-understanding VLM, positioned as a step beyond DeepSeek-OCR — high-fidelity text + layout extraction from images and documents. MIT-licensed, with vLLM support (community-added), an arXiv paper (2606.23050), and a live Hugging Face Space demo. Strong early traction (400K+ downloads within days).

Reference entry · Editorial

OPENBMB · 1B (SigLIP2-400M vision encoder + Qwen3.5-0.8B LLM)

MiniCPM-V-4.6 (1B vision-language)

Tiny vision-language model — single-image, multi-image, and video understanding from a 1B-class checkpoint. Mixed 4×/16× visual token compression cuts visual-encoding FLOPs >50% vs prior MiniCPM-V revs. Tool / function-calling built in. Artificial Analysis Intelligence Index 13 beats raw Qwen3.5-0.8B (10) at ~19× lower token cost. The newest entry in the V (vision-only) branch — parallel to the omnimodal MiniCPM-o line.

Reference entry · Estimated

OPENMOSS / MOSI.AI · 11B

MOSS-VL-0408 (Base + Instruct)

OpenMOSS's vision-language entry, sized for serious video understanding — up to 256 video frames per inference, ~201M pixel budget per image or per video, 16×16 patch size, interleaved image+text+video sequences. SFT-tuned from `MOSS-VL-Base-0408`. Native BF16. Distinct from the Qwen3-VL line and parallel to MiniCPM-V — pick this when you need long-video context, not single-frame OCR.

Reference entry · Estimated

Embeddings — retrieval + RAG

ALIBABA (QWEN) · 0.6B / 4B / 8B

Qwen3-Embedding (0.6B / 4B / 8B)

Qwen's embedding family — the 8B ranks #1 overall on MTEB Multilingual as of 2026, making the line the current best-quality open retrieval pick and displacing BGE-M3 at the top. Apache 2.0, three sizes so you can trade quality for footprint, with a matching Qwen3-Reranker family for two-stage retrieval.

docs · Estimated

BAAI · 568M (XLM-RoBERTa-large base)

BGE-M3

BAAI's multi-functionality + multilingual (170+ languages) + multi-granularity embedding. The default "just use it" RAG embedding since early 2024. As of 2026 it is no longer the top-quality pick — `Qwen3-Embedding` (0.6B / 4B / 8B, Apache 2.0) now leads MTEB overall — but BGE-M3 remains the sharpest pick for cheap, broad multilingual breadth at 568M.

Reference entry · Estimated

NOMIC · ~137M (Matryoshka)

nomic-embed-text-v1.5

Nomic's Matryoshka Representation Learning embedding — truncate to 64/128/256/512/768 dims at query time for a tiny quality drop, giving drop-in cost/speed control BGE-M3 doesn't offer.

docs · Estimated
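Matryoshka truncation itself is simple: keep the leading dimensions and re-normalize. A minimal sketch with toy 8-dim vectors standing in for real 768-dim embeddings; the vectors and helper names are made up for illustration, and a model card may prescribe extra steps (such as a layer norm before truncation) that this sketch omits.

```python
import math

def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` values and re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))

# Toy vectors; a real embedding would come from the model.
query = [0.42, -0.13, 0.55, 0.08, -0.31, 0.27, 0.02, -0.19]
doc = [0.40, -0.10, 0.50, 0.12, 0.25, -0.30, 0.05, 0.11]

full = cosine(truncate_embedding(query, 8), truncate_embedding(doc, 8))
short = cosine(truncate_embedding(query, 4), truncate_embedding(doc, 4))
print(f"full: {full:.3f}  truncated to 4 dims: {short:.3f}")
```

Query and document vectors have to be truncated to the same size. Storing the shorter vectors is where the index-size and search-speed savings come from.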

Popular model checks

Start with the practical model names people actually need to run: Qwen3-Coder-30B-A3B for local coding, Microsoft Lens-Turbo for MIT-licensed image generation, and Qwen 3.5 35B-A3B for the 24 GB MoE tier.

Reverse lookup

Know the model, want to see which hardware runs it? Use Find-by-Model. Enter any pick and get the hardware options that naturally fit.