MODELS · 56 CURATED OPEN-WEIGHT PICKS
Every model we recommend.
Dated, opinionated, license-audited. Machine-readable by design.
Canonical detail page per base model. Which tier slots it fills, license gotchas, runner friction, and which hardware actually runs it — pulled live from the planner so this index stays current when picks shift.
Alibaba Qwen — the most-used open family in 2026
Qwen 3.6-35B-A3B
Alibaba's post-3.5 refresh specifically targeting agentic coding — claims to beat dense Qwen 3.5 27B and Gemma 4 31B on coding + reasoning at the same active-param budget. Fully multimodal.
Qwen 3.6-27B
The April 2026 dense refresh that supersedes Qwen 3.5 27B — claims to beat the prior 397B MoE flagship on coding benchmarks while staying single-GPU deployable at Q4. The current dense top pick for 24 GB rigs and Mac 32+ GB.
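As a rough check on the single-GPU claim, weight memory scales with parameter count times bits per weight. The sketch below assumes about 4.5 bits per weight for a Q4-style quant (an assumption; real formats vary) and ignores KV cache, activations, and runtime overhead. `weight_gib` is a hypothetical helper, not something the planner exposes.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory in GiB for a dense model."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: ~4.5 bits/weight stands in for a Q4-style quant
# (4-bit values plus per-block scales). Real quant formats differ.
q4 = weight_gib(27, 4.5)
bf16 = weight_gib(27, 16)
print(f"27B at ~4.5 bpw: {q4:.1f} GiB; at BF16: {bf16:.1f} GiB")
```

Under these assumptions the 27B weights come to roughly 14 GiB at Q4 versus about 50 GiB at BF16, which is why the Q4 build leaves room for context on a 24 GB card while the unquantized model does not fit.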
Qwen 3.5 35B-A3B
The 24 GB-VRAM unlock — dense-27B quality at 3B-active speed, and the community workhorse for mixed coding / chat / docs where breadth matters. 256 experts with 8 routed + 1 shared per token.
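To make "8 routed + 1 shared" concrete, here is a minimal sketch of generic top-k expert routing for one token. It is not Qwen's implementation: the random router logits, the softmax over the selected experts, and the `route` helper are all illustrative assumptions. Only the expert counts come from the description above.

```python
import math
import random

NUM_EXPERTS = 256  # routed expert pool, from the text
TOP_K = 8          # routed experts per token, from the text


def route(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and softmax-normalize their logits."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output
routed = route(logits)
active = len(routed) + 1  # +1: the shared expert runs for every token
print(f"{active} expert MLPs active for this token ({len(routed)} routed + 1 shared)")
```

Only the selected experts' weights are read per token, which is where the "3B-active speed" comes from even though the full parameter set still has to sit in memory.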
Qwen 3.5 27B
A dense, natively multimodal (text + image + video input) mid-large generalist — the biggest non-MoE in the Qwen 3.5 medium line. Best realistic pick for long-context docs on 24 GB VRAM or Mac 48 GB+.
Qwen 3.5 9B
Strongest "runs on a mid-tier GPU" model in the Qwen 3.5 small line — supports thinking mode and 201-language coverage. Fits 8 GB VRAM with headroom.
Qwen 3.5 4B
The low-tier sweet spot — fits 6 GB VRAM at Q4, strong multimodal and agentic tool-use for its size, supports both thinking and non-thinking modes.
Qwen 3.5 2B
Phone-class multimodal model built on the same Qwen 3.5 foundation as the medium tier. Non-thinking by default. Runs anywhere, including CPU-only setups.
Qwen3-Coder-30B-A3B
The community daily-driver coding MoE for 24 GB-class hardware — purpose-trained for agentic coding + browser-use. Delivers 30B-dense quality at 3B-dense throughput.
Qwen3-Coder-480B-A35B
Frontier open-weight agentic coding model — claimed on par with Claude Sonnet on agentic benchmarks. Alibaba's most powerful coder.
Qwen3-Omni-30B-A3B-Instruct
The only locally-runnable open-weight model that does real-time streaming speech-out natively. 119 input languages, 10 speech-output languages (two voices: Chelsie, Ethan).
Qwen3-14B
Last-generation Qwen3 14B dense with thinking mode enabled by default and strong tool-calling. Still a solid 16 GB-VRAM pick when you want dense behaviour over MoE. Qwen 3.5 skipped the 14B slot.
Qwen-Image-2512 (20B) + Edit-2511
Strongest open-weight image model for text rendering — Arabic, Chinese, English all sharp. Qwen-Image-2512 (Dec 31 2025) is the latest released generation model and claims #1 open-source on AI Arena; pair with Qwen-Image-Edit-2511 (Dec 23 2025) for editing workflows. A unified 7B "Qwen-Image 2.0" was announced Feb 10 2026 (arxiv 2605.10730 tech report) but the weights are not yet open-sourced — Qwen-Image-2512 remains the runnable flagship as of June 2026.
Other frontier and mid-tier text
Command A+ (218B-A25B)
Cohere's frontier-class MoE: 218B params with 25B active per token, hybrid sliding-window + global attention, native vision + 48-language coverage. The first Apache-2.0 frontier MoE you can actually serve on 2× H100 — same hardware class as DeepSeek V4-Pro and Kimi K2.6 but with a permissive license neither of those carries.
GLM-5.1
Scores 58.4 on SWE-Bench Pro, just behind Kimi K2.6 (58.6) among open weights and ahead of GPT-5.4 and Claude Opus 4.6 on that benchmark. A long-horizon agentic coding flagship. MIT license means no commercial restrictions, unlike many frontier opens.
Kimi K2.6
Moonshot's 1T MoE agentic coder — keeps the K2 architecture (61 layers, 64 attention heads, MLA) and extends context to 256K. Tops SWE-Bench Pro at 58.6 (vs GPT-5.4 xhigh 57.7, Opus 4.6 max 53.4) and lands #4 on the Artificial Analysis Intelligence Index. The real differentiator is Agent Swarm — 300 sub-agents over 4,000 coordinated steps, a different shape of capability than single-step quality.
DeepSeek V4-Pro
DeepSeek's frontier-class V4 flagship — 1.6T MoE that matches GPT-5.4 and Sonnet 4.6 on most benchmarks at meaningfully lower hosted price. The 1M-context default uses ~27% of V3.2's single-token FLOPs and ~10% of its KV cache thanks to architecture changes. MIT-licensed, but not realistically a local pick at this size.
DeepSeek V4-Flash
The smaller half of the V4 family: 284B MoE with 13B active per token. Same 1M context, same MIT license, same architectural KV-cache improvements as V4-Pro. The honest local pick of the V4 line: still frontier-class on most benchmarks, but realistically deployable only on M3 Ultra 192 GB unified or dual 80 GB server cards.
Gemma 4 (31B dense + 26B A4B MoE)
Google's April 2026 refresh — Arena top 5 in its first week, 256K context native, vision + audio multimodal. Big news: Gemma 4 moved to Apache 2.0 from the custom Gemma Terms. The current Apache-2.0 "best dense under 70B" pick.
Gemma 3 (4B / 12B / 27B)
Previous-generation Gemma line. The 4B is still a useful ultra-compact agent model with native vision. Larger sizes are superseded by Gemma 4.
Mistral Medium 3.5 128B
Mistral's flagship 128B dense model. It replaces Medium 3.1 and folds the dedicated Magistral (reasoning) and Devstral 2 (coding) specialist models into one weight set with a per-request `reasoning_effort` toggle. 77.6% on SWE-Bench Verified, native multimodal vision encoder trained from scratch, 256K context. The first serious Mistral release since Ministral 3 (Dec 2025).
Ministral 3 family (3B / 8B / 14B)
Mistral's clean Apache-2.0 edge family with Base / Instruct / Reasoning splits per size. The "no-license-drama" alternative to Qwen or Gemma when lawyers are involved.
gpt-oss-20b
OpenAI's open-weights MoE. Matches o3-mini on common benchmarks, post-trained with MXFP4 quantization so it lands in 16 GB VRAM — a near-frontier reasoner you can actually run on a 5060 Ti.
IBM Granite 4.1
IBM's refreshed open-weights enterprise family — three dense decoder-only sizes, Apache 2.0, trained on ~15T tokens with progressive annealing toward technical/scientific/mathematical data plus instruction-following. The 8B instruct claims to match the prior Granite 4.0 32B-A9B MoE flagship on IBM's own benchmarks; cross-vendor comparison (vs Qwen/Gemma/Mistral) is unverified at time of publication.
Granite-Switch 4.1 8B Preview (12 task LoRAs)
IBM Granite 4.1 8B with 12 task-specialized LoRA adapters embedded in a single checkpoint, activated per-token via control tokens in the chat template. Three libraries: **Core** (3 adapters — requirement check, context attribution, uncertainty), **RAG** (5 — query rewrite, query clarification, answerability, hallucination detection, citation generation), **Guardian** (4 — safety detection, factuality detection + correction, policy guardrails). A lightweight switch layer detects control tokens and produces per-position adapter indices applied across all decoder layers; KV-cache normalization keeps adapters independent. Novel deployment pattern for production RAG / agent stacks — one checkpoint, multiple specialized behaviors. 12 languages: EN, DE, ES, FR, JA, PT, AR, CS, IT, KO, NL, ZH.
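The switch-layer idea can be sketched as a scan over the token sequence that emits one adapter index per position. Everything specific here is assumed for illustration: the control-token strings, the index numbering, and the rule that a control token stays in effect until the next one. The real names and semantics live in the model's chat template.

```python
# Hypothetical control tokens; not the strings the checkpoint actually uses.
ADAPTERS = {"<|adapter:answerability|>": 1, "<|adapter:citation|>": 2}
BASE = 0  # index 0 means "no adapter, base weights only"


def adapter_indices(tokens):
    """Return one adapter index per position.

    Assumed semantics: a control token switches the active adapter for itself
    and every later position, until another control token appears.
    """
    active = BASE
    indices = []
    for tok in tokens:
        active = ADAPTERS.get(tok, active)
        indices.append(active)
    return indices


tokens = ["Is", "this", "answerable", "?", "<|adapter:answerability|>", "yes"]
print(adapter_indices(tokens))  # [0, 0, 0, 0, 1, 1]
```

In a design like this, each decoder layer would use the per-position index to pick which LoRA delta to apply, so one forward pass can mix base and adapter behavior across a single sequence.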
Phi-4 Mini
Microsoft's dense 3.8B instruct model built on synthetic + filtered web data. Punches above its weight on reasoning-heavy prompts in the <5B bracket. MIT license is unusually clean for commercial redistribution.
MiniMax M2.5 / M2.7
MiniMax's open-weights agentic-workflow family — strong on coding + tool-use Arena. M2.7 is the first model that "participates in its own evolution" via self-iterated RL. Frontier-class, not a local workhorse.
BitCPM4-CANN family (0.5B / 1B / 3B / 8B, native 1.58-bit)
First publicly reported end-to-end 1.58-bit (ternary {-1, 0, 1}) training stack at 8B scale. Trained natively at 1.58-bit via Quantization-Aware Training + Straight-Through Estimator on Huawei Ascend NPU — not a post-hoc PTQ pass over a BF16 model. The 8B model retains 95.7% of full-precision MiniCPM4 performance at ~6× memory reduction; 0.5B retains 90.1%. The new low-VRAM tier ceiling.
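For readers new to ternary weights, the sketch below shows one common quantizer, the absmean rule popularized by BitNet b1.58. Whether BitCPM4-CANN uses exactly this rule is an assumption. Only the forward quantization step is shown; during QAT the Straight-Through Estimator would treat the rounding as identity in the backward pass, which needs an autograd framework and is left out here.

```python
import math


def ternarize(weights):
    """Absmean ternary quantization: each weight becomes a code in {-1, 0, 1}
    that is multiplied by a single shared scale at dequantization time."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]  # toy values, not real model weights
codes, scale = ternarize(weights)
dequant = [c * scale for c in codes]
bits = math.log2(3)  # information content of one three-valued weight

print(codes)           # [1, 0, 1, -1, 0, -1]
print(round(bits, 2))  # 1.58
```

The "1.58-bit" label is just log2(3): three possible values per weight carry about 1.58 bits each.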
MiniCPM5-1B (Apache 2.0, OPD-trained)
OpenBMB's claimed 1B-class open-source SOTA, but the training-method story matters more than the size. The post-training pipeline runs SFT → RL → On-Policy Distillation (OPD): RL teachers are trained per domain (math, code, closed-book QA, writing) and then distilled back into one release model. RL + OPD lifts the SFT-only checkpoint by 16 points on average across math / code / instruction-following and cuts max-token-truncated responses by 29 percentage points. Hybrid `<think>` reasoning toggle (switch via `enable_thinking`) and native XML-style tool calling. English + Chinese.
Image generation
FLUX.2 [dev]
BFL's frontier open-weights T2I, with best-in-class prompt adherence and text rendering among license-flexible open models as of April 2026.
FLUX.2 [klein] (4B + 9B)
FLUX distilled for fast inference. The 4B variant is Apache 2.0 — the first FLUX-quality image model you can actually ship in a commercial product.
HiDream-O1-Image (8B)
HiDream's next-generation image foundation model. The architectural story: pixel-space generation without an external VAE or disjoint text encoder — one Pixel-level Unified Transformer handles text-to-image, image editing, subject-driven personalization, and storyboarding in a single weight set. Debuted at #8 on Artificial Analysis T2I Arena at launch. Supersedes HiDream-I1 for the MIT-license open-weight slot.
Microsoft Lens (3.8B, MIT)
Microsoft's first foundational text-to-image model. Three-step ladder: `Lens-Base` (50-step supervised), `Lens` (20-step RL-tuned default), `Lens-Turbo` (4-step distilled). Architecture is novel: an MMDiT trunk paired with FLUX.2's semantic VAE and multi-layer features from a frozen GPT-OSS text model — Microsoft's public framing claims competitive quality at "substantially less training compute than larger T2I models." Cleanest MIT-licensed T2I at this param count.
HiDream-I1 (Full / Dev / Fast)
17B MMDiT open-weights image foundation, MIT-licensed, SOTA-at-release. Full/Dev/Fast ladder distilled down the step count. The MIT license is a big unlock for production commercial pipelines vs FLUX dev.
Z-Image-Turbo
Alibaba Tongyi's distilled 6B T2I that matches FLUX.2-dev quality in 8 steps. Bilingual English + Chinese text rendering. The community daily driver for Apache-2.0 image gen in 2026.
SANA (0.6B / 1.6B)
NVLabs + MIT Han Lab's linear-attention diffusion transformer. Fastest image generation at any given quality tier in its class — 23–39× faster than FLUX-dev on the same hardware.
Stable Diffusion 3.5 Medium
Stability's consumer-friendly MMDiT-X text-to-image. Designed to run on consumer GPUs, mature ecosystem with thousands of LoRAs and ControlNets. The community "safe SD default."
Voice — TTS, STT, and multimodal
Kokoro-82M
An 82M TTS model that punches absurdly above its weight — ranked #1 on TTS Arena against 7B+ models, runs fine on CPU. The honest default for narration, read-aloud, voice-over.
MOSS-TTS-Nano (100M)
A 100M streaming TTS that fills the multilingual gap Kokoro leaves open: 20 languages including English, Japanese, Korean, Spanish, French, Arabic, Mandarin, plus voice cloning from a short audio reference. 48 kHz stereo output, neural-audio-tokenizer + autoregressive LLM pipeline, runs real-time on 4 CPU cores. The ONNX build drops PyTorch entirely and gets ~2× the inference efficiency of the original.
Chatterbox (Turbo + Multilingual)
SOTA open-source voice cloning — 5-second reference audio, paralinguistic tags `[laugh]` `[sigh]` `[cough]` native in Turbo, <150 ms latency, ethical PerTh watermarking baked in.
VoxCPM2 (2B)
Apache 2.0 TTS with 48 kHz output, short-clip zero-shot voice cloning, and natural-language "voice design" (describe a voice, get one — no reference audio required) across 30 languages.
Step-Audio 2 mini
StepFun's 8B speech-to-speech LALM trained on 8M+ hours of audio. Competitive with GPT-4o-audio on speech recognition + S2S translation benchmarks, fully open-source weights.
Orpheus-TTS 3B
Llama-backbone TTS tuned for naturalness and emotion. Multilingual FTs (Spanish / Italian / French / Hindi) released as research artifacts.
Parakeet-TDT 0.6B v3
NVIDIA's high-throughput multilingual ASR — 25 European languages with auto language detection, handles 24-minute audio at full attention (3 h with local attention). Built for production batch transcription.
Canary-Qwen 2.5B + WhisperX
Canary-Qwen is an English-only ASR that doubles as a 2.5B LLM over its own transcripts — transcribe, then summarize/Q&A. WhisperX adds word-level timestamps + diarization. The near-frontier English-first pipeline.
WhisperX + pyannote 3.1
The canonical open-source pipeline for diarized transcription — wraps faster-whisper for ASR, wav2vec2 for alignment, pyannote for speaker segmentation.
faster-whisper large-v3-turbo
CTranslate2 reimplementation of OpenAI Whisper: roughly 4× faster than the reference implementation with int8 quantization, at matching accuracy. The practical STT default.
MiniCPM-o 2.6
GPT-4o-class omnimodal 8B — vision, speech input, speech output, voice cloning in one model. End-to-end with full-duplex live streaming.
MOSS-Music-8B (Instruct + Thinking)
The first open-weight music-understanding LLM worth flagging — does lyrics ASR with time-aligned transcription, musical captioning, key/tempo/chord reasoning, structural analysis (intro/verse/chorus/bridge/outro), instrument + voice recognition, and music QA. Audio encoder runs at 12.5 Hz temporal resolution. 80.38% avg accuracy across 8 music-QA benchmarks; 15.88% avg WER/CER on lyrics; 4.36/5.0 MusicCaps captioning. Thinking variant adds chain-of-thought reasoning over audio.
Vision-language and video understanding
MiniCPM-V-4.6 (1B vision-language)
Tiny vision-language model: single-image, multi-image, and video understanding from a 1B-class checkpoint. Mixed 4×/16× visual token compression cuts visual-encoding FLOPs >50% vs prior MiniCPM-V revs. Tool / function-calling built in. Scores 13 on the Artificial Analysis Intelligence Index versus 10 for raw Qwen3.5-0.8B, at ~19× lower token cost. The newest entry in the V (vision-only) branch, parallel to the omnimodal MiniCPM-o line.
MOSS-VL-0408 (Base + Instruct)
OpenMOSS's vision-language entry, sized for serious video understanding — up to 256 video frames per inference, ~201M pixel budget per image or per video, 16×16 patch size, interleaved image+text+video sequences. SFT-tuned from `MOSS-VL-Base-0408`. Native BF16. Distinct from the Qwen3-VL line and parallel to MiniCPM-V — pick this when you need long-video context, not single-frame OCR.
Embeddings — retrieval + RAG
BGE-M3
BAAI's multi-functionality + multilingual (170+ languages) + multi-granularity embedding. The default "just use it" RAG embedding since early 2024.
nomic-embed-text-v1.5
Nomic's Matryoshka Representation Learning embedding — truncate to 64/128/256/512/768 dims at query time for a tiny quality drop, giving drop-in cost/speed control BGE-M3 doesn't offer.
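The truncation step itself is small enough to sketch: keep the leading components, then re-normalize so cosine similarity still behaves. The vectors below are random stand-ins rather than real model output, and `truncate_embedding` is a hypothetical helper. The model card may prescribe extra normalization before truncating, so treat this as the generic idea only.

```python
import math
import random


def truncate_embedding(vec, dims):
    """Keep the first `dims` components and L2-renormalize the result."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


# Toy 768-dim vectors standing in for a document and a closely related query.
random.seed(0)
doc = [random.gauss(0.0, 1.0) for _ in range(768)]
query = [d + random.gauss(0.0, 0.5) for d in doc]

for dims in (768, 256, 64):
    score = cosine(truncate_embedding(query, dims), truncate_embedding(doc, dims))
    print(dims, round(score, 3))
```

Stored document vectors and incoming query vectors have to be cut to the same length, so in practice the dimension is chosen per index rather than per individual query.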
Popular model checks
Start with the practical model names people actually need to run: Qwen3-Coder-30B-A3B for local coding, Microsoft Lens-Turbo for MIT-licensed image generation, and Qwen 3.5 35B-A3B for the 24 GB MoE tier.
Reverse lookup
Know the model, want to see which hardware runs it? Use Find-by-Model. Enter any pick and get the hardware options that naturally fit.