
FAST TAKE · 2026-04-29 · IBM GRANITE 4.1

IBM Granite 4.1 — Apache 2.0 dense at 3B/8B/30B with 512K context

IBM dropped the Granite 4.1 family on April 29: three dense sizes (3B / 8B / 30B), Apache 2.0, 128K default context extending to 512K via a late-training context-extension stage, with native Ollama tags live the same day. The headline claim from IBM: the new 8B instruct matches their prior Granite 4.0 32B-A9B MoE. Believable, but unverified by community benchmarks at the time of writing.

Verdict: Apache 2.0 dense + 512K context + Ollama-native; vendor 8B-vs-32B claim wants community validation


The take

IBM Research shipped Granite 4.1 as an open-weights drop on Hugging Face with same-day Ollama tags (`granite4.1:3b` / `granite4.1:8b` / `granite4.1:30b`). All Apache 2.0, all dense decoder-only architecture, all in instruct and base variants. Default Q4_K_M tag sizes on disk: 2.1 GB / 5.3 GB / 17 GB. Trained on ~15T tokens with a progressive annealing schedule that shifts the data mix toward technical, scientific, and mathematical content plus instruction-following.
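
To reproduce the install locally, a minimal sketch; these are the tag names quoted above, and the reported sizes should land near the Q4_K_M figures:

```sh
# Pull the default (Q4_K_M) tags; expect roughly 2.1 / 5.3 / 17 GB on disk.
ollama pull granite4.1:3b
ollama pull granite4.1:8b
ollama pull granite4.1:30b

# Confirm tags and on-disk sizes.
ollama list | grep granite4.1
```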

The 512K context number wants a footnote: the default Ollama tags ship at 128K, and the 512K ceiling comes from a late-training context-extension stage IBM ran. Useful when you genuinely need ultra-long context, but the practical default is 128K — which is still ample for most workloads and competitive with Qwen 3.5/3.6's native 262K.
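
If you want more than the 128K default, Ollama exposes the context window as a Modelfile parameter. A minimal sketch, assuming the published tag's weights actually honor the extended window (whether the default tags ship the 512K-extended checkpoint is exactly the footnote above) and that you have the memory for the KV cache at that length:

```sh
# Derive a long-context variant from the stock 8B tag. num_ctx is a
# standard Ollama Modelfile parameter; 262144 (256K) is an arbitrary
# example value. KV-cache memory grows with context length, so size
# it to your RAM/VRAM before going anywhere near 512K.
cat > Modelfile <<'EOF'
FROM granite4.1:8b
PARAMETER num_ctx 262144
EOF

ollama create granite4.1-8b-long -f Modelfile
ollama run granite4.1-8b-long
```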

Where it lands on the planner's tier map (no pick rotation yet): the 3B competes in the low band with Qwen 3.5 4B / 2B / Phi-4 Mini; the 8B competes in mid with Qwen 3.5 9B / Qwen3-14B / gpt-oss-20b; the 30B at Q4 (~17 GB) competes in high/top with Qwen 3.5 27B / Qwen 3.6-27B / Gemma 4 31B. IBM's own benchmarks position the 8B against Granite 4.0 32B-A9B (their previous flagship MoE), not against Qwen or Gemma; the vendor-relative win is real, but the cross-vendor comparison remains open.
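
As a back-of-envelope check on those Q4 footprints: Q4_K_M-class quants average roughly 0.6 bytes per weight (an approximation, not an IBM figure), which puts all three sizes in the right neighborhood of the listed tag sizes:

```sh
# Rough disk-size estimate at ~0.6 bytes/weight for Q4_K_M-class quants.
# Listed tag sizes for comparison: 2.1 / 5.3 / 17 GB.
awk 'BEGIN {
  n = split("3 8 30", params_b, " ")
  for (i = 1; i <= n; i++)
    printf "%2sB: ~%.1f GB\n", params_b[i], params_b[i] * 0.6
}'
```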

Honest read: this is a comfortable Apache 2.0 enterprise drop with a frictionless local install. The pick-rotation question (does it displace Qwen 3.5 9B at chat.mid, or Qwen 3.6-27B at chat.top?) needs community signal that doesn't exist yet on day 5. Hold the rotation until r/LocalLLaMA and Arena have a few weeks of data; revisit at the Q3 quarterly refresh on 2026-07-17, or sooner if a clear win surfaces. In the meantime: model entry at /models/granite-4-1/, install via `ollama pull granite4.1:8b` if you want to test it yourself.
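
For a quick smoke test once the pull finishes, hit the local Ollama API directly (default port 11434; the prompt here is just a placeholder for whatever you want to probe):

```sh
# One-shot, non-streaming chat completion against the local daemon.
curl -s http://localhost:11434/api/chat -d '{
  "model": "granite4.1:8b",
  "messages": [
    {"role": "user", "content": "In two sentences, what does the Apache 2.0 license permit?"}
  ],
  "stream": false
}'
```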

Where this fits

Models: IBM Granite 4.1 · Qwen 3.5 9B · Qwen 3.6-27B · gpt-oss-20b

Hardware: NVIDIA RTX 5060 Ti 16 GB · NVIDIA RTX 3060 12 GB · Mac Mini M4 16 GB · NVIDIA RTX 5090
