MODEL · IBM GRANITE · 8B BASE + 12 EMBEDDED LORA ADAPTERS (~10B TOTAL)
Granite-Switch 4.1 8B Preview (12 task LoRAs)
IBM Granite 4.1 8B with 12 task-specialized LoRA adapters embedded in a single checkpoint, activated per-token via control tokens in the chat template. Three libraries: **Core** (3 adapters — requirement check, context attribution, uncertainty), **RAG** (5 — query rewrite, query clarification, answerability, hallucination detection, citation generation), **Guardian** (4 — safety detection, factuality detection + correction, policy guardrails). A lightweight switch layer detects control tokens and produces per-position adapter indices applied across all decoder layers; KV-cache normalization keeps adapters independent. Novel deployment pattern for production RAG / agent stacks — one checkpoint, multiple specialized behaviors. 12 languages: EN, DE, ES, FR, JA, PT, AR, CS, IT, KO, NL, ZH.
License: Apache 2.0 · Context: 131,072 (128K) · Released: May 25, 2026 (HF upload)
The decision in five lines
- The call
- Consider — runnable locally, family reference
- Best for
- Local evaluation and family reference
- Runs on
- 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
- Watch out
- General chat or coding — the base Granite 4.1 8B is a respectable but not standout 8B; the value here is the embedded adapter toolkit, not the base.
- Evidence
- Estimated
- 8B base + 12 embedded LoRA adapters (~10B total)
- PARAMETERS
- DENSE LLM WITH CONTROL-TOKEN-ACTIVATED TASK ADAPTERS
- TYPE
- 131072
- CONTEXT
- ~5–6 GB (Q4 base) / ~16 GB (FP16) — adapter overhead is small
- VRAM AT Q4
Where we recommend this
This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.
The call
IBM Granite 4.1 8B with 12 task-specialized LoRA adapters embedded in a single checkpoint, activated per-token via control tokens in the chat template. Three libraries: **Core** (3 adapters — requirement check, context attribution, uncertainty), **RAG** (5 — query rewrite, query clarification, answerability, hallucination detection, citation generation), **Guardian** (4 — safety detection, factuality detection + correction, policy guardrails). A lightweight switch layer detects control tokens and produces per-position adapter indices applied across all decoder layers; KV-cache normalization keeps adapters independent. Novel deployment pattern for production RAG / agent stacks — one checkpoint, multiple specialized behaviors. 12 languages: EN, DE, ES, FR, JA, PT, AR, CS, IT, KO, NL, ZH.
When not to use: General chat or coding — the base Granite 4.1 8B is a respectable but not standout 8B; the value here is the embedded adapter toolkit, not the base. Also **PREVIEW** status: IBM explicitly notes adapters should be tested per-use-case before production, and the Guardian-Library safety adapters are not a substitute for application-level safety testing.
Runner notes
Three sizes ship together — 3B, 8B, 30B (all preview). Adapters activated via control tokens in the chat template (`granitelib-core-r1.0`, `granitelib-rag-r1.0`, `granitelib-guardian-r1.0`). Standard `LlamaForCausalLM`-class architecture — vLLM and transformers work natively, no exotic kernels. Distinct from the base Granite 4.1 detail page (`/models/granite-4-1/`) — Switch is the adapter-embedded variant, not a replacement.
Hardware that fits
Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.
- Intel Arc B580 12 GBPerfect · 1.8× 12 GB · $249–$299
- NVIDIA RTX 3060 12 GBPerfect · 1.8× 12 GB · $250–$340
- Minisforum UM890 ProPerfect · 3.6× 32 GB DDR5 (shared) · $463–$580 all-in
- RTX 5060 Ti 16 GBPerfect · 2.4× 16 GB · $560–$610
- AMD Radeon RX 9070 XTPerfect · 2.4× 16 GB · $649–$779
- AMD Radeon RX 7900 XTXPerfect · 3.6× 24 GB · $760 used / ~$1,500 new
- Mac Mini M4 16 GBPerfect · 1.6× 16 GB unified · $799 (new floor) / $499–$599 (eBay/residuals)
- NVIDIA RTX 3090 (used, single)Perfect · 3.6× 24 GB · $950–$1,200
- NVIDIA RTX 5070 TiPerfect · 2.4× 16 GB · $980–$1,300
- NVIDIA RTX 5080Perfect · 2.4× 16 GB · $999–$1,400
- MacBook Air M5 24 GBPerfect · 2.4× 24 GB unified · $1,299–$1,699
- Mac Mini M4 Pro 24 GBPerfect · 2.4× 24 GB unified · $1,399
- Dual RTX 3090 (used)Perfect · 7.2× 48 GB · $1,800–$2,500 all-in
- Framework Desktop (Ryzen AI Max+ 395)Perfect · 12.9× 128 GB unified · $1,999–$2,851
- NVIDIA RTX 4090Perfect · 3.6× 24 GB · $2,200–$2,800
- M5 Pro MacBook Pro 48 GBPerfect · 4.9× 48 GB unified · $2,599–$3,099
- NVIDIA RTX 5090Perfect · 4.8× 32 GB · $2,910–$4,300
- Mac Studio M4 Max 64 GBPerfect · 6.5× 64 GB unified · $3,199
- NVIDIA RTX A6000 (48 GB, used)Perfect · 7.2× 48 GB ECC · $3,500–$4,500
- Mac Studio M3 Ultra 96 GBPerfect · 9.7× 96 GB unified · $3,999
- M5 Max MacBook Pro 64 GBPerfect · 6.5× 64 GB unified · $4,499
- NVIDIA DGX SparkPerfect · 12.9× 128 GB unified · $4,699
- Dual RTX 5090Perfect · 9.7× 64 GB (2×32) · $8,500–$10,500
Next step
Find-by-model — see what hardware runs this→