Granite-Switch 4.1 8B Preview (12 task LoRAs)

IBM Granite 4.1 8B with 12 task-specialized LoRA adapters embedded in a single checkpoint, activated per-token via control tokens in the chat template. Three libraries: **Core** (3 adapters — requirement check, context attribution, uncertainty), **RAG** (5 — query rewrite, query clarification, answerability, hallucination detection, citation generation), **Guardian** (4 — safety detection, factuality detection + correction, policy guardrails). A lightweight switch layer detects control tokens and produces per-position adapter indices applied across all decoder layers; KV-cache normalization keeps adapters independent. Novel deployment pattern for production RAG / agent stacks — one checkpoint, multiple specialized behaviors. 12 languages: EN, DE, ES, FR, JA, PT, AR, CS, IT, KO, NL, ZH.

License: Apache 2.0 · Context: 131,072 (128K) · Released: May 25, 2026 (HF upload)

The decision in five lines

The call: Consider — runnable locally, family reference
Best for: Local evaluation and family reference
Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out: General chat or coding — the base Granite 4.1 8B is a respectable but not standout 8B; the value here is the embedded adapter toolkit, not the base.
Evidence: Estimated · last verified July 2026

8B base + 12 embedded LoRA adapters (~10B total): PARAMETERS
DENSE LLM WITH CONTROL-TOKEN-ACTIVATED TASK ADAPTERS: TYPE
131072: CONTEXT
~5–6 GB (Q4 base) / ~16 GB (FP16) — adapter overhead is small: VRAM AT Q4

Where we recommend this

This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.

The call

IBM Granite 4.1 8B with 12 task-specialized LoRA adapters embedded in a single checkpoint, activated per-token via control tokens in the chat template. Three libraries: **Core** (3 adapters — requirement check, context attribution, uncertainty), **RAG** (5 — query rewrite, query clarification, answerability, hallucination detection, citation generation), **Guardian** (4 — safety detection, factuality detection + correction, policy guardrails). A lightweight switch layer detects control tokens and produces per-position adapter indices applied across all decoder layers; KV-cache normalization keeps adapters independent. Novel deployment pattern for production RAG / agent stacks — one checkpoint, multiple specialized behaviors. 12 languages: EN, DE, ES, FR, JA, PT, AR, CS, IT, KO, NL, ZH.
When not to use: General chat or coding — the base Granite 4.1 8B is a respectable but not standout 8B; the value here is the embedded adapter toolkit, not the base. Also **PREVIEW** status: IBM explicitly notes adapters should be tested per-use-case before production, and the Guardian-Library safety adapters are not a substitute for application-level safety testing.

Runner notes

Three sizes ship together — 3B, 8B, 30B (all preview). Adapters activated via control tokens in the chat template (`granitelib-core-r1.0`, `granitelib-rag-r1.0`, `granitelib-guardian-r1.0`). Standard `LlamaForCausalLM`-class architecture — vLLM and transformers work natively, no exotic kernels. Distinct from the base Granite 4.1 detail page (`/models/granite-4-1/`) — Switch is the adapter-embedded variant, not a replacement.

License: Apache 2.0
Released: May 25, 2026 (HF upload)
Maker: IBM Granite
Model card: huggingface.co/ibm-granite/granite-switch-4.1-8b-preview →

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.

Next step

Find-by-model — see what hardware runs this→