the AI bench
VERIFIED MAY 2026
All models

MODEL · IBM GRANITE · 8B BASE + 12 EMBEDDED LORA ADAPTERS (~10B TOTAL)

Granite-Switch 4.1 8B Preview (12 task LoRAs)

IBM Granite 4.1 8B with 12 task-specialized LoRA adapters embedded in a single checkpoint, activated per-token via control tokens in the chat template. Three libraries: **Core** (3 adapters — requirement check, context attribution, uncertainty), **RAG** (5 — query rewrite, query clarification, answerability, hallucination detection, citation generation), **Guardian** (4 — safety detection, factuality detection + correction, policy guardrails). A lightweight switch layer detects control tokens and produces per-position adapter indices applied across all decoder layers; KV-cache normalization keeps adapters independent. Novel deployment pattern for production RAG / agent stacks — one checkpoint, multiple specialized behaviors. 12 languages: EN, DE, ES, FR, JA, PT, AR, CS, IT, KO, NL, ZH.

License: Apache 2.0 · Context: 131,072 (128K) · Released: May 25, 2026 (HF upload)

The decision in five lines

The call
Consider — runnable locally, family reference
Best for
Local evaluation and family reference
Runs on
23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out
General chat or coding — the base Granite 4.1 8B is a respectable but not standout 8B; the value here is the embedded adapter toolkit, not the base.
Evidence
Estimated · last verified May 2026

8B base + 12 embedded LoRA adapters (~10B total)
PARAMETERS
DENSE LLM WITH CONTROL-TOKEN-ACTIVATED TASK ADAPTERS
TYPE
131072
CONTEXT
~5–6 GB (Q4 base) / ~16 GB (FP16) — adapter overhead is small
VRAM AT Q4

Where we recommend this

This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.

The call

IBM Granite 4.1 8B with 12 task-specialized LoRA adapters embedded in a single checkpoint, activated per-token via control tokens in the chat template. Three libraries: **Core** (3 adapters — requirement check, context attribution, uncertainty), **RAG** (5 — query rewrite, query clarification, answerability, hallucination detection, citation generation), **Guardian** (4 — safety detection, factuality detection + correction, policy guardrails). A lightweight switch layer detects control tokens and produces per-position adapter indices applied across all decoder layers; KV-cache normalization keeps adapters independent. Novel deployment pattern for production RAG / agent stacks — one checkpoint, multiple specialized behaviors. 12 languages: EN, DE, ES, FR, JA, PT, AR, CS, IT, KO, NL, ZH.

When not to use: General chat or coding — the base Granite 4.1 8B is a respectable but not standout 8B; the value here is the embedded adapter toolkit, not the base. Also **PREVIEW** status: IBM explicitly notes adapters should be tested per-use-case before production, and the Guardian-Library safety adapters are not a substitute for application-level safety testing.

Runner notes

Three sizes ship together — 3B, 8B, 30B (all preview). Adapters activated via control tokens in the chat template (`granitelib-core-r1.0`, `granitelib-rag-r1.0`, `granitelib-guardian-r1.0`). Standard `LlamaForCausalLM`-class architecture — vLLM and transformers work natively, no exotic kernels. Distinct from the base Granite 4.1 detail page (`/models/granite-4-1/`) — Switch is the adapter-embedded variant, not a replacement.

License
Apache 2.0
Released
May 25, 2026 (HF upload)
Maker
IBM Granite

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.

Next step

Find-by-model — see what hardware runs this