GLM-5.1

Long-horizon agentic coding flagship from the GLM-5 line — tops SWE-Bench Pro at 58.4, narrowly beating GPT-5.4 and Claude Opus 4.6 on that benchmark, under a clean MIT license. Superseded June 16, 2026 by GLM-5.2 (62.1 SWE-Bench Pro, a solid 1M context, same MIT license) — see the GLM-5.2 fast take; both stay hosted / big-iron, so neither is a local pick.

License: MIT · Context: 200K tokens (131K max output) · Released: April 7, 2026

The decision in five lines

The call: Hosted only
Best for: agents
Runs on: Hosted or workstation-class only · ~466 GB (not consumer-local)
Watch out: At Q4_K_XL it's 466 GB on disk and needs a data-center GPU partition or a Mac Studio 512 GB.
Evidence: Estimated · last verified July 2026

744B total: PARAMETERS
MOE: TYPE
200K: CONTEXT
~466 GB (not consumer-local): VRAM AT Q4

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

AGENTS ·

GLM-5.1 (frontier, hosted)Long-horizon agentic specialist — stays productive across hundreds of rounds and thousands of tool calls. 744B MoE, MIT. Hosted via Z.ai or `glm-5.1:cloud` Ollama tag.

AGENTS · TOP

GLM-5.1 (frontier, hosted)744B MoE — SOTA on SWE-Bench Pro (58.4), 8-hour autonomous runs, MIT. Practical only via Z.ai hosted API or `glm-5.1:cloud` Ollama tag.

The call

Long-horizon agentic coding flagship from the GLM-5 line — tops SWE-Bench Pro at 58.4, narrowly beating GPT-5.4 and Claude Opus 4.6 on that benchmark, under a clean MIT license. Superseded June 16, 2026 by GLM-5.2 (62.1 SWE-Bench Pro, a solid 1M context, same MIT license) — see the GLM-5.2 fast take; both stay hosted / big-iron, so neither is a local pick.
When not to use: Any single-card local setup. At Q4_K_XL it's 466 GB on disk and needs a data-center GPU partition or a Mac Studio 512 GB. For most users, the honest path is `glm-5.1:cloud` (or `glm-5.2:cloud`) via Ollama rather than pretending it runs locally.

Runner notes

Unsloth dynamic 2-bit (220 GB) or 1-bit (200 GB) GGUFs make workstation-class local runs *possible* but slow. Ollama has `glm-5.1` + `glm-5.1:cloud` tags (and now `glm-5.2:cloud`) — `:cloud` is the honest choice for consumer hardware.

License: MIT
Released: April 7, 2026
Maker: Z.ai (formerly Zhipu AI)
Model card: huggingface.co/zai-org/GLM-5.1 →

Next step

Find-by-model — see what hardware runs this→