Z.ai Releases GLM-5.2: 753B Open Model With 1M-Token Context
TL;DR
- GLM-5.2 is a 753B parameter model released under an MIT license with no regional restrictions and a 1M-token context window.
- Its IndexShare architecture reduces per-token FLOPs by 2.9x at 1M context length by reusing indexers across sparse attention layers.
- Reported benchmarks include 99.2 on AIME 2026, 62.1 on SWE-bench Pro, and 54.7 on HLE with Tools.
Open-weight models at frontier scale keep getting more interesting, and Z.ai's release of GLM-5.2 on Hugging Face is a case worth examining carefully. The model weighs in at 753 billion parameters, ships under an MIT license with no stated regional restrictions, and offers a 1M-token context window that the team describes as stable for long-horizon work.
The architecture story is the part that deserves attention. Z.ai's IndexShare design reuses a single indexer across every group of four sparse attention layers, which the team claims reduces per-token FLOPs by 2.9x at 1M context length. That is not a trivial efficiency gain if it holds under production conditions, because serving a 1M-context model at reasonable cost is where most comparable efforts have stumbled. The release also notes a 20% increase in speculative decoding acceptance length via an improved MTP (multi-token prediction) layer, pointing to inference speed as a deliberate design target alongside capability.
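To make the indexer-sharing idea concrete, here is a minimal Python sketch of the scheduling pattern. It assumes (our assumption, not Z.ai's published code) that an indexer scores cached key positions and hands its top-k picks to a sparse attention layer. The names `top_k_indices`, `run_sparse_stack`, and `toy_indexer` are hypothetical, and the scores are made up. The sketch only shows that sharing across groups of four cuts indexer invocations by 4x. It does not model attention itself or reproduce the claimed 2.9x per-token FLOP figure, which depends on details of the real architecture not described here.

```python
from collections.abc import Callable, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> list[int]:
    """Return the positions of the k highest scores, sorted by position."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def run_sparse_stack(
    num_layers: int,
    share_every: int,
    indexer: Callable[[int], Sequence[float]],
    k: int,
) -> tuple[list[list[int]], int]:
    """Hypothetical indexer-sharing schedule (not Z.ai's implementation).

    The indexer runs only on the first layer of each group of `share_every`
    layers; the other layers in the group reuse its selected key positions.
    Returns the per-layer selections and the number of indexer calls.
    """
    selections: list[list[int]] = []
    indexer_calls = 0
    cached: list[int] = []
    for layer in range(num_layers):
        if layer % share_every == 0:
            # Start of a new group: run the indexer once and cache its picks.
            cached = top_k_indices(indexer(layer), k)
            indexer_calls += 1
        # Every layer attends only to the cached top-k key positions.
        selections.append(cached)
    return selections, indexer_calls


def toy_indexer(layer: int) -> list[float]:
    # Made-up relevance scores for 16 cached key positions.
    return [float((pos * 7 + layer) % 11) for pos in range(16)]


shared, shared_calls = run_sparse_stack(8, share_every=4, indexer=toy_indexer, k=4)
per_layer, per_layer_calls = run_sparse_stack(8, share_every=1, indexer=toy_indexer, k=4)
print(shared_calls, per_layer_calls)  # 2 8
```

In `shared`, layers 0 to 3 (and 4 to 7) attend to identical positions. That is the trade-off such a scheme makes: fewer indexer passes in exchange for layers within a group no longer selecting their own positions.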
The benchmark numbers are aggressive. According to the Hugging Face model page, GLM-5.2 scores 99.2 on AIME 2026, 91.2 on GPQA-Diamond, and 62.1 on SWE-bench Pro. On agentic tasks, MCP-Atlas comes in at 76.8. The model is reportedly benchmarked against other frontier models including GPT-5.5, Claude Opus 4.8, Gemini 3.1 Pro, DeepSeek-V4-Pro, and others, where Z.ai claims GLM-5.2 ranks competitively or leads on several evaluations.
The honest caveat is that a benchmark win, especially a score of 99.2 on a math olympiad dataset, is not the same as reliable performance on your actual problem. And at 753B parameters, the 'open' in open-source carries a hardware asterisk: self-hosting this model requires infrastructure that is out of reach for most individual developers and small teams, regardless of what the license permits. The supported deployment stack is wide (SGLang, vLLM, Transformers, KTransformers, Unsloth, Ascend NPU), which helps, but the compute requirement is real.
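A back-of-envelope calculation shows why. The sketch below estimates memory for the weights alone at a few common precisions, assuming 2, 1, and 0.5 bytes per parameter for BF16, FP8, and 4-bit formats and a hypothetical 80 GB accelerator. It ignores the KV cache (which grows with context length and matters a great deal at 1M tokens), activations, and runtime overhead, so real requirements are higher. Whether quantized GLM-5.2 checkpoints exist in each format is not something the source states.

```python
import math

PARAMS = 753e9  # total parameter count reported for GLM-5.2
DEVICE_GB = 80  # assumption: memory per accelerator, in GB

# Assumed storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "4-bit": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def min_devices(memory_gb: float, device_gb: float = DEVICE_GB) -> int:
    """Lower bound on accelerators needed just to hold `memory_gb` of weights."""
    return math.ceil(memory_gb / device_gb)


for fmt, bpp in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(PARAMS, bpp)
    print(f"{fmt:>5}: {gb:7.1f} GB of weights, at least {min_devices(gb)} x {DEVICE_GB} GB devices")
```

For BF16 this works out to about 1.5 TB of weights and at least 19 such devices before any KV cache is allocated; even at 4-bit it is roughly 377 GB, or at least 5 devices. Stacks that offload weights to CPU memory, such as KTransformers, trade accelerator count for host RAM and speed, but the total still has to live somewhere.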
The organizations most likely to benefit in the near term are those doing long-document analysis, running agentic coding pipelines, or deploying in enterprise settings in regions where closed API access is expensive or restricted. For them, a 1M-context model they can run on-premises with no per-token API fees and no regional licensing friction is a genuinely different proposition from what existed before.
Shared on Bluesky by 3 AI experts
GLM-5.2 is now open-weight.
- Tech Blog: z.ai/blog/glm-5.2
- Weights: huggingface.co/zai-org/GLM-...
- API: docs.z.ai/guides/llm/g...
- Coding Plan: z.ai/subscribe
- Chat: chat.z.ai
Originally reported by huggingface.co
Original headline: zai-org/GLM-5.2 · Hugging Face