Z AI's GLM-5.2 tops open-weights Intelligence Index at 51
TL;DR
- GLM-5.2 scores 51 on the Artificial Analysis Intelligence Index, the leading open-weights result, ahead of MiniMax-M3, DeepSeek V4 Pro and Kimi K2.6.
- Fifteen providers serve the model with roughly a 2.4x price spread and 11.4x output-speed spread; Blackbox AI leads at 457.4 tokens per second.
- GLM-5.2 uses about 43k output tokens per Intelligence Index task, pushing cost per task to roughly $0.46 versus $0.18 for MiniMax-M3.
Z AI's GLM-5.2 has taken the open-weights lead on the Artificial Analysis Intelligence Index, scoring 51 and sitting ahead of MiniMax-M3 at 44, DeepSeek V4 Pro (max) at 44 and Kimi K2.6 at 43. It keeps the mixture-of-experts shape of GLM-5.1 (744B total parameters, 40B active), is MIT licensed and has a 1M-token context. Overall it lands fourth on the index, behind Fable 5, Opus 4.8 and GPT-5.5 (xhigh).
The provider comparison on Artificial Analysis is where this gets practically interesting. Fifteen hosts now serve GLM-5.2 (max), and Artificial Analysis reports roughly a 2.4x spread on blended per-million-token pricing and an 11.4x spread on output speed. Blackbox AI is the fastest listed, at 457.4 tokens per second with a 6.54-second time to first token, while GMI's FP8 deployment is the cheapest at $0.72 blended per million tokens. On the first-party API, Z AI has priced GLM-5.2 in line with GLM-5.1, at roughly $1.4 input and $4.4 output per million tokens.
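To make the speed figures concrete, here is a minimal Python sketch of how output speed and time to first token combine into wall-clock time per response. It assumes the 11.4x spread means fastest divided by slowest, reuses Blackbox AI's 6.54-second time to first token for the slowest host purely for illustration (the reporting does not give that host's figure), and picks a hypothetical 10,000-token response. The function and constant names are illustrative, not part of any provider API.

```python
def response_seconds(output_tokens: int, tokens_per_second: float, ttft_seconds: float) -> float:
    """Approximate wall-clock time: time to first token plus steady decode time."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_seconds + output_tokens / tokens_per_second


# Figures reported for the fastest listed provider (Blackbox AI).
FASTEST_TPS = 457.4
FASTEST_TTFT_SECONDS = 6.54

# Assumption: "11.4x spread" is fastest speed divided by slowest speed.
SPEED_SPREAD = 11.4
implied_slowest_tps = FASTEST_TPS / SPEED_SPREAD

# Hypothetical response length, chosen only for illustration.
OUTPUT_TOKENS = 10_000

fast_seconds = response_seconds(OUTPUT_TOKENS, FASTEST_TPS, FASTEST_TTFT_SECONDS)
# Assumption: the slowest host has the same time to first token as the fastest.
slow_seconds = response_seconds(OUTPUT_TOKENS, implied_slowest_tps, FASTEST_TTFT_SECONDS)

print(f"implied slowest speed: {implied_slowest_tps:.1f} tokens/s")
print(f"fastest host: {fast_seconds:.0f} s, slowest host: {slow_seconds:.0f} s")
```

Under these assumptions the slowest host runs at about 40 tokens per second, so a long response that takes under half a minute on the fastest host takes several minutes on the slowest.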
The main caveat is token efficiency. Artificial Analysis reports that GLM-5.2 uses 43k output tokens per Intelligence Index task, up from GLM-5.1's 26k and higher than open-weights peers MiniMax-M3 at 24k and Kimi K2.6 at 35k. That pushes cost per task to about $0.46 for GLM-5.2, versus $0.25 for GLM-5.1, $0.18 for MiniMax-M3 and $0.31 for Kimi K2.6. Even the cheapest provider, at $0.72 blended per million tokens, still bills for every one of those 43k tokens, so the ranking table is not a substitute for doing the workload math.
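The workload math can be sketched in a few lines of Python. The per-task token counts, per-task costs and the $4.4 first-party output price are the reported figures; the 10,000-task monthly volume and all function names are hypothetical. The output-only calculation is not an attempt to reproduce the reported $0.46, which evidently covers more than output tokens billed at the first-party rate.

```python
# Reported cost per Intelligence Index task, in USD.
REPORTED_COST_PER_TASK_USD = {
    "GLM-5.2": 0.46,
    "GLM-5.1": 0.25,
    "MiniMax-M3": 0.18,
    "Kimi K2.6": 0.31,
}

# Reported output tokens per task (no figure is given for DeepSeek V4 Pro).
REPORTED_OUTPUT_TOKENS_PER_TASK = {
    "GLM-5.2": 43_000,
    "GLM-5.1": 26_000,
    "MiniMax-M3": 24_000,
    "Kimi K2.6": 35_000,
}


def output_cost_usd(output_tokens: int, usd_per_million_output: float) -> float:
    """Cost of the output tokens alone at a given per-million-token price."""
    return output_tokens * usd_per_million_output / 1_000_000


def workload_cost_usd(tasks: int, cost_per_task_usd: float) -> float:
    """Total cost for a number of tasks at a flat per-task cost."""
    return tasks * cost_per_task_usd


def cost_ratio(model: str, baseline: str) -> float:
    """How many times more expensive per task `model` is than `baseline`."""
    return REPORTED_COST_PER_TASK_USD[model] / REPORTED_COST_PER_TASK_USD[baseline]


# Output tokens only, at the roughly $4.4 per million first-party output price.
glm52_output_only = output_cost_usd(REPORTED_OUTPUT_TOKENS_PER_TASK["GLM-5.2"], 4.4)

# Hypothetical monthly volume, chosen only for illustration.
MONTHLY_TASKS = 10_000
monthly = {
    model: workload_cost_usd(MONTHLY_TASKS, cost)
    for model, cost in REPORTED_COST_PER_TASK_USD.items()
}

print(f"GLM-5.2 output tokens alone: ${glm52_output_only:.2f} per task")
print(f"GLM-5.2 vs MiniMax-M3 per task: {cost_ratio('GLM-5.2', 'MiniMax-M3'):.2f}x")
for model, total in monthly.items():
    print(f"{model}: ${total:,.0f} per {MONTHLY_TASKS:,} tasks")
```

At that hypothetical volume the per-task gap between GLM-5.2 and MiniMax-M3 comes to roughly $4,600 versus $1,800 a month, which is why the per-task figure matters more than the per-million-token price.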
What the reporting does not give you is a clean read on tool-use reliability across the 15 hosts, on whether the FP4 and FP8 quantised deployments hold the 51 index score at production load, or on how the 1M context behaves on messy long coding-agent trajectories rather than benchmark runs. For teams shopping for an open-weights flagship that clears the frontier bar on scientific reasoning and on standard coding benchmarks, though, GLM-5.2 is now the default one to trial, provided they do the cost-per-task math rather than rely on sticker prices.
Originally reported by artificialanalysis.ai
Original headline: GLM-5.2 (max): API Provider Performance Benchmarking & Price Analysis | Artificial Analysis