reddit.com via Reddit June 6th 2026

r/singularity: Claude Opus 4.8 Thinking Scores 23 Points Below Opus 4.6 on LMArena Hard Prompts English — Successive Generations Declining

anthropic benchmarks claude llm-evaluation

Summary

An r/singularity post tracking LMArena's Hard Prompts (English) leaderboard finds Claude Opus 4.6 Thinking holding #1, with Opus 4.7 Thinking trailing by 15 points and Opus 4.8 Thinking trailing by 23 points. Each successive Anthropic model thus scores lower on this benchmark despite coding gains. The poster notes that this is a non-coding user's perspective, limited to the Hard Prompts (English) category. The pattern adds to earlier community data from EyeBench-V3, which placed Opus 4.8 at the bottom of frontier models on visual perception.

Originally reported by reddit.com
