r/singularity: Claude Opus 4.8 Thinking Scores 23 Points Below Opus 4.6 on LMArena Hard Prompts English — Successive Generations Declining
Summary
An r/singularity post tracking LMArena's Hard Prompts (English) leaderboard finds Claude Opus 4.6 Thinking holding #1, while Opus 4.7 Thinking trails it by 15 points and Opus 4.8 Thinking trails it by 23 points. Each newer Anthropic model scores lower in this category despite reported coding gains. The poster stresses that this is a non-coding user's perspective and applies to the Hard Prompts (English) category specifically. The pattern adds to earlier community data from EyeBench-V3, which placed Opus 4.8 last among frontier models on visual perception.
Originally reported by reddit.com
Original headline: r/singularity: Claude Opus 4.8 Thinking Scores 23 Points Below Opus 4.6 on LMArena Hard Prompts English — Successive Generations Declining