Anthropic Reports AI Now Authors Over 80% of Its Production Code

By Alexis Dufresne
Published June 20, 2026 at 17:47 UTC. Updated June 20, 2026 at 18:10 UTC.

TL;DR

As of May 2026, over 80% of Anthropic's production code was authored by Claude, up from single digits before February 2025.
Claude Mythos Preview outperformed human researchers on next-step research judgment 64% of the time as of April 2026.
Anthropic engineers shipped roughly 8x more code per quarter in Q2 2026 than in 2024, driven by Claude-assisted development.

The speed at which a company ships software is usually limited by the number of engineers. Anthropic's essay on recursive self-improvement suggests that relationship is breaking down for them: as of May 2026, over 80% of code merged into their production systems was authored by Claude, up from single digits before February 2025. Engineers meanwhile shipped roughly 8x more code per quarter in Q2 2026 than in 2024. The piece is framed as a public accounting of what it looks like when a frontier AI lab uses its own models to accelerate AI development.

The underlying numbers sketch a trend that compounds rather than plateaus. Task completion time horizons, meaning the length of real-world tasks a model can handle reliably, have been doubling roughly every four months, versus every seven months previously. Claude Opus 3 in March 2024 could handle tasks that take a human about four minutes; Claude Opus 4.6 in March 2026 extended that to twelve hours. On software engineering benchmarks, models went from single-digit scores to saturation in two years; on research reproduction benchmarks, from roughly 20% success in 2024 to saturation in fifteen months.
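As a rough sanity check on the doubling claim, the two endpoints quoted above (about four minutes in March 2024, about twelve hours in March 2026) can be turned into an implied doubling time. This is a back-of-the-envelope sketch, not the essay's method: it assumes steady exponential growth between just those two points, and the function name `implied_doubling_months` is made up for illustration.

```python
import math


def implied_doubling_months(start_minutes, end_minutes, elapsed_months):
    """Months per doubling, assuming steady exponential growth between two points."""
    doublings = math.log2(end_minutes / start_minutes)
    return elapsed_months / doublings


# Endpoints as quoted in the article: ~4 minutes (March 2024), ~12 hours (March 2026).
start = 4.0          # minutes
end = 12 * 60.0      # minutes
months = 24          # March 2024 to March 2026

growth = end / start
doubling = implied_doubling_months(start, end, months)

print(f"growth factor: {growth:.0f}x")
print(f"implied doubling time: {doubling:.1f} months")
```

The endpoints work out to a 180x increase and an implied doubling time of about 3.2 months. That is in the same range as the quoted "roughly every four months" but not identical, and two data points cannot show how the rate changed in between.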

The more striking claim is what happens when Claude is pointed at research judgment rather than code. According to the essay, Claude Mythos Preview chose better next steps than human researchers 64% of the time in April 2026, up from 51% with Opus 4.5 in November 2025. A demonstration in the same month reportedly had Claude-powered agents recover 97% of the performance gap on an AI safety research problem over 800 cumulative hours of work. A March 2026 internal survey of 130 Anthropic researchers estimated a median 4x output multiplier from access to Mythos Preview. These are internal figures reported by the lab itself, not independent peer-reviewed benchmarks.

The honest caveat is that Anthropic is assessing its own progress on its own models, and the incentives all tilt toward optimism. The coordination proposal in the essay, that Anthropic would support a verifiable global slowdown if other frontier labs agreed, comes with an asterisk the authors include themselves: verification would be "much more challenging" than with other technologies, because training runs are easier to conceal than physical infrastructure. The piece does not explain who would verify or what thresholds would trigger action.

What the reporting does not give you is a clear picture of failure modes at scale. Claude-powered code review reportedly catches around one-third of the bugs behind past production incidents. That is a real improvement, but it also means roughly two-thirds of that class of bug would still get past the reviewer, in a codebase that is now mostly AI-authored. Teams evaluating AI-assisted development workflows have the most to gain from studying these numbers closely, because the feedback loop Anthropic describes is already commercially available and accelerating.

Shared on Bluesky by 16 AI experts (top 5 by trust)

Mark Riedl @markriedl.bsky.social: You can read the Anthropic blog post on RSI. It’s… fine. Just don’t let one’s imagination get ahead of things www.anthropic.com/institute/re…
Ethan Mollick @emollick.bsky.social: I think it is really worth reading this piece on RSI at Anthropic. There is a bit of navel-gazing, some marketing, and a lot of very sincer…
Tim Kellogg @timkellogg.me: ALERT ALERT they’re posting about RSI on main www.anthropic.com/institute/re...
Chris Paxton @cpaxton.bsky.social amplified
Sung Kim @sungkim.bsky.social: Anthropic's When AI builds itself "We looked at sessions where a human researcher took a wrong turn, showed Claude the session up to that p…

Originally reported by anthropic.com

Original headline: When AI builds itself