simonwillison.net web signal

Claude Sonnet 5 lands with 1M context and a tokenizer tax

TL;DR

  • Claude Sonnet 5 launched June 30, 2026 with a 1 million token context window, 128,000 max output tokens, and adaptive thinking on by default.
  • Standard pricing matches Sonnet 4.6 at $3 per million input tokens and $15 per million output tokens, with introductory rates of $2/$10 through August 31.
  • A new tokenizer produces roughly 30% more tokens for the same input (about 1.42x on English text), effectively raising costs despite unchanged per-token pricing.

Anthropic's headline claim for Claude Sonnet 5, released June 30, is that its performance is "close to that of Opus 4.8, but at lower prices," and on the price sheet, that looks true. Per Simon Willison's rundown on his blog, base pricing matches Sonnet 4.6 at $3 per million input tokens and $15 per million output tokens, with introductory rates of $2 and $10 running through August 31. The more interesting detail is buried further down.

Sonnet 5 ships with a new tokenizer, and Willison's testing shows the same input text produces approximately 30% more tokens than on Sonnet 4.6. His measured multipliers were 1.42x for English, 1.33x for Spanish, 1.27x for Python code, and roughly 1.0x (unchanged) for Simplified Mandarin. If you pay per token and run an English-heavy workload, the "same price" buys noticeably less text than it used to. Mandarin-heavy workloads are the exception, since their token counts barely move.
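To see what those multipliers do to the bill, here is a minimal sketch that scales the unchanged $3/$15 list prices by each reported multiplier, giving a price per million tokens as Sonnet 4.6 would have counted them. It assumes output text inflates by the same factor as input text, which the post does not measure, and the names are illustrative only.

```python
# Tokenizer multipliers Willison reported: Sonnet 5 tokens / Sonnet 4.6 tokens
# for the same text.
MULTIPLIERS = {
    "English": 1.42,
    "Spanish": 1.33,
    "Python code": 1.27,
    "Simplified Mandarin": 1.00,  # reported as roughly unchanged
}

INPUT_PRICE = 3.00    # USD per million input tokens, unchanged from Sonnet 4.6
OUTPUT_PRICE = 15.00  # USD per million output tokens, unchanged from Sonnet 4.6


def effective_price(list_price: float, multiplier: float) -> float:
    """USD per million tokens *as Sonnet 4.6 would have counted them*."""
    return list_price * multiplier


for language, m in MULTIPLIERS.items():
    # Assumption: output text inflates by the same multiplier as input text.
    print(f"{language:<20} input ${effective_price(INPUT_PRICE, m):.2f}"
          f"  output ${effective_price(OUTPUT_PRICE, m):.2f}")
```

For English that works out to about $4.26 input and $21.30 output per old-equivalent million tokens, while the Mandarin row stays at $3 and $15.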

The other shifts are structural. The sampling controls that pipeline authors have leaned on for years (temperature, top_p, and top_k) are no longer supported, so any code that tunes sampling behaviour needs to be revisited. The context window is 1 million tokens with a 128,000-token output ceiling, and adaptive thinking is on by default unless you disable it in settings. Anthropic's system card, as Willison quotes it, positions the model as "significantly less capable at cyber tasks than Mythos 5," which the company says lets it apply safeguards comparable to those on Opus 4.7 and 4.8 and release the model without additional restrictions.
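A migration pass over existing request code might look like the sketch below. It assumes requests are assembled as plain Python dicts before being sent; the three sampling keys and the 128,000 output ceiling come from the post, while `migrate_request` and the `max_tokens` key are hypothetical names chosen for illustration, not a confirmed SDK interface.

```python
from typing import Any

# Sampling controls the post says Sonnet 5 no longer supports.
UNSUPPORTED_SAMPLING_KEYS = ("temperature", "top_p", "top_k")

# Maximum output tokens reported for Sonnet 5.
MAX_OUTPUT_TOKENS = 128_000


def migrate_request(params: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Hypothetical helper: return a copy of a request dict with unsupported
    sampling keys removed and max_tokens clamped, plus the keys that were dropped."""
    migrated = dict(params)  # shallow copy so the caller's dict is untouched
    dropped = [key for key in UNSUPPORTED_SAMPLING_KEYS if key in migrated]
    for key in dropped:
        del migrated[key]
    if "max_tokens" in migrated:
        migrated["max_tokens"] = min(migrated["max_tokens"], MAX_OUTPUT_TOKENS)
    return migrated, dropped


old_request = {"max_tokens": 200_000, "temperature": 0.2, "top_p": 0.9, "messages": []}
new_request, dropped = migrate_request(old_request)
print(dropped)      # ['temperature', 'top_p']
print(new_request)  # {'max_tokens': 128000, 'messages': []}
```

Returning the dropped keys instead of discarding them silently is deliberate: removing temperature from a pipeline that relied on near-deterministic output changes its behaviour, so those call sites deserve a manual look.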

The honest caveat is that Willison's post is one careful reader's early look, not an independent benchmark run, and the Opus 4.8 comparison is Anthropic's own claim. His pelican illustration test, a running joke on his blog, came back with Sonnet 5 describing the image as a goose on a bicycle, which is not science but does hint that vision behaviour has shifted in ways worth checking. What the reporting does not give you is a like-for-like independent evaluation, or clarity on which vendor Mythos 5 belongs to.

For teams weighing an upgrade, the number that matters is not the sticker rate but the effective rate on your actual corpus. Count your tokens before you migrate.
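One way to do that count is to tokenize a representative sample of your traffic with both tokenizers, take the ratio, and apply it to your monthly volume. The sketch below assumes you already have those two counts (how you obtain them is out of scope here) and that output scales by the same ratio as input; the sample counts and workload figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per million input tokens
    output_per_million: float  # USD per million output tokens


SONNET_STANDARD = Pricing(3.00, 15.00)  # list price, same for Sonnet 4.6 and 5
SONNET_5_INTRO = Pricing(2.00, 10.00)   # Sonnet 5 introductory rate through August 31


def measured_ratio(old_tokens: int, new_tokens: int) -> float:
    """New-tokenizer count divided by old-tokenizer count on the same sample."""
    if old_tokens <= 0:
        raise ValueError("old_tokens must be positive")
    return new_tokens / old_tokens


def monthly_cost(input_old: float, output_old: float, ratio: float, pricing: Pricing) -> float:
    """Estimated USD spend for a workload measured in old-tokenizer tokens.

    Assumes input and output token counts both scale by the same ratio.
    """
    new_input = input_old * ratio
    new_output = output_old * ratio
    return (new_input * pricing.input_per_million
            + new_output * pricing.output_per_million) / 1_000_000


# Hypothetical sample: 10,000 tokens under Sonnet 4.6's tokenizer, 14,200 under Sonnet 5's.
ratio = measured_ratio(10_000, 14_200)

# Hypothetical monthly volume, in old-tokenizer tokens.
INPUT_OLD, OUTPUT_OLD = 500_000_000, 50_000_000

print(f"Sonnet 4.6 standard: ${monthly_cost(INPUT_OLD, OUTPUT_OLD, 1.0, SONNET_STANDARD):,.2f}")    # $2,250.00
print(f"Sonnet 5 standard:   ${monthly_cost(INPUT_OLD, OUTPUT_OLD, ratio, SONNET_STANDARD):,.2f}")  # $3,195.00
print(f"Sonnet 5 intro:      ${monthly_cost(INPUT_OLD, OUTPUT_OLD, ratio, SONNET_5_INTRO):,.2f}")   # $2,130.00
```

With a 1.42x ratio, this hypothetical workload costs more at standard rates than it did on Sonnet 4.6, but slightly less during the introductory period, so the answer depends on both your corpus and your timing.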
