simonwillison.net web signal

Google's Gemini 3 Deep Think Tops Pelican-on-Bike SVG Test

TL;DR

  • Google released Gemini 3 Deep Think, pitched as built for challenges across science, research, and engineering.
  • Simon Willison says it produced the best pelican-riding-a-bicycle SVG he has seen from any model so far.
  • The post describes an informal personal test and discloses no benchmark scores, pricing, or API availability details.

A new entry in Simon Willison's running pelican-on-a-bicycle SVG benchmark says more than the official launch blurb does, because it is one of the few public, persistent micro-tests for frontier model releases. Writing on his blog, Willison reports that Google's freshly released Gemini 3 Deep Think drew him a 'really good' SVG of a pelican riding a bicycle, and that he thinks it is the best one he has seen so far.

Google pitches the model as 'built to push the frontier of intelligence and solve modern challenges across science, research, and engineering', and Willison points readers to Google's own official announcement. His test was straightforward: the basic prompt asking for an SVG of a pelican on a bicycle, then a much fussier follow-up that demanded a California brown pelican with a correctly shaped bicycle frame, spokes, a characteristic large pouch, visible feathers, clear pedaling motion and full breeding plumage.

Why a hobby benchmark is worth noticing: the pelican prompt has accumulated a long memory across releases, so a single 'best so far' from a tester who has run it against many models carries comparative weight that no brand-new internal eval can match. It is also a visual, structural task that quietly punishes models that talk a good game in text and then draw a bicycle with no spokes.

The honest caveat is that this is one tester, one prompt family, one screenshot. Willison's post does not include pricing, API availability, latency or any of the formal reasoning benchmark numbers an enterprise buyer would want, nor does it compare the model against rival frontier models on the workloads Google is actually pitching it for. Treat 'best pelican SVG so far' as a signal that this release improved on visual structural reasoning, not as a verdict on the hard science tasks the launch framing leans on.

The thing to watch is whether Google's full launch details and independent evals back the framing. If they do, the pelican was the leading indicator. If not, it is still a very good drawing.

Shared on Bluesky by 1 AI expert

  • Naomi Saphra @nsaphra.bsky.social amplified

    @pamelafox.bsky.social

    I printed a custom t-shirt that's an ode to @simonwillison.net's Pelican benchmark. The super chill highly saturated cruising pelican came from this Gemini 3 launch in Feb: simonwillison.net/2026/Feb/12/... (And the ca…
