404media.co web signal

De Wynter Uses Age of Empires Goats to Challenge LLM Sentience

TL;DR

  • Adrian de Wynter built a working neural network inside Age of Empires II using goats as bits to argue that LLMs are no more human-like than the game is.
  • De Wynter's review of 315 AI papers found 57 percent assumed LLMs have human-like traits before any experiment was run.
  • De Wynter argues human-like attributions to LLMs measure the chat interface's presentation, not actual system behavior.

A Microsoft and University of York researcher named Adrian de Wynter recently published a paper titled "If LLMs Have Human-Like Attributes, Then So Does Age of Empires II," and the premise is exactly what it sounds like. De Wynter built a working neural network inside the classic strategy game using goats as computational bits: standing on grass means 0, standing on a bridge means 1, and with some careful use of the scenario editor's scripting tools, the result is two XNOR gates and one AND gate that can learn the logical AND function. 404 Media reports that de Wynter's project connects to science fiction writer Ted Chiang's viral essay, which asked readers to consider whether Microsoft Word might be conscious if LLMs are.
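To make the goat encoding concrete, here is a minimal Python sketch of the same idea outside the game. It is not de Wynter's construction: the reporting does not detail how the two XNOR gates and the AND gate are wired together, so this swaps in a plain single-neuron perceptron. The names `TILE_TO_BIT`, `train_perceptron`, and `predict` are hypothetical and exist only for illustration.

```python
# Assumption: a goat's tile is read as one bit, as the article describes
# (grass = 0, bridge = 1). The dictionary name is hypothetical.
TILE_TO_BIT = {"grass": 0, "bridge": 1}

# Truth table for logical AND, expressed as pairs of goat positions.
SAMPLES = [
    (("grass", "grass"), 0),
    (("grass", "bridge"), 0),
    (("bridge", "grass"), 0),
    (("bridge", "bridge"), 1),
]


def predict(weights, bias, tiles):
    """Fire (return 1) when the weighted sum of the goat bits exceeds zero."""
    bits = [TILE_TO_BIT[t] for t in tiles]
    total = sum(w * b for w, b in zip(weights, bits)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20):
    """Classic perceptron rule: nudge weights by the error on each sample.

    This stands in for the in-game learning procedure, which the reporting
    does not describe in detail.
    """
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for tiles, target in samples:
            error = target - predict(weights, bias, tiles)
            bits = [TILE_TO_BIT[t] for t in tiles]
            weights = [w + error * b for w, b in zip(weights, bits)]
            bias += error
    return weights, bias


weights, bias = train_perceptron(SAMPLES)
for tiles, target in SAMPLES:
    print(tiles, "->", predict(weights, bias, tiles), "(expected", target, ")")
```

AND is linearly separable, so the perceptron rule settles on weights that reproduce the truth table after a handful of passes. Nothing in the arithmetic depends on whether the bits are goats, transistors, or text messages, which is the point of the thought experiment.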

The argument is methodological as much as philosophical. De Wynter formally demonstrates that Age of Empires II is Turing-complete, meaning any computation an LLM performs could in principle run inside the game. He extends this to a thought experiment covered by The Decoder: the same mathematical operations could theoretically run through 667,000 Greater Boston residents texting computational steps, and the outputs would be identical to an actual language model. Nobody watching that happen would claim the city of Boston was developing empathy or self-awareness. The point is that the illusion of mind comes from the chat interface, not from the underlying computation.

The number that makes this critique land hardest comes from de Wynter's review of 315 AI papers published from mid-2024 to mid-2026: 57 percent already assumed LLMs possess human-like traits in their premises, while 36 percent reached corresponding conclusions. When a paper starts from the assumption that a model has fear or self-awareness and then designs an experiment to detect exactly that, the finding is predetermined. De Wynter's own framing, drawn directly from the paper: "many anthropomorphic measurements in AI are measurements of presentation, rather than of an actual system's behaviour."
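For a sense of scale, the percentages can be turned back into approximate paper counts. The sketch below assumes the reported figures are rounded to whole percentages, so the counts are estimates rather than numbers taken from the paper; the function name `approx_count` is hypothetical.

```python
def approx_count(total, percent):
    """Estimate how many papers a rounded whole-number percentage represents."""
    return round(total * percent / 100)


TOTAL_PAPERS = 315  # size of the review, as reported

premise_count = approx_count(TOTAL_PAPERS, 57)     # assumed human-like traits in premises
conclusion_count = approx_count(TOTAL_PAPERS, 36)  # reached corresponding conclusions

print(premise_count, conclusion_count)
```

On these assumptions, roughly 180 of the 315 papers built human-like traits into their premises and roughly 113 reached corresponding conclusions.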

The honest caveat is that a methodological critique of published papers does not settle the deeper philosophical question of whether any substrate could eventually give rise to something worth calling experience. De Wynter proposes updating a 19th-century principle so that it covers machines: behavior should not be attributed to higher cognitive processes when a simpler explanation suffices. What the reporting does not give you is a detailed account of which specific venues or papers are most implicated, or a concrete peer-review standard that would catch circular reasoning before publication.

Researchers and teams building AI safety evaluation frameworks have the most direct stake in this work: if the measurement tools for 'human-like' AI behavior rest on circular assumptions, the whole scaffolding of consciousness-adjacent risk assessment may need revisiting from the ground up.
