Adrian de Wynter Shows LLM Human-Like Attributes Are Non-Unique

TL;DR

  • Adrian de Wynter argues LLM anthropomorphic attributes are empirically non-unique, using Age of Empires II as a counterexample.
  • De Wynter proves Age of Empires II is both functionally complete and Turing-complete, so it can in principle reproduce the behavioral evidence cited for LLMs.
  • The paper proposes a 'null assumption': treating LLM non-uniqueness as the experimental baseline rather than assuming human-like attributes.

Adrian de Wynter's new paper on arXiv starts from a blunt observation: when researchers claim that large language models have human-like qualities such as morality or understanding of natural language, the evidence they rely on would, applied consistently, also support those claims for Age of Empires II.

The paper's goal is not to argue for or against whether LLMs actually have such attributes. The narrower point is that current methods for detecting them produce conclusions that are not specific to LLMs. To demonstrate this, de Wynter built and trained a simple neural network on the videogame and argues that any entity in a sufficiently powerful substrate (the paper names LEGO and the Greater Boston Area as examples) could in principle present the same properties. The conclusion: the purported anthropomorphic attributes of LLMs are "empirically non-unique."

The theoretical anchor for this is a proof that Age of Empires II is both functionally complete and Turing-complete. The point is not that this is surprising but that it matters: if a system capable of simulating any computation can exhibit any behavioral signature, and those signatures are what researchers measure when they attribute human-like qualities to LLMs, then the inference from signature to attribute breaks down. The paper notes that some observable properties, such as responses to prompts, might remain invariant across substrates, but "the interpretation of their perceived behaviour might change with the substrate."

De Wynter proposes what he calls a "null assumption": that researchers should assume LLM non-uniqueness as their baseline when designing experiments, rather than assuming anthropomorphic attributes and then testing for them. The paper argues that assuming such attributes either exist or do not, without substrate-independent measurement criteria, leads to "circular or uninformative conclusions" regardless of the outcome.

The limitation is that the paper identifies the problem more thoroughly than it resolves it. The abstract does not spell out what explicit, substrate-independent measurement criteria would look like in practice. For practitioners, though, the implication is concrete: many benchmark claims, capability assessments, and policy arguments about LLMs rest on anthropomorphic language, and this framework gives critics of that framing a formal basis to push back.
