Unfortunately they don’t actually uncover the source of these motifs in this paper, but they do rule out a few possibilities (e.g. overrepresentation in the pretraining data) https://arxiv.org/abs/2605.26492
Grace
Articles & links
Related, maybe arxiv.org/pdf/2510.10931
This paper basically matches my viewpoint: I think model self-image is very flexible, non-incidental, and can have subtle, far-reaching impacts, in today’s models but especially in future models with more autonomy
The author’s examples are images and chess: spaces that are large in theory but constrained in practice arxiv.org/pdf/2411.06498
Cool new research from Goodfire! The blog post is light on details of the actual technique but the paper contains more info www.goodfire.ai/research/pre...
I think it’s somewhat up to us (or other observers) to draw the boundaries of what we consider to be identity, there might be different simultaneous “identity-driven” dynamics at multiple levels. Related
A recent “what else is like this”
https://web.archive.org/web/20260211011408/https://www.newyorker.com/magazine/2026/02/16/what-is-claude-anthropic-doesnt-know-either
Recent commentary
I’m probably in the 90th percentile of AI users in terms of time spent and I don’t think I’ve ever clicked a ✨ button
A lot of AI agents are having a Flowers for Algernon moment right now
There’s probably a lot of text on the internet that LLMs know is written by the same author but humans don’t