Our definition turns “neutral” into something empirically testable, generalizes to any conflict, and is grounded in political theory. And it really does find better answers that everyone can agree on. Preprint arxiv.org/abs/2605.28911 Dataset github.com/HumanCompati... /FIN
Jonathan Stray
Recent commentary
Humanity's ability to know, reason, judge, and act well is the foundation of science, democracy, crisis response, & management of AI itself. AI poses serious risks to that foundation. New paper on epistemic risks by 30 experts calls for attention and proposes solutions. Link in thread.
What could it mean for an AI to be “politically neutral”? And can we measure it? New paper + dataset. We propose a definition that applies to any type of conflict on any topic: a neutral response should maximize approval on both sides of an issue, while keeping that approval balanced. 1/🧵
Seeing a flurry of evals and startups promising to test the mental health effects of AI. Literally all of them test what the model says in various conditions... none of them measure actual outcomes on actual people. A big gap, fixable with privacy-preserving experiments.
AI safety typically assumes one well-meaning user. I'm working on the case where two of them are at war.