But AI lie detection is hard and remains a central research challenge. Recent research suggests that simple probes can pick up on neural "tells" that reveal when a model is lying, even when its output looks clean. anthropic.com/research/pr... arxiv.org/abs/2502.03407
David Bau
Articles & links
I recently spoke with Yascha Mounk about how researchers look inside AI to understand how it is thinking. Here is the podcast: writing.yaschamounk.com/p/david-bau-2
Also check out my previous interview with Yascha about AI, which is more of a primer, here: writing.yaschamounk.com/p/david-bau
Recent commentary
"You're right to call me on that!" Can you catch an AI in the act of lying? Register below to enter our AI lie-detection contest. AI lies are a big problem. The frontier labs have all worked hard to fight AI deception. They all try to monitor their AIs for it.