David Bau

Interpretable Deep Networks. http://baulab.info/ @davidbau

Articles & links

But AI lie detection is hard and remains a central research challenge. Recent research suggests that simple probes can pick up on neural "tells" that reveal when an AI is lying, even when its output looks clean. anthropic.com/research/pr... arxiv.org/abs/2502.03407

Simple probes can catch sleeper agents anthropic.com

Recent commentary

"You're right to call me on that!" Can you catch an AI in the act of lying? Register below to enter our AI lie-detection contest. AI lies are a big problem. The frontier labs have all worked hard to fight AI deception. They all try to monitor their AIs for it.
