Jonathan Cheng

English Literature PhD turned ML Researcher. I medicate my imposter syndrome with cold brews. Current: Foundation Models @ Apple. Prev: LLMs/World Models @ Riot Games, RecSys @ Apple. NYC 🏳️‍🌈

Articles & links

↻ Jonathan Cheng reposted
@strix.timkellogg.me

Natural Language Autoencoders: "verbalize" a model's internal activations into plain text. Wild stat: Claude suspects it's being safety-tested on 26% of benchmark problems vs <1% in real chats, without ever saying so. anthropic.com/research/natural-language-autoencoders
