Agentic coding is genuinely useful now, and there are some impressive reports of AI agents doing science. But how well and how reliably can they handle tasks scientists actually want to hand off, ones that bottleneck progress? How do we even measure that? New paper 🧵 arxiv.org…
Kristin Branson
Articles & links
Huge credit to Kai Horstmann, who did the majority of this work across Kristin Branson's and Jennifer Sun's labs, with help from Ethan Lin and Alice Robie. Paper: arxiv.org/abs/2606.07718 Task environments: github.com/kaihorstmann/neuro-d2d-eval Data: huggingface.co/datasets/kaih…
This is pretty crazy, Fable will silently harm (am I understanding that right?) ML research "building pretraining pipelines, distributed training infrastructure, or ML accelerator design" jonready.com/blog/posts/c...
Recent commentary
Bummer for those of us doing AI for science :(. Claude Fable 5 filters are very broad. Hopefully they'll be refined some! @anthropic.com