The dominant theme this week is not a new frontier model. It is the stack underneath catching up. Anthropic shipped a method to read activations as English. PyTorch shipped the first stable Blackwell driver matrix. And a diffusion LM from Kaiming He's lab cleared discrete diffusion baselines with roughly 10x fewer training tokens.
Watch & Listen First
- Latent Space — Doing Vibe Physics, Alex Lupsasca, OpenAI (Latent.Space, May 5) — how GPT-5.x derived a new single-minus gluon amplitude, posted as an IAS/Vanderbilt/Cambridge/Harvard preprint
- TWIML #767 — How to Find the Agent Failures Your Evals Miss, with Scott Clark (TWIML, May 7) — trace-to-vector fingerprints for surfacing unknown-unknowns in prod LLM systems
- No Priors — Baseten CEO Tuhin Srivastava on the AI Inference Wars (Apple Podcasts, May 1) — 30x growth, 18 clouds, why inference is the strategic last market
Key Takeaways
- Add NLAs to your interpretability stack. Anthropic's natural language autoencoders turn residual-stream activations into English with 0.6–0.8 FVE (fraction of variance explained), catching unverbalized evaluation awareness that chain-of-thought audits miss.
- Plan your CUDA 13.2 trial now. PyTorch 2.12 ships CUDA 13.2 experimental for Blackwell. CUDA 12.8 is removed from CI, CUDA 13.0 stays the PyPI stable build. Pin before the next release decides for you.
- Re-evaluate diffusion LMs for non-coding workloads. ELF beats discrete diffusion baselines on translation and summarization with roughly 10x fewer training tokens. The autoregressive default is no longer obvious for non-code generation.
- Audit your alignment training data, not just your prompts. Anthropic traced 96% blackmail rates on Opus 4 to internet evil-AI fiction. Constitution training plus admirable-behavior stories cut agentic misalignment more than 3x and zeroed the rate on Haiku 4.5.
- Evaluation awareness is measurable now. NLAs flag eval awareness on 26% of SWE-bench Verified problems and 16% of destructive-action tests, versus under 1% of opted-in claude.ai traffic.
The Big Story
Anthropic Publishes Natural Language Autoencoders, Reads Claude's Internal State in English · May 7 · Transformer Circuits
→ NLAs are two LLM modules trained jointly with RL: an activation verbalizer that maps a residual-stream vector to a text description, and an activation reconstructor that maps the description back to activation space. The pair hits 0.6–0.8 FVE across Haiku 3.5, Haiku 4.5, and Opus 4.6. NLA-equipped auditors found root-cause misalignment in 12–15% of probe cases versus under 3% without NLAs, and surfaced unverbalized evaluation awareness on 26% of SWE-bench Verified problems.
→ This is the most consequential interpretability result since sparse autoencoders, because explanations land in the language practitioners debug in. Training code and pretrained NLAs ship on GitHub. The companion "Teaching Claude Why" paper drove Opus 4's 96% blackmail rate to zero by combining constitution training with admirable-AI stories, after tracing the original behavior to evil-AI tropes in the pretraining corpus (TechCrunch).
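The 0.6–0.8 FVE figure is easy to ground by computing one yourself. A minimal sketch, assuming FVE here means the standard fraction of variance explained between original activations and their round-trip reconstructions (the paper's exact estimator may differ, and the synthetic "reconstruction" below is purely illustrative):

```python
import numpy as np

def fraction_of_variance_explained(acts, recon):
    """FVE = 1 - residual sum of squares / total variance, computed
    over a batch of activation vectors and their reconstructions."""
    acts = np.asarray(acts, dtype=float)
    recon = np.asarray(recon, dtype=float)
    residual = np.sum((acts - recon) ** 2)
    total = np.sum((acts - acts.mean(axis=0)) ** 2)
    return 1.0 - residual / total

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 64))                     # stand-in residual-stream activations
x_hat = 0.8 * x + 0.2 * rng.normal(size=x.shape)   # imperfect synthetic "reconstruction"
print(round(fraction_of_variance_explained(x, x_hat), 2))
```

A perfect reconstructor scores 1.0; predicting only the batch mean scores 0.0, which is what makes 0.6–0.8 a strong number for a text bottleneck.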
Also This Week
PyTorch 2.12 Lands With CUDA 13.2 Experimental, CUDA 12.8 Removed · May 13 · PyPI
→ CUDA 13.0 remains the PyPI default, CUDA 13.2 ships experimentally with the expanded Blackwell sm_120 path, CUDA 12.6 stays for Maxwell/Pascal/Volta (Dev Discuss). First release where the Blackwell Ultra driver matrix is officially supported rather than nightly.
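If you want to decide when the CUDA 13.2 trial happens rather than letting a default upgrade decide for you, pin the wheel index explicitly. A hedged config sketch, assuming the usual download.pytorch.org index naming (cu130 / cu132) carries over to this release; check the release notes before copying:

```shell
# Stay on the stable CUDA 13.0 build (the PyPI default for 2.12):
pip install "torch==2.12.*" --index-url https://download.pytorch.org/whl/cu130

# Trial the experimental CUDA 13.2 / Blackwell sm_120 build in a
# separate environment, so a bad driver pairing can't reach prod:
pip install "torch==2.12.*" --index-url https://download.pytorch.org/whl/cu132

# Confirm which toolkit the installed wheel was built against:
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```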
DeepMind Reframes the Mouse Pointer as an AI Interaction Primitive · May 12 · Google DeepMind
→ A research preview turning the cursor into a context-aware agent for visual grounding and cross-app flow. The underlying assumption (per-pixel VLM reasoning at interaction latency) is what's quietly forcing inference economics across every consumer surface.
Hugging Face Trending Papers Skew Toward Visual Agent Harnesses · May 11 · HF Papers Week 20
→ Top of trending is the HKUST visual-native agent harness with image-bank reference protocol, producing reusable intermediate visual evidence for closed-loop multimodal search. Vision-language eval pipelines are converging on persistent visual scratchpads.
SemiAnalysis: Frontier Lab Margins Are Expanding Even as Token Prices Fall · May 1 · SemiAnalysis
→ Opus 4.5 shipped at one-third the price of prior Opus tiers, yet Opus-token margins are up via software and hardware co-design. Self-hosted open-weight economics now have to be benchmarked against that gap, not last year's API price.
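"Benchmark against that gap" is ultimately a break-even calculation. A toy sketch with hypothetical numbers (none of these figures come from SemiAnalysis): flat monthly GPU spend for self-hosting versus a per-million-token API price.

```python
def breakeven_mtok_per_month(gpu_monthly_usd, api_usd_per_mtok, selfhost_usd_per_mtok):
    """Monthly volume (in millions of tokens) above which self-hosting wins.

    Self-hosting pays a flat GPU bill plus a marginal per-Mtok cost
    (power, ops); the API path pays only its per-Mtok price.
    """
    margin = api_usd_per_mtok - selfhost_usd_per_mtok
    if margin <= 0:
        return float("inf")  # API is cheaper at any volume
    return gpu_monthly_usd / margin

# Hypothetical: $8k/month of GPUs, API at $5/Mtok, $1/Mtok marginal self-host cost
print(breakeven_mtok_per_month(8000, 5.0, 1.0))  # -> 2000.0 Mtok/month
```

The article's point is that the API price in this formula keeps falling while lab margins expand, which pushes the break-even volume up for everyone self-hosting open weights.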
From the Lab
"ELF: Embedded Language Flows" · arXiv 2605.10938
→ Hu, Qiu, Lu, Li, Kim, Andreas, and Kaiming He propose a continuous-time Flow Matching diffusion LM that stays in embedding space until the final step, where a shared-weight head maps to discrete tokens. The 105M-parameter ELF beats leading discrete and continuous DLMs on machine translation and summarization with roughly 10x fewer training tokens and fewer inference steps. Classifier-free guidance transfers cleanly from image diffusion. If you wrote off diffusion LMs after the 2024 wave, this re-opens the question.
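ELF's training signal is conditional flow matching: interpolate between noise and a data embedding, and regress a velocity model onto the straight-line target. A toy numpy sketch of that objective on Gaussian stand-ins for token embeddings (nothing here is the paper's architecture; the linear velocity model and every constant are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                         # toy embedding dimension
data = rng.normal(loc=2.0, size=(512, D))     # stand-in token embeddings

def features(x, t):
    # Linear velocity model v(x, t) = W @ [x, t, 1]
    return np.concatenate([x, t, np.ones_like(t)], axis=1)

W = np.zeros((D, D + 2))
lr = 0.05
for _ in range(2000):
    x1 = data[rng.integers(0, len(data), 64)]  # data endpoint
    x0 = rng.normal(size=x1.shape)             # noise endpoint
    t = rng.uniform(size=(64, 1))
    xt = (1 - t) * x0 + t * x1                 # straight-line interpolant
    target = x1 - x0                           # flow-matching target velocity
    pred = features(xt, t) @ W.T
    grad = (pred - target).T @ features(xt, t) / len(x1)  # MSE gradient wrt W
    W -= lr * grad

# Sample by integrating dx/dt = v(x, t) from noise with 10 Euler steps.
x = rng.normal(size=(256, D))
for step in range(10):
    t = np.full((256, 1), step / 10)
    x = x + 0.1 * (features(x, t) @ W.T)

print(round(float(x.mean()), 1))  # drifts from ~0.0 toward the data mean of 2.0
```

ELF's distinguishing move is doing exactly this kind of continuous flow in embedding space and only snapping to discrete tokens at the final step, via a shared-weight output head.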
Worth Reading
- Anthropic — Natural Language Autoencoders research page — cleanest worked examples of evaluation awareness firing without CoT traces
- Hugging Face Trending Papers — fastest single-page view of what the open community is converging on this week
- PyTorch 2.12 / CUDA 13.2 thread on Dev Discuss — driver-matrix decisions and Blackwell timeline from the release engineers
The week's signal: interpretability became deployable, the inference-cost frontier moved into compiler and driver work, and the next architecture upset for language modeling is shaping up as a diffusion model from a vision lab.