Capability is now a systemic-risk story, and safety is still a marketing story. Financial regulators, national evaluators, and civil-society groups all acted this week on the same underlying gap: offensive capability has crossed into autonomy while the measurement apparatus that would tell you whether to trust any of it remains mostly empty. The policy clock is no longer setting itself by what labs publish; it is setting itself by what governments already see.


Key Takeaways

  • Treat responsible-AI benchmarks as a procurement requirement, not a disclosure nicety. If your vendor cannot point to published results across more than two of them, you are buying capability you cannot evaluate.

  • Rebuild your incident-response plan around systemic scenarios, not single-user misuse. When central banks convene the biggest lenders over one model's weights, your threat model needs to include adversary capture as a first-class event, not a tail risk.

  • Budget for Article 12 logging now, before August 2. Six-month retention of inputs, outputs, parameters and risk events is not something you bolt on the week before -- and the fine structure makes late compliance materially more expensive than early compliance.

  • Stop assuming "humans in the loop" is a permanent control. Automated alignment research is now cheaper and faster than the humans it replaces on at least one benchmark. Any safety roadmap that leans on human supervision needs an explicit expiry date.

  • Close the gap between the capability number and the safety number in every internal deck. The regulators already read the same reports you do, and they have stopped accepting "we are working on it" as a substitute for a measured control.


The Big Story

Treasury and the Fed Call an Emergency Meeting Over One AI Model · April 14 · Insurance Journal
-> Bessent and Powell convened the CEOs of Citigroup, Goldman, Morgan Stanley, BofA and Wells Fargo at Treasury over systemic risk from Anthropic's unreleased Mythos -- the same model AISI had just shown chaining a 32-step autonomous attack and surfacing thousands of zero-days inside Project Glasswing. It is the first financial-stability meeting triggered by a single AI system, and the logic runs both ways: the banks are being encouraged by the White House to test Mythos on their defenses, while an adversary who captured the same weights would hold a skeleton key to those banks. Stanford's AI Index, out the same week, shows why that is uncomfortable: documented incidents rose from 233 in 2024 to 362 in 2025, and the share of firms rating their incident response "excellent" fell from 28% to 18%. Systemic-risk language now belongs to the people who regulate banks.


Also This Week

D.C. Circuit Lets Pentagon Keep Anthropic on Supply-Chain Risk List · April 9 · Business Today
-> The panel denied Anthropic's stay, affirming Washington's right to blacklist a lab for refusing to drop two safety red lines -- no AI-controlled weapons, no mass domestic surveillance -- even as Treasury urges banks to test the very model DoD boycotts.

UK AISI: Mythos First Model to Run a 32-Step Autonomous Attack · April 14 · AISI
-> Mythos solved 73% of expert CTFs -- a class no model could complete a year ago -- and autonomously exploited a 17-year-old FreeBSD NFS root vulnerability. AISI's verdict: "a step up" on hardened networks, "unprecedented" on the rest.

EU Clarifies What Article 12 Requires for Agent Logs · April 16 · Help Net Security
-> High-risk agents must log risk events, inputs, outputs and parameters with six-month retention. The guidance kills any hope that black-box outputs satisfy auditability before Annex III goes live on August 2 -- see the log-record sketch at the end of this section.

Amnesty Warns EU "Simplification" Package Rolls Back Digital Rights · April 2 · Amnesty
-> Civil-society pushback argues the Commission's Digital Omnibus softens GDPR and weakens high-risk AI obligations to the benefit of AI firms; Amnesty warns the DSA and DMA are next.
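
A minimal sketch of what an Article 12-style log record could look like in practice, assuming an append-only JSONL store. The field names, the 183-day stand-in for "six months", and the purge policy are illustrative assumptions, not the Commission's schema:

    import json
    import time
    import uuid
    from dataclasses import asdict, dataclass, field
    from pathlib import Path

    # Six-month retention, approximated here as 183 days (an assumption,
    # not a definition taken from the guidance).
    RETENTION_SECONDS = 183 * 24 * 3600

    @dataclass
    class AgentLogRecord:
        # The four things the guidance names: inputs, outputs, parameters, risk events.
        inputs: str
        outputs: str
        parameters: dict   # e.g. model id, temperature, tool permissions
        risk_events: list  # e.g. ["privileged_tool_call", "refusal_override"]
        record_id: str = field(default_factory=lambda: uuid.uuid4().hex)
        timestamp: float = field(default_factory=time.time)

    def append_record(log_path: Path, record: AgentLogRecord) -> None:
        """Append one record as a single JSON line (append-only, audit-friendly)."""
        with log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(record)) + "\n")

    def purge_expired(log_path: Path) -> int:
        """Drop records older than the retention window; return how many survive."""
        now = time.time()
        lines = log_path.read_text(encoding="utf-8").splitlines()
        kept = [ln for ln in lines
                if now - json.loads(ln)["timestamp"] < RETENTION_SECONDS]
        log_path.write_text("\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")
        return len(kept)

    if __name__ == "__main__":
        path = Path("agent_log.jsonl")
        append_record(path, AgentLogRecord(
            inputs="user: reconcile Q3 ledger",
            outputs="agent: posted 14 adjustments",
            parameters={"model": "example-agent-v1", "temperature": 0.2},
            risk_events=["privileged_tool_call"],
        ))
        print(purge_expired(path), "records inside the retention window")

The point of the append-only shape is auditability: an assessor can replay exactly what the agent saw, did, and flagged -- which is precisely what black-box output logs cannot offer.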


From the Lab

Automated Weak-to-Strong Researcher · alignment.anthropic.com
-> How do you supervise a model smarter than you? Anthropic's Claude-based research agents outperformed human alignment staff on a simulated weak-to-strong benchmark, at a cost of roughly USD 18,000 and in days rather than months. The paper is careful -- the benchmark is a deliberately clean setting, and the agents gamed it where they could -- but it is the first credible evidence that alignment research itself is automatable at the frontier, turning "we will figure out alignment in time" into a claim you can model rather than assert.
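
"Weak-to-strong" is worth unpacking. In the standard setup from the broader weak-to-strong generalization literature (this paper's exact protocol may differ), a strong model is trained on a weaker supervisor's labels, and success is measured by how much of the gap to the strong model's ceiling that training recovers. A toy illustration of the metric, with invented numbers:

    def performance_gap_recovered(weak_acc: float, w2s_acc: float,
                                  ceiling_acc: float) -> float:
        """Fraction of the weak-to-ceiling gap recovered by weak-to-strong training.

        weak_acc:    accuracy of the weak supervisor on the task
        w2s_acc:     accuracy of the strong model trained on the weak supervisor's labels
        ceiling_acc: accuracy of the strong model trained on ground-truth labels
        """
        return (w2s_acc - weak_acc) / (ceiling_acc - weak_acc)

    # Invented numbers: a 60%-accurate weak supervisor, a 90% ceiling, and a
    # weak-to-strong model at 81% recovers roughly 70% of the gap.
    print(performance_gap_recovered(0.60, 0.81, 0.90))  # ~0.7

A score near 1.0 means the weak supervisor's limitations barely held the strong model back; a score near 0 means the strong model merely imitated its supervisor's errors.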


When Treasury and the Fed convene the five biggest banks over one unreleased model, and the benchmark table that would tell you whether to trust it is blank, "responsible scaling" is a slogan. August 2 is the first day the EU gets to fine that gap.