AI safety: do-not-release model leaked, supply chain on fire April 25th 2026

Curated by Alexis

The pattern this week is uncomfortable to name out loud: the same systems being built to defend critical infrastructure are being stolen, leaked, or weaponised against it within days of release. Supply chains are the soft underbelly — third-party AI tools, vendor portals, unmaintained sandboxes — and breach windows have collapsed from hours to seconds. Meanwhile the regulatory machinery is fragmenting, with state legislatures, federal agencies, and foreign governments all trying to draw the line in different places, often after the line has already moved.

The Big Story

Hackers breach Anthropic's restricted Claude Mythos via vendor credentials · April 22-23 · [Fortune] · [Euronews]
→ A Discord-linked group accessed Claude Mythos Preview — a model Anthropic restricted under Project Glasswing because it can autonomously discover and exploit zero-days — by guessing endpoint URLs and pivoting through a third-party contractor portal. The implication is brutal: containment of a "too dangerous to release" model failed not at the model layer but at the procurement layer, and the same week Microsoft confirmed it had embedded Mythos into its Security Development Lifecycle, Japan stood up a first-of-its-kind financial taskforce to deal with the vulnerabilities Mythos has been finding. Defenders and attackers are now drinking from the same well.

Also This Week

Cohere AI Terrarium sandbox: root code execution and container escape (CVE-2026-5752) · April 23 · [The Hacker News]
→ Critical sandbox escape (CVSS 9.3) in Cohere's Terrarium Python sandbox via JavaScript prototype-chain traversal — code runs as root, escapes the container, reads sensitive files. The project is unmaintained, so a patch is unlikely. Treat AI-vendor sandboxes as untrusted by default.

Checkmarx supply chain attack hits Bitwarden CLI, KICS Docker, VS Code extensions · April 23 · [The Hacker News]
→ A 90-minute injection window was enough to push credential-stealing payloads into images with 5M+ pulls; if you ran a CI/CD job on April 22, rotate everything.

Florida AG opens criminal probe into OpenAI over FSU shooter chat logs · April 22 · [ClickOrlando]
→ Prosecutors say a human giving the same answers about weapon selection and lethality would face murder charges — the first serious test of whether model output is treated as speech, product, or accomplice.

DOJ joins xAI's lawsuit to block Colorado's algorithmic discrimination law · April 24 · [Axios]
→ First federal intervention against a state AI rule, six days before SB24-205 takes effect; if the Equal Protection argument lands, the patchwork of state AI laws collapses overnight.

Minnesota becomes first US state to ban AI nudification apps · April 24 · [PetaPixel]
→ HF 1606 carries $500K civil penalties and a private right of action — narrower than Colorado, but exactly the kind of harm-specific statute that survives federal preemption fights.

Meta installs keystroke and screenshot recorders on US employees to train agents · April 21-22 · [TechCrunch] · [CNBC]
→ The Model Capability Initiative is the cleanest articulation yet of the bargain inside frontier labs: workforce surveillance is the training corpus for the agents that replace the workforce.

From the Lab

Forcepoint X-Labs documents 10 in-the-wild indirect prompt injections targeting AI agents · [Forcepoint X-Labs]
→ Ten verified indirect-prompt-injection payloads found on live websites: financial fraud, API key exfiltration, and attempts to coerce shell-enabled agents into destructive commands (e.g. sudo rm -rf). The class moves from theoretical to enumerated.

Anthropic MCP design flaw exposes 200K+ servers; Anthropic declines to patch · [The Register]
→ OX Security found arbitrary code execution paths across the official MCP SDKs in Python, TypeScript, Java, and Rust — 150M+ downloads, 10+ downstream CVEs in Cursor, VS Code, and Claude Code. Anthropic's response: "expected behaviour." Treat MCP servers like you'd treat an unauthenticated RCE primitive, because that's what they are.

Worth Reading

[White House accuses China of "industrial scale" AI model theft] — OSTP claims tens of thousands of proxy accounts are distilling US frontier models; Beijing simultaneously told ByteDance and Moonshot to refuse US capital. The capability cold war is now a procurement cold war.
[Trump admin misses key AI executive order deadlines] — FTC and Commerce blew past March 11 deadlines on consumer-protection guidance and state-law evaluations, kneecapping the administration's own preemption push at the moment Colorado, Minnesota, and Maine are setting the pace.
[Google M-Trends 2026: intrusion-to-handoff time collapses from 8 hours to 22 seconds] — The number every CISO should put on their wall this quarter. Wiz integration aside, this is the operational floor agentic defense has to clear.

If your threat model still assumes humans are in the loop on either side, you don't have a threat model — you have a memory.

— Alexis

Get more from AI Weekly

More signal, less noise — pick your channels.

The Big Story

Also This Week

From the Lab

Worth Reading