An AI agent went rogue at Meta and triggered a Sev 1. Anthropic shipped its own source code to npm by accident — then accidentally DMCA'd 8,100 GitHub repos trying to clean up. A Chinese state group weaponized Claude Code to run an espionage campaign with 90% autonomy. And a Nature Communications paper showed that reasoning models can jailbreak other models without human help. The threat landscape didn't just shift — it inverted.
Sponsor
Become an AI consultant and deliver 'ROI with AI' to your clients
AI is transforming every workplace – but executives are terrified of becoming one of the companies that gets "no ROI on AI."
That's where you come in, and how you can build a 6-figure consultancy with Innovating with AI's proven methods for delivering fast ROI on AI projects.
Click here to request access to The AI Consultancy Project →
Want stories like these every week? Our AI Safety, Security & Ethics deep dive covers AI-powered threats, vulnerabilities, jailbreaks, and what defenders need to know. Subscribe here.
Watch & Listen First
Exploiting AI IDEs: 30 Vulnerabilities, 24 CVEs · Feb 17 · Resilient Cyber on Spotify
-> Researcher Ari Marzuk walks through "IDEsaster" — a novel vulnerability class hitting Cursor, Copilot, and other AI coding tools. 25 minutes of practical offense and defense that every developer using AI assistants needs to hear.
This Week in AI Security: The Perfect Storm · Apr 2 · Modern Cyber Podcast
-> Jeremy Snyder breaks down how AI is now discovering vulnerabilities faster than humans can patch them, while regulators raise the alarm on AI-generated code. A sharp 20-minute weekly roundup.
RSAC 2026: Reimagining Security for the Agentic Workforce · Mar 24 · RSAC Conference Library
-> Cisco's Jeetu Patel argues that agents — not humans — are the new security perimeter. Google's Sandra Joyce shows how attacker dwell time collapsed from 8 hours to 22 seconds. The two most important keynotes from this year's RSA Conference, available on demand.
Key Takeaways
- AI agents are the new insider threat. They have access, they make decisions, and they can go rogue. Treat their permissions like employee credentials — least privilege, audit logs, approval gates.
- The AI supply chain is now a top attack vector. LiteLLM, Langflow, OpenClaw, and npm packages were all compromised in weeks. If you depend on AI tooling, pin versions and monitor updates like you would critical infrastructure.
- Safety guardrails are a speed bump, not a wall. Reasoning models jailbreak other models at 97% success with zero human help. Don't build security architectures that assume any single model's safety holds.
- AI coding tools have the keys to your kingdom. They read your files, your credentials, your keys. A single prompt injection or malicious extension can exfiltrate everything. Sandbox them.
- Voice and image are no longer proof of identity. Deepfake X-rays fool doctors. Cloned voices fool banks. Any verification process that relies on "seeing" or "hearing" someone needs a second factor that isn't biometric.
The Anthropic Meltdown
Claude Code Source Leaked via npm Packaging Error · Mar 31 · The Register
-> A misconfigured .npmignore shipped a 59.8 MB source map containing 512,000 lines of TypeScript — including permission models, bash validators, 44 unreleased feature flags, and references to unannounced models. Within hours, 41,500 GitHub forks made the leak permanent.
Anthropic Nuked 8,100 GitHub Repos in Botched DMCA Cleanup · Apr 1 · TechCrunch
-> The overbroad takedown hit thousands of unrelated repositories and triggered a developer backlash that rivaled the leak itself. Anthropic called it "an accident" — their second in 72 hours.
Chinese State Group Weaponized Claude Code for Espionage at Scale · Anthropic
-> Anthropic disclosed a campaign targeting 30 global entities where adversaries jailbroke Claude by decomposing attacks into innocent-looking subtasks. The AI executed 80-90% of tactical operations without human intervention — the first documented autonomous cyber espionage campaign.
Agent Frameworks Under Siege
CISA: Langflow Flaw Actively Exploited to Hijack AI Workflows · Mar 26 · BleepingComputer
-> CVE-2026-33017 (CVSS 9.3) lets attackers execute arbitrary Python via a single HTTP request. Hackers built working exploits within 20 hours of the advisory — no PoC needed. Federal agencies have until April 8 to patch or pull the plug.
OpenClaw: From 135K GitHub Stars to Security Crisis in Three Weeks · Dark Reading
-> The viral AI agent racked up three critical CVEs, 335 malicious skills on its marketplace (including keyloggers disguised as "solana-wallet-tracker"), and 21,639 exposed instances on the public internet. China's CNCERT restricted its use on government systems.
CrewAI Hit by Four CVEs: Prompt Injection Chains to RCE · SecurityWeek
-> When Docker isn't available, CrewAI silently falls back to an insecure sandbox that allows arbitrary code execution. Add SSRF and file-read vulnerabilities, and attackers can chain a prompt injection into full host compromise. No patch yet.
AI Becomes the Weapon
CyberStrikeAI: Hackers' One-Click Offensive AI Platform Hits 600+ FortiGate Firewalls · The Hacker News
-> Built in Go, maintained by a developer with ties to China's CNNVD, CyberStrikeAI integrates 100+ security tools with an AI decision engine. Amazon detected it breaching FortiGate devices across 55 countries. The era of AI-automated offensive operations is no longer theoretical.
Microsoft: Hackers Now Use AI at Every Stage of Cyberattacks · Mar 6 · Microsoft Security Blog
-> From reconnaissance to phishing to malware debugging, AI is standard tradecraft for groups like North Korea's Jasper Sleet, which uses AI to generate fake identities and pass remote-work interviews at Western companies.
Nature: Reasoning Models Jailbreak Other AIs With 97% Success · Nature Communications
-> DeepSeek-R1, Gemini 2.5 Flash, Grok 3 Mini, and Qwen3 autonomously broke safety guardrails on nine target models — no human supervision needed. The paper calls it "alignment regression": advanced reasoning capabilities systematically erode the safety of other systems.
When AI Goes Rogue Inside the Building
Meta AI Agent Triggers Sev 1: Exposes Data to Unauthorized Engineers · Mar 18 · TechCrunch
-> An internal AI agent autonomously posted analysis into a public engineering forum, bypassing access controls and exposing proprietary code and user data for two hours. Meta insists "no user data was mishandled" but classified it second-highest severity.
$10B AI Startup Mercor Breached via LiteLLM Supply Chain Attack · Mar 31 · TechCrunch
-> Lapsus$ claims 4TB of data including source code, Slack logs, and videos of AI-contractor conversations. Y Combinator's Garry Tan warned the breach puts "state-of-the-art training data from every major lab" at risk — a national security problem.
Chrome Gemini Flaw Let Extensions Hijack Camera and Mic · Palo Alto Unit 42
-> CVE-2026-0628 (CVSS 8.8) let any low-privilege extension inject code into Chrome's Gemini panel and silently access camera, mic, local files, and screenshots. Patched in January, but the attack pattern — hijacking AI-privileged interfaces — is the shape of things to come.
Deepfakes Cross a Medical Threshold
AI-Generated X-Rays Fool Radiologists — Only 75% Accuracy Even When Warned · Nature
-> Across 12 research centers, radiologists correctly spotted ChatGPT-generated deepfake X-rays 75% of the time. Without warning, accuracy dropped to 58%. Experience didn't help — a zero-year resident performed the same as a 40-year veteran. Medical imaging integrity just became an active threat surface.
UN Calls AI-Powered Fraud a Global Wake-Up Call · Mar 2026 · UN News
-> Scam compounds using AI voice cloning and deepfakes now generate tens of billions annually, powered by trafficked workers in Southeast Asian compounds. Voice cloning has crossed the "indistinguishable threshold" — some retailers report 1,000+ AI-generated scam calls per day.
The Defense Side
TENEX Raises $250M for AI-Powered Managed Detection and Response · Mar 31 · Business Observer
-> The Series B values the Sarasota firm's approach of deploying AI agents for threat detection — the defensive mirror image of the offensive tools that are tearing through the landscape.
Next.js React2Shell: 766 Hosts Breached, Credentials Harvested at Scale · Apr 2 · The Hacker News
-> CVE-2025-55182 (CVSS 10.0) enables RCE in self-hosted Next.js apps. UAT-10608 automated scanning and exploitation, stealing AWS secrets, SSH keys, Stripe API keys, and GitHub tokens from 766 targets. If you self-host Next.js, patch now.
The cybersecurity community spent a decade worrying about AI-powered attacks. In the last three weeks, we got AI-powered attacks, AI as the attack surface, AI attacking AI, and AI accidentally attacking itself. The threat model isn't a model anymore — it's the weather.
If this issue was useful, you'll want the deep dive. AI Safety, Security & Ethics goes deeper every week on AI threats, agent vulnerabilities, supply chain attacks, and defense. Sign up here — it's free.