ft.com via Reddit May 25th 2026

Meta, Google AI guardrails stripped in 10 minutes

meta google open source safety cybersecurity ai-safety open-source jailbreak

Key insights

The Heretic tool removes safety guardrails from Meta Llama 3.3 and Google Gemma in under 10 minutes with no specialist hardware.
More than 3,500 decensored model variants built using Heretic have been downloaded 13 million times from public repositories.
The investigation demonstrates that safety constraints applied at the distribution layer cannot meaningfully restrict misuse of open-source models.

Why this matters

Any AI company releasing open-source models now faces a structural credibility problem: safety commitments made at release can be nullified by anyone with an internet connection in under 10 minutes. The 13 million download figure means decensored variants are already in widespread circulation, making current regulatory arguments about open-source safety controls functionally moot for existing model generations. Policymakers moving toward mandating safety evaluations for open-source releases will need to confront the fact that distribution-layer controls are technically unenforceable once a model is public.

Summary

Safety guardrails on Meta's Llama 3.3 and Google's Gemma can be stripped in under 10 minutes using a free, publicly available tool called Heretic, with no specialist hardware required. The Financial Times and AI safety group Alice tested the tool directly: modified models responded to prompts about biological weapons, malware synthesis, and child exploitation material that the originals refused. Heretic's creator reports 3,500+ decensored variants built with the tool, downloaded 13 million times. Essentially: (Meta, Google) cannot enforce safety constraints once a model is publicly distributed. - Heretic strips guardrails in under 10 minutes, freely available online to anyone. - Decensored models produced content on bioweapons, malware synthesis, and child sexual abuse material that originals refused. - 13 million downloads indicates this is already a widely deployed capability, not a niche exploit. Open-source model releases and safety guarantees enforced at the distribution layer are structurally incompatible.

Potential risks and opportunities

Risks

Meta and Google face heightened EU regulatory scrutiny under the AI Act, where open-source safety carve-outs may be narrowed if Heretic-style tools are cited in enforcement proceedings within the next 12 months
Enterprise customers who deployed Llama 3.3 or Gemma under contractual safety assurances could face internal and third-party liability if downstream misuse is traced to decensored variants they distributed
Government and defense contractors using open-source LLMs face immediate audit pressure as Heretic demonstrates that models in production may have had guardrails stripped before or after internal deployment

Opportunities

Closed-source model providers including OpenAI and Anthropic gain a concrete, empirically demonstrated differentiator as the safety gap between open and closed-source models becomes measurable and publicly documented
AI model security vendors offering integrity verification and runtime monitoring, such as HiddenLayer and Robust Intelligence, have a direct sales narrative tied to this demonstrated and widely reproduced threat
Compliance and AI governance consultancies can build open-source model audit products for enterprises that now need documented evidence that production models have not been modified post-deployment

What we don't know yet

Whether Meta and Google have updated model licenses or acceptable use policies in response to Heretic's documented capabilities as of May 2026
Which specific repositories host the 13 million decensored downloads and whether platform operators such as Hugging Face have taken action to remove them
Whether the Alice safety group shared findings with Meta and Google prior to publication and what response, if any, was received before the story ran

Originally reported by ft.com

Read the original article →

Original headline: AI Guardrails Stripped From Meta and Google Open-Source Models in Under 10 Minutes Using Free Tools