axios.com web signal

Palo Alto Networks AI Audit Finds 85 Hidden Bugs

anthropic openai cybersecurity ai-ethics

Key insights

  • Palo Alto Networks found 85 previously unknown production vulnerabilities using Anthropic Mythos and OpenAI GPT-5.5 within weeks.
  • CTO Lee Klarich set a concrete three-to-five-month window before AI-driven cyberattacks become routine.
  • This marks the first public disclosure of a major security vendor conducting a frontier-AI audit against its own live production codebase.

Why this matters

The discovery of 85 bugs in a security vendor's hardened production code is a concrete data point suggesting that frontier AI can outpace traditional static analysis and red-team cycles, which resets assumptions about how often enterprises need to run AI-assisted audits. Klarich's three-to-five-month warning is actionable in a way that generic AI threat forecasts are not: it gives security and engineering leaders a deadline against which to measure current tooling and staffing. The use of two competing frontier models in parallel suggests that multi-model auditing could become standard practice, shifting procurement and vendor relationships for companies running critical infrastructure code.

Summary

Palo Alto Networks ran Anthropic's Mythos and OpenAI's GPT-5.5 against its own production codebase and surfaced 85 previously unknown vulnerabilities in a matter of weeks, making it the first major security vendor to publicly disclose a frontier-AI self-audit at production scale. The company's chief technology officer, Lee Klarich, put a precise window on the defensive opportunity: three to five months before AI-driven attacks become routine. That framing turns this from a research curiosity into a near-term operational deadline for every enterprise security team.

Essentially, Palo Alto Networks, Anthropic, and OpenAI have demonstrated that frontier models can find what years of internal security review missed.

  • 85 unknown vulnerabilities surfaced within weeks, not months, using two separate frontier models running against live production code.
  • Klarich's three-to-five-month window is a specific, time-bounded claim, distinct from vague warnings about AI-enabled threats.
  • This disclosure is separate from earlier Palo Alto research on pentest-speed compression, meaning production-scale bug-finding is now a distinct, documented capability.

If attackers gain access to the same frontier models that just found 85 bugs in a hardened security vendor's own code, the asymmetry between offense and defense narrows faster than most enterprise roadmaps currently assume.

Potential risks and opportunities

Risks

  • Enterprises that delay AI-assisted code audits past Klarich's three-to-five-month window face adversaries who will use the same frontier models offensively before defenders have patched equivalent vulnerability classes.
  • Palo Alto Networks customers could question the security posture of products already shipped if any of the 85 bugs were in customer-facing production features, creating potential liability and churn ahead of renewal cycles.
  • Smaller security vendors that lack the budget for frontier model API access (Mythos, GPT-5.5) face a widening capability gap in their own product security, which could make them softer targets and less credible in enterprise procurement conversations within the next six months.

Opportunities

  • AI-assisted code audit vendors (Semgrep, Socket, Snyk) can use this disclosure to accelerate enterprise deals by benchmarking their tooling against the Palo Alto 85-bug result as a public reference case.
  • Anthropic and OpenAI gain a referenceable enterprise security customer in Palo Alto Networks, strengthening the case for frontier model API contracts with other Fortune 500 security and infrastructure companies.
  • Managed security service providers (MSSPs) offering AI-augmented red-team services can move immediately to productize multi-model audits similar to the Palo Alto approach, targeting the window Klarich identified before demand peaks.

What we don't know yet

  • Severity breakdown of the 85 bugs is undisclosed: how many were critical or remotely exploitable versus low-severity logic errors?
  • Whether Palo Alto Networks has shared the audit methodology or vulnerability classes with CISA or industry partners ahead of the three-to-five-month window Klarich named.
  • Which version or capability tier of Anthropic's Mythos was used, given that public availability and API access terms for Mythos remain unclear as of May 2026.