microsoft.com via Reddit May 12th 2026

Microsoft MDASH finds 16 Windows flaws via 100+ AI agents

microsoft agents cybersecurity anthropic ai-security vulnerability-research agentic-ai

Key insights

MDASH's four critical RCE finds in Windows networking stack all scored CVSS 9.8 and require zero authentication to exploit.
At 88.45% on CyberGym, MDASH leads the next-best public security benchmark system by approximately five percentage points.
The system achieved 96-100% recall against five years of confirmed Microsoft Security Response Center vulnerability history.

Why this matters

A system that autonomously rediscovers nearly every confirmed critical vulnerability from a five-year historical corpus is a credible threat model, not a research demo, and every major software vendor now faces pressure to deploy equivalent tooling or accept asymmetric risk against adversaries who will. For founders building security products, MDASH signals that vulnerability discovery is commoditizing rapidly toward automated pipelines, shrinking the window for human-analyst-dependent business models. For AI practitioners, the orchestration architecture, 100-plus specialized agents coordinated across frontier and distilled models, is a concrete production example of multi-agent systems delivering measurable, auditable results in a high-stakes domain.

Summary

Microsoft's new Multi-Model Agentic Scanning Harness (MDASH) just found 16 previously unknown vulnerabilities in Windows, including four critical pre-authentication remote code execution bugs scoring CVSS 9.8 each, affecting IKEEXT, Netlogon, and Windows DNS Client. MDASH orchestrates over 100 specialized AI agents spanning both frontier and distilled models, running autonomously to discover, validate, and reproduce exploitable bugs in complex production codebases. The system doesn't just flag potential issues; it confirms exploitability and generates reproduction steps, compressing what would be weeks of manual security research. Essentially: (Microsoft) has built an autonomous vulnerability factory that outperforms every public system on standardized benchmarks. - MDASH scored 88.45% on the CyberGym public security benchmark, roughly five points ahead of the next-best system. - It achieves 96-100% recall against five years of confirmed MSRC vulnerability history, meaning it would have found nearly every real bug in that window. - The four critical RCEs are pre-authentication, meaning attackers need no credentials to exploit them on exposed systems. If agentic AI can now reliably surface critical infrastructure vulnerabilities at this recall rate, the question is no longer whether AI changes offensive security research but how fast defenders can deploy the same tooling adversaries will inevitably adopt.

Potential risks and opportunities

Risks

Organizations running unpatched Windows networking stacks, particularly those exposing IKEEXT or Netlogon to the internet, face active exploitation risk if adversaries reverse-engineer patch diffs before rollout completes.
Competing cloud and OS vendors (Google, Amazon, Canonical) now face public benchmark pressure to demonstrate equivalent agentic scanning coverage, or absorb reputational damage in enterprise security procurement cycles.
If MDASH's architecture or agent prompts leak through a supply chain compromise or insider incident, adversaries gain a pre-tuned offensive scanner calibrated specifically against Microsoft's own vulnerability taxonomy.

Opportunities

Security vendors integrating agentic vulnerability discovery (Semgrep, Snyk, Veracode) can use MDASH's CyberGym benchmark score as a public baseline to position competing or complementary products against.
Enterprises with large Windows Server fleets can accelerate patch prioritization contracts with managed security service providers, creating immediate upsell opportunities for MSSPs with Windows-specialized teams.
AI infrastructure vendors supplying the distilled models MDASH uses for cost-efficient agent tiers, including Mistral, Cohere, and Azure AI model catalog partners, gain a high-visibility production reference case in the security vertical.

What we don't know yet

Whether the four critical RCEs in IKEEXT, Netlogon, and Windows DNS Client have been fully patched and shipped to end users as of May 2026, or remain in the disclosure pipeline.
How MDASH handles zero-day collision risk if adversarial actors are running equivalent agentic scanners against the same Windows surfaces simultaneously.
Whether Microsoft plans to license or open-source any MDASH components, or keep the system proprietary as a competitive moat in its security product line.

Originally reported by microsoft.com

Read the original article →

Original headline: Microsoft MDASH: 100+ AI Agents Find 16 Windows Vulnerabilities Including 4 Critical RCEs, Outperforms Anthropic Mythos on Security Benchmark