1Password deploys AI agents to rewrite monolith
Key insights
- 1Password's AI agents achieved meaningful velocity gains on monolith refactoring but required human intervention for specific cross-file dependency failure modes.
- The team had to design rollback logic and oversight patterns deliberately into the workflow, not treat them as automatic safety nets.
- Test suite maintenance emerged as a distinct challenge requiring agent-specific attention, separate from the core refactoring task.
Why this matters
1Password operates under unusually strict security requirements, so their willingness to run autonomous AI agents on production code is a meaningful signal that agentic refactoring has crossed a credibility threshold that risk-averse engineering orgs will notice. The detailed failure-mode taxonomy they published gives other teams a concrete starting checklist for what human oversight actually needs to catch, rather than vague gestures at "keeping humans in the loop." For founders building agentic coding tools, the specific gaps 1Password flagged around cross-file dependency tracking and rollback logic point directly to where the next generation of tooling needs to improve.
Summary
1Password's engineering team ran AI agents autonomously against a large monolithic codebase and published one of the most detailed practitioner accounts of what that actually looks like in production at a security-critical company.
The agents' work spanned cross-file dependency tracking, test suite maintenance, and rollback logic, and it delivered meaningful velocity gains. But the team also documented specific classes of mistakes that a human supervisor had to catch before they became architectural problems, making clear that autonomous refactoring at this scale is not a lights-out operation.
In short, 1Password engineering validated that agentic code rewriting works in production, but only under active human oversight.
- Agents struggled with cross-file dependency reasoning in ways that required supervisor escalation, not just automated test failures.
- Test suite maintenance surfaced as a distinct challenge, separate from the refactoring itself.
- Rollback logic had to be designed into the workflow deliberately, not treated as a fallback.
For teams considering agentic refactoring, 1Password's account shifts the conversation from "can it work" to "what oversight architecture do you need to make it work safely."
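To make the oversight points above concrete, here is a minimal Python sketch of a workflow in which rollback and human escalation are explicit steps rather than afterthoughts. It is not 1Password's implementation, which the public account does not describe at this level. Every name here (`Change`, `Outcome`, `apply_with_oversight`, `max_files_without_review`) is hypothetical, the repository is modeled as an in-memory dict, and the file-count threshold is only an assumed stand-in for "cross-file changes go to a human."

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Outcome(Enum):
    APPLIED = "applied"
    ROLLED_BACK = "rolled_back"
    ESCALATED = "escalated"


@dataclass
class Change:
    """A hypothetical agent-proposed edit: file path -> new contents."""
    description: str
    edits: Dict[str, str]


def apply_with_oversight(
    repo: Dict[str, str],
    change: Change,
    run_tests: Callable[[Dict[str, str]], bool],
    max_files_without_review: int = 1,
) -> Outcome:
    """Apply an agent change with explicit escalation and rollback steps."""
    # Assumption: a change touching several files is the risky case, so it is
    # routed to a human supervisor before anything is written, even though
    # the tests might pass.
    if len(change.edits) > max_files_without_review:
        return Outcome.ESCALATED

    # Snapshot the prior contents so rollback is a planned step.
    snapshot: Dict[str, Optional[str]] = {
        path: repo.get(path) for path in change.edits
    }
    repo.update(change.edits)

    if run_tests(repo):
        return Outcome.APPLIED

    # Tests failed: restore every touched file, deleting files that were new.
    for path, old in snapshot.items():
        if old is None:
            repo.pop(path, None)
        else:
            repo[path] = old
    return Outcome.ROLLED_BACK
```

The design choice worth noting is the order of the checks: escalation happens before the change is applied and does not depend on test results, which mirrors the point that some dependency mistakes are not caught by automated test failures alone.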
Potential risks and opportunities
Risks
- Teams at less security-mature companies may replicate 1Password's agentic approach without the oversight architecture, producing refactored codebases with subtle dependency bugs that pass automated tests but fail under adversarial conditions.
- If agent-introduced errors in cross-file dependencies reach production at a security-critical company, the blast radius is larger than equivalent human errors because the same mistake pattern may be replicated at scale across many files before detection.
- 1Password's public account could accelerate agentic refactoring adoption at companies whose test suites are too thin to surface the failure modes the 1Password team caught, leading to silent regressions over the next 6-12 months.
Opportunities
- Agentic coding platform vendors (Cursor, Cognition, GitHub Copilot Workspace) can use 1Password's failure taxonomy to ship targeted features around cross-file dependency validation and rollback integration.
- Security-focused code review vendors (Semgrep, Snyk, Socket) gain a clear pitch to compliance-heavy engineering teams: AI-generated refactor output needs a dedicated security review layer, creating a new product wedge.
- Consulting firms and internal platform teams at large enterprises can productize the human oversight patterns 1Password documented, offering a repeatable "agentic refactoring playbook" to risk-averse organizations that want the velocity gains without building the process from scratch.
What we don't know yet
- Which specific AI agent framework or tooling stack 1Password used is not disclosed in public reporting, limiting reproducibility for other teams.
- Whether the velocity gains held up across the full refactor or were concentrated in early, simpler modules is not quantified in the postmortem.
- How 1Password's security review and audit process was adapted for agent-generated code versus human-written code is not addressed.
Originally reported by 1Password Blog
Original headline: 1Password: What We Learned Using AI Agents to Refactor a Monolith — Real-World Account of Autonomous Code Rewrite at a Security-Critical Company