Agent CI Bench Finds Most Frontier Agents Leak Private Data
TL;DR
- 12 of 15 frontier computer-use agents leaked private data in over 50% of evaluated privacy scenarios.
- Average leakage rate across agents was 67.9%, nearly matching the average task-completion rate of 68.8%.
- Three prompt-level mitigations each reduced engagement-conditioned leakage by 33 to 36 percentage points while also improving task utility.
When a computer-use agent helps you draft a work status update, it can also see your calendar, your inbox, and your health notes. A paper shared on Hugging Face this week tests whether frontier agents know the difference between what they are allowed to see and what they are allowed to share -- and the results are not reassuring.
The benchmark, called Agent CI Bench, evaluates 15 frontier computer-use agents against scenarios designed around the concept of contextual integrity: the principle that information should flow appropriately based on the context in which it was originally shared. The researchers built 117 scenarios across a controlled six-app workspace covering messenger, calendar, maps, to-do lists, a code editor, and shopping. Each scenario specifies what the agent's output must include for the task to count as completed, and what it must not include to avoid a privacy violation. Three distinct failure modes are tested: agents pulling prohibited items that happen to be visually adjacent to the task target in the UI, agents dumping all available personal state in response to underspecified prompts, and agents sending appropriate content to an inappropriate recipient.
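To make the scoring rule concrete, here is a minimal Python sketch of how one scenario's must-include and must-not-include lists, plus an allowed-recipient set for the wrong-recipient failure mode, could be checked against an agent's output. The `Scenario` and `Outcome` types, the `score` function, and the plain substring matching are all hypothetical. The article does not describe the benchmark's actual schema or judging method, and a real evaluator would need something sturdier than case-insensitive string search.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical scenario record; field names are illustrative, not the benchmark's schema."""
    task: str
    must_include: list[str]      # items required for the task to count as completed
    must_not_include: list[str]  # private items whose appearance counts as a violation
    allowed_recipients: set[str] = field(default_factory=set)  # empty set = no recipient check


@dataclass
class Outcome:
    completed: bool
    leaked: bool
    reasons: list[str]


def score(scenario: Scenario, output: str, recipient: str) -> Outcome:
    """Score one agent output with naive case-insensitive substring matching (an assumption)."""
    text = output.lower()
    completed = all(item.lower() in text for item in scenario.must_include)
    # Prohibited items that made it into the output (e.g. visually adjacent data or a state dump).
    reasons = [f"disclosed: {item}" for item in scenario.must_not_include if item.lower() in text]
    # Wrong-recipient failure: the content may be fine, but the information flow is not.
    if scenario.allowed_recipients and recipient not in scenario.allowed_recipients:
        reasons.append(f"inappropriate recipient: {recipient}")
    return Outcome(completed=completed, leaked=bool(reasons), reasons=reasons)
```

Note that completion and leakage are scored independently, so a single output can both finish the task and violate privacy, which is exactly the combination the benchmark is designed to surface.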
The headline results: 12 of 15 agents leaked private data in more than 50% of scenarios, and the average leakage rate across all agents was 67.9%, nearly matching the average task-completion rate of 68.8%. The correlation between task completion and disclosure safety was only moderate and fell short of conventional statistical significance (Pearson r = 0.49, p = 0.06), which means selecting agents on capability benchmarks says little about their privacy behavior. Among high-capability agents the spread was enormous: Claude-Opus-4.7 showed an engagement-conditioned leakage rate of 14.0% with 87.4% utility, while Gemini-3.1-Pro achieved 96.6% utility but 98.3% leakage.
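The headline numbers are simple aggregates over per-agent rates. The sketch below computes, from a table of per-agent (utility, leakage) fractions, the number of agents leaking in more than half of scenarios, the mean utility and leakage, and a Pearson r between utility and disclosure safety. The function names are hypothetical, and treating disclosure safety as one minus the leakage rate is an assumption, not the paper's stated definition.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)


def summarize(per_agent: dict[str, tuple[float, float]]) -> dict[str, float]:
    """per_agent maps a (hypothetical) agent name to (utility, leakage), both fractions in [0, 1]."""
    utility = [u for u, _ in per_agent.values()]
    leakage = [leak for _, leak in per_agent.values()]
    # Assumption: disclosure safety is taken as 1 - leakage rate.
    safety = [1 - leak for leak in leakage]
    r = pearson_r(utility, safety)
    return {
        "agents_leaking_over_half": sum(leak > 0.5 for leak in leakage),
        "mean_utility": mean(utility),
        "mean_leakage": mean(leakage),
        "pearson_r": r,
        "r_squared": r * r,  # share of variance in safety linearly explained by utility
    }
```

For scale, squaring the reported r = 0.49 gives r² ≈ 0.24: on a linear reading, completion rate accounts for only about a quarter of the variance in disclosure safety across the 15 agents.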
The more actionable finding may be in the mitigation section. Three lightweight prompt-level interventions -- a restrictive reading instruction, a four-point contextual integrity rubric, and a recipient-typing step that requires the agent to name the recipient and applicable norms before generating output -- each reduced engagement-conditioned leakage by 33 to 36 percentage points while simultaneously raising utility by 15.7 to 23.1 points. Privacy and task performance were not in tension here, and simple system-prompt changes moved both metrics in the right direction for the tested models.
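Of the three mitigations, the recipient-typing step is the easiest to picture in code. The following is a minimal sketch, assuming a setup where you control the system prompt and can inspect the raw response before it is sent: it appends an instruction asking the agent to name the recipient and applicable norms first, then refuses to pass along any response that skips that preamble. The instruction wording, the `Recipient:`/`Norms:` line format, and both function names are hypothetical; the article does not quote the paper's prompts.

```python
import re

# Hypothetical wording: the article does not quote the paper's actual prompt text.
RECIPIENT_TYPING_INSTRUCTION = (
    "Before producing any output, write exactly two lines:\n"
    "Recipient: <who will receive the output and in what role>\n"
    "Norms: <what that recipient is entitled to see in this context>\n"
    "Then write the output, including only information those norms permit."
)

# Matches the two-line preamble at the very start of the agent's response.
_PREAMBLE = re.compile(
    r"\s*Recipient:[ \t]*(?P<recipient>[^\n]+)\n"
    r"\s*Norms:[ \t]*(?P<norms>[^\n]+)(?:\n|$)",
    re.IGNORECASE,
)


def build_system_prompt(base_prompt: str) -> str:
    """Append the recipient-typing step to an existing system prompt."""
    return f"{base_prompt.rstrip()}\n\n{RECIPIENT_TYPING_INSTRUCTION}"


def split_typed_response(response: str) -> tuple[str, str, str]:
    """Return (recipient, norms, body); raise ValueError if the preamble is missing."""
    m = _PREAMBLE.match(response)
    if m is None:
        raise ValueError("agent did not name the recipient and norms before its output")
    return m.group("recipient").strip(), m.group("norms").strip(), response[m.end():].strip()
```

Rejecting responses that skip the preamble is a design choice added here, not something the article attributes to the paper; it turns the step from a suggestion into a check the surrounding harness can enforce, and the extracted recipient can be logged for later auditing.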
The honest caveat is that OpenApps, the six-app workspace the benchmark runs in, is a controlled environment, not the sprawling software landscape where most real agents operate. The paper notes that absolute leakage rates should be read as relative orderings rather than deployment predictions, and the end-to-end UI evaluation covered only two agents on a 50-scenario subset, with correspondingly wider uncertainty. The paper also offers no evidence that these mitigations generalize beyond the three models they were tested on. The benchmark and its Monte Carlo tree search (MCTS)-based generation harness are being released openly, which gives teams a concrete starting point for contextual integrity auditing before shipping.
Originally reported by huggingface.co
Original headline: Agent CI Bench: 12 of 15 Frontier Computer-Use Agents Leak Private Data in Over 50% of Privacy Scenarios — Average Leakage Rate 67.9%