Potato 2.0 pairs one annotator with an LLM to label agent data
TL;DR
- Potato 2.0 supports 39 annotation task types across text, audio, image, and video modalities in a single open-source platform.
- The update adds annotation of agentic system outputs, either by reading common trace formats or by interacting with agents live.
- A new AI-in-the-loop workflow lets one human annotator work with an LLM via prompt refinement, uncertainty-driven selection, and progressive autonomy.
A demo paper at ACL 2026 is worth a look if you build datasets for modern models and agents. Potato 2.0, from David Jurgens, Michael Chen, and Lina Iyer, updates their open-source annotation platform for a setting where the data researchers need is no longer just labeled sentences.
The abstract describes support for 39 different types of annotation tasks across text, audio, image, and video modalities, plus something newer: labeling the outputs of agentic systems, either by reading common trace formats or by interacting with agents live in what the paper describes as chatting, web-browsing, and coding settings. That second mode matters because most existing labeling tools were designed around static examples, and a lot of what people now want to evaluate is an agent's behavior mid-task.
The centerpiece is what the authors call an agentic AI-in-the-loop workflow, where a single human annotator collaborates with an LLM through iterative prompt refinement, uncertainty-driven instance selection, and progressive autonomy. The stated promise is efficient dataset creation without a large annotation team. If that holds up, it would change the economics for small labs and grad students who cannot fund a crowdworker pool.
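To make two of those ideas concrete, here is a minimal sketch of uncertainty-driven selection combined with progressive autonomy. It is not Potato's API, and the abstract does not say how the platform measures uncertainty. The function names (`entropy`, `route`, `next_threshold`), the use of Shannon entropy as the uncertainty score, and the threshold update rule are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sketch: none of these names come from Potato 2.0.

def entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy (in bits) of an LLM's label distribution for one item."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def route(
    predictions: Dict[str, Dict[str, float]], threshold: float
) -> Tuple[Dict[str, str], List[str]]:
    """Auto-accept confident LLM labels; queue uncertain items for the human.

    Items whose entropy is at or below `threshold` take the LLM's top label.
    The rest go to the human, most uncertain first.
    """
    auto: Dict[str, str] = {}
    queue: List[Tuple[str, float]] = []
    for item_id, probs in predictions.items():
        h = entropy(probs)
        if h <= threshold:
            auto[item_id] = max(probs, key=probs.get)
        else:
            queue.append((item_id, h))
    queue.sort(key=lambda pair: pair[1], reverse=True)
    return auto, [item_id for item_id, _ in queue]

def next_threshold(
    threshold: float,
    agreement: float,
    target: float = 0.9,
    step: float = 0.1,
    max_threshold: float = 1.0,
) -> float:
    """Assumed progressive-autonomy rule.

    If the human agreed with the LLM on at least `target` of the reviewed
    items, let the LLM auto-label more (raise the threshold). Otherwise,
    send more items back to the human (lower it).
    """
    if agreement >= target:
        return min(max_threshold, threshold + step)
    return max(0.0, threshold - step)

predictions = {
    "a": {"pos": 0.98, "neg": 0.02},  # confident
    "b": {"pos": 0.55, "neg": 0.45},  # near coin flip
    "c": {"pos": 0.80, "neg": 0.20},  # moderately uncertain
}
auto, queue = route(predictions, threshold=0.5)
print(auto)   # {'a': 'pos'}
print(queue)  # ['b', 'c']
```

In this sketch the human only sees items "b" and "c", and the threshold moves up or down after each review round depending on how often the human confirms the LLM's label. A real system would also need the prompt-refinement step, which is not modeled here.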
The honest caveat is that the abstract does not put numbers on any of it. There is no reported time saving, no annotator-LLM agreement figure, no downstream model quality benchmark, and no list of which LLMs the loop supports out of the box. Trace formats also evolve fast, so how well the schemas hold up as agent tooling changes is a fair thing to ask of the full paper.
For teams building agent evaluations or multimodal datasets, the appeal is a single open substrate that already covers most modalities and now speaks agent traces. Whether the collaborative loop actually cuts the human cost of a serious dataset is the thing to watch as people start using it.
Shared on Bluesky: "Potato will be at #ACL2026. Come swing by our poster at Demo Session F on July 7 @ 9am to say hello! aclanthology.org/2026.acl-dem..."
Originally reported by aclanthology.org
Original headline: Potato 2.0: A Comprehensive Annotation Platform with AI-in-the-Loop Support