Paper: AI agents should help build preferences, not just elicit them
TL;DR
- A new arXiv paper argues that AI agents should help non-expert users construct preferences rather than assume users already know what they want.
- The authors introduce CoShop, an interactive benchmark where no tested agent exceeded 56% accuracy after five turns of dialogue.
- The authors attribute the failures to limited knowledge expansion (agents not helping users learn the domain), not to difficulty finding items once preferences were specified.
A new arXiv paper from Irena Saracay, Ludwig Schmidt, and Carlos Guestrin pushes back on an assumption baked into much of today's AI agent research: that the user who shows up already knows what they want. The paper argues that in many realistic settings the user has no well-formed preference to elicit, and that the agent's job is to help them construct one.
The framing borrows from information economics, specifically the Search-Experience-Credence framework, and the authors formalize the idea as CoPref, a preference-construction model. They also introduce CoShop, an interactive benchmark that puts agents into recommendation-style conversations where the user is a non-expert. The reported result on that benchmark is striking in a quiet way: no agent tested exceeded 56% accuracy, even with five turns of dialogue to help the user reach a decision.
Why this matters if you build products on top of these models: a lot of consumer-facing agent work today is scaffolded around the idea that the user can express what they want in natural language, and that the model's job is to search over items for a match. The paper's read is that this misses the harder half of the problem for regular users, who do not know the vocabulary or the option space well enough to give a good spec. The authors report that the failures they observed came from limited knowledge expansion, meaning the agent was not helping the user learn, rather than from the agent being bad at finding items once told what to look for.
The honest caveat is that this is a single paper with a single benchmark and a specific class of tasks, so treat the 56% ceiling as a signal from one experiment rather than a settled result about all agents. The abstract does not say which specific models were tested, how CoShop's ground truth was defined, or what preference construction looks like once it leaves the benchmark and lands in a shipped product.
For anyone building consumer AI, the useful move is probably to stop treating preference elicitation as a solved intake problem and to start treating the agent as a tutor for the domain the user is actually trying to make a decision in.
Originally reported by arxiv.org
Original headline: Beyond expert users: agents should help users construct preferences, not just elicit them