Agent-as-a-Router routes coding tasks across 8 frontier LLMs
TL;DR
- ACRouter frames coding-model selection as a Context-Action-Feedback loop with Orchestrator, Verifier, and Memory components rather than a static classifier.
- Adding task-level performance statistics to a baseline router produced a 15.3% relative gain over a heuristic router, the authors report.
- On the authors' CodeRouterBench, covering roughly 10,000 task instances and 8 frontier LLMs, ACRouter posted the lowest cumulative regret in-distribution.
A new arxiv preprint argues that the question of which model should write your code shouldn't be answered once and for all by a static classifier. The paper, titled Agent-as-a-Router and led by Pengfei Zhou with a long list of collaborators, frames model routing for coding tasks as a feedback loop rather than a one-shot lookup.
The setup is straightforward. You have a pool of frontier coding models, each with different strengths, costs, and failure modes. A router has to pick one per task. The authors argue existing routers operate on incomplete information, and propose ACRouter, an agentic system that runs a Context, Action, Feedback loop with an Orchestrator, a Verifier, and a Memory module. The headline number from their ablation is that simply adding task-level performance statistics to a baseline router produced what they describe as a 15.3% relative gain, surpassing a heuristic router.
To evaluate it they built CodeRouterBench, a benchmark of roughly 10,000 task instances scored against 8 frontier LLMs. On that bench, ACRouter posted the lowest cumulative regret on in-distribution tasks and, the authors report, generalized to out-of-distribution programming scenarios as well.
The honest caveat is that this is the authors' own benchmark and their own router, and the paper is described as a living technical report with continuous updates, so take the specifics as reported rather than settled. What the report doesn't give you is a head-to-head cost comparison against the common baseline most teams quietly run today, which is just defaulting to the strongest available model, nor a clear picture of the latency overhead the orchestration layer adds.
If routing of this kind actually pans out, the people who benefit are the platforms running coding agents at volume, where shaving even a small fraction off the inference bill by sending easier tasks to cheaper models matters in aggregate. The direction worth watching is whether feedback-driven routers start showing up inside IDE-integrated agents, rather than staying a separate research artifact.
Shared on Bluesky by 2 AI experts
Originally reported by arxiv.org
Read the original article →Original headline: Agent-as-a-Router: Agentic Model Routing for Coding Tasks