SARL adapts generalist robot policies via language prompts
TL;DR
- The paper introduces SARL, an RL method that optimizes over language prompts rather than directly over robot actions to adapt a pretrained generalist policy.
- The method treats the generalist policy as a controllable skill prior, composing existing skills to solve tasks beyond its zero-shot capabilities.
- The authors report that SARL significantly outperforms existing approaches to deployment-time adaptation of VLA behavior, in both real-world settings and simulated benchmarks.
The interesting move in a new arXiv paper from Jagdeep Singh Bhatia, Andrew Wagenmaker, William Chen and Sergey Levine is where they put the reinforcement learning. Standard practice for adapting a pretrained generalist robot policy is to run RL directly over the robot's actions. Their argument is that this only works when the base policy's action distribution is already close to a performant one, and that this assumption breaks down for complex or long-horizon tasks that fall outside the pretraining distribution.
Their method, Semantic Action Reinforcement Learning, or SARL, instead runs RL over the language prompts you feed the generalist policy. The base model is treated as a controllable skill prior. You learn, through online interaction, which prompts elicit and compose the skills you actually need. The framing is that a sufficiently expressive generalist policy already contains a wide repertoire of behaviors, so the useful learning problem is figuring out how to talk to it, not how to teach it new motion from scratch.
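To make the idea of running RL over prompts concrete, here is a toy sketch. It is not the authors' algorithm: the abstract does not specify the prompt space, the RL method, or the reward signal. The sketch assumes a small fixed set of candidate prompts, a simulated frozen policy whose per-prompt success rates are invented for illustration, and a simple epsilon-greedy bandit as the learner. All names in it are hypothetical.

```python
import random

# Hypothetical stand-in for a frozen generalist policy: each prompt elicits a
# skill with a fixed success probability that the learner does not know.
# The prompts and numbers are made up for illustration only.
SKILL_SUCCESS = {
    "pick up the cup": 0.2,
    "grasp the cup by the handle": 0.7,
    "push the cup to the left": 0.05,
}


def run_frozen_policy(prompt, rng):
    """Simulate one rollout of the frozen policy; return a 0/1 task reward."""
    return 1.0 if rng.random() < SKILL_SUCCESS[prompt] else 0.0


def learn_prompt(prompts, episodes=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over prompts; the policy weights never change."""
    rng = random.Random(seed)
    value = {p: 0.0 for p in prompts}  # running mean reward per prompt
    count = {p: 0 for p in prompts}    # number of rollouts per prompt
    for _ in range(episodes):
        if rng.random() < epsilon:
            prompt = rng.choice(prompts)                      # explore
        else:
            prompt = max(prompts, key=lambda p: value[p])     # exploit
        reward = run_frozen_policy(prompt, rng)
        count[prompt] += 1
        # Incremental update of the mean reward observed for this prompt.
        value[prompt] += (reward - value[prompt]) / count[prompt]
    return value, count


values, counts = learn_prompt(list(SKILL_SUCCESS))
best_prompt = max(values, key=values.get)
print(best_prompt)
```

The point of the sketch is only where the learning happens: the learner's decision variable is the prompt, and exploration means trying a different instruction instead of perturbing motor commands. A long-horizon version would choose a new prompt at each stage of the task, which this single-step bandit does not attempt.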
If the claim holds up, it matters for anyone deploying a vision-language-action (VLA) model on real hardware. The authors argue that leveraging pretrained skills rather than learning new ones yields structured, semantically meaningful exploration and highly efficient online improvement. They also argue that learning to modulate prompts through experience grounds those prompts in the real-world behaviors they induce, which makes task-solving more robust. Across real-world settings and simulated benchmarks, the paper reports that SARL adapts VLA behavior to solve complex, long-horizon tasks and significantly outperforms existing approaches for improving robot behavior in deployment.
The honest caveat is that the abstract as posted does not name the specific benchmarks, the base VLA, the sample counts, or the comparison baselines, so 'significantly outperforms' is currently a headline claim without the receipts alongside it. The approach also cannot conjure skills the pretraining does not already contain, so its ceiling is set by the repertoire of skills the underlying generalist can already be prompted to perform.
The forward-looking part is the deployment story. If adapting a generalist robot policy to a new task on real hardware can be done by learning a better prompt, rather than by collecting a large action dataset and updating the whole model, the economics of putting VLAs into unfamiliar environments start to look different for teams that don't have a full robot lab.
Originally reported by arxiv.org
Original headline: Adapting Generalist Robot Policies with Semantic Reinforcement Learning