We can think of this as the robot analogue of RL for thinking, optimizing for good "thoughts" through trial-and-error. The surprising thing is that it's so fast, learning in under a hundred real-world trials. Website: https://t.co/APpoOIsXB2 Paper: https://t.co/50dkjqN8y3
AI Weekly's analysis
- The paper introduces SARL, an RL method that optimizes over language prompts rather than directly over robot actions to adapt a pretrained generalist policy.
- The method treats the generalist policy as a controllable skill prior, composing existing skills to solve tasks beyond its zero-shot capabilities.
- The authors report that SARL significantly outperforms existing approaches for adapting vision-language-action (VLA) behavior at deployment time, in both real-world settings and simulated benchmarks.
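
To make the "optimize over prompts, not actions" idea concrete, here is a minimal sketch. It is not SARL's actual algorithm: it swaps in a plain epsilon-greedy bandit over a fixed list of candidate prompts, and the pretrained policy is replaced by a toy simulator. The names `select_prompt`, `make_toy_policy`, and the prompts and success probabilities are all hypothetical, chosen only for illustration.

```python
import random


def make_toy_policy(success_prob, seed=0):
    """Stand-in for a frozen generalist policy (hypothetical).

    Given a language prompt, it "runs an episode" and returns 1.0 on
    success and 0.0 on failure, using a made-up per-prompt success rate.
    """
    rng = random.Random(seed)

    def run_episode(prompt):
        return 1.0 if rng.random() < success_prob[prompt] else 0.0

    return run_episode


def select_prompt(candidates, run_episode, n_trials=60, epsilon=0.2, seed=0):
    """Epsilon-greedy search over prompts within a small trial budget.

    The policy weights are never updated; only the choice of prompt
    changes from trial to trial.
    """
    rng = random.Random(seed)
    counts = {p: 0 for p in candidates}
    means = {p: 0.0 for p in candidates}

    for t in range(n_trials):
        if t < len(candidates):
            prompt = candidates[t]            # try every prompt once first
        elif rng.random() < epsilon:
            prompt = rng.choice(candidates)   # explore
        else:
            prompt = max(candidates, key=lambda p: means[p])  # exploit

        reward = run_episode(prompt)
        counts[prompt] += 1
        # Incremental update of the running mean reward for this prompt.
        means[prompt] += (reward - means[prompt]) / counts[prompt]

    best = max(candidates, key=lambda p: means[p])
    return best, means, counts


if __name__ == "__main__":
    # Made-up success rates: composing two known skills works best here.
    probs = {
        "pick up the cup": 0.2,
        "push the drawer closed": 0.1,
        "open the drawer, then pick up the cup": 0.8,
    }
    best, means, counts = select_prompt(list(probs), make_toy_policy(probs))
    print("selected prompt:", best)
    print("trials per prompt:", counts)
```

The default budget of 60 trials echoes the "under a hundred real-world trials" figure from the post. A real system would presumably generate or refine prompts rather than pick from a fixed list.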