ASPIRE agent writes robot code, hits 31% zero-shot vs 4%
TL;DR
- On LIBERO-Pro Long, ASPIRE reports 31% zero-shot success on unseen tasks, versus 4% for prior methods that were allowed test-time reasoning and retries.
- ASPIRE autonomously writes and refines robot control programs in a code-as-policy paradigm, compounding fixes into a reusable skill library.
- Reported gains: up to 77% on LIBERO-Pro manipulation under perturbation, 72% on Robosuite bimanual handover, and 32% on BEHAVIOR-1K long-horizon household tasks.
A new arXiv preprint describes ASPIRE, short for 'Agentic Skill Programming through Iterative Robot Exploration,' a system that has robots write and refine their own control programs instead of learning tasks from human demonstrations. The headline number is on LIBERO-Pro Long, a long-horizon manipulation benchmark: ASPIRE reports 31% success on unseen tasks with no prior exposure, versus 4% for prior methods that were allowed to use test-time reasoning and retries.
The authors frame the approach as code-as-policy plus continual learning. Three pieces do the work: a closed-loop execution engine that exposes fine-grained multimodal traces so failures can be diagnosed, patched, and validated automatically; a continually expanding skill library that distills the validated fixes into reusable, transferable knowledge; and an evolutionary search that generates diverse task sequences and control programs, so the loop does not just refine a single trajectory into a corner. On top of the zero-shot claim, the paper reports gains of up to 77% on LIBERO-Pro manipulation under perturbation, 72% on Robosuite bimanual handover, and 32% on BEHAVIOR-1K long-horizon household tasks.
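To make the run, diagnose, patch, and store cycle concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: every name in it (Trace, SkillLibrary, execute, patch, solve) is hypothetical, the "simulator" is a simple step-matching check, and the patch step reads the answer straight from the trace where a real system would presumably query a code-writing model. The evolutionary search component is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Trace:
    """Toy execution trace: which steps ran and where the program first failed."""
    steps_run: list
    failed_at: Optional[int] = None   # index of the first failing step, None on success
    expected: Optional[str] = None    # diagnostic hint: the step the task needed there

    @property
    def success(self) -> bool:
        return self.failed_at is None


@dataclass
class SkillLibrary:
    """Maps a task name to a validated program (here, a list of step names)."""
    skills: dict = field(default_factory=dict)

    def lookup(self, task):
        return self.skills.get(task)

    def add(self, task, program):
        self.skills[task] = list(program)


def execute(program, required_steps):
    """Hypothetical simulator: succeed only if the program matches the
    required steps in order. Extra trailing steps are ignored in this toy."""
    steps_run = []
    for i, expected in enumerate(required_steps):
        if i >= len(program) or program[i] != expected:
            return Trace(steps_run, failed_at=i, expected=expected)
        steps_run.append(program[i])
    return Trace(steps_run)


def patch(program, trace):
    """Replace (or append) the failing step using the trace's hint.
    Assumption: a real system would generate this fix, not read it off."""
    fixed = list(program[:trace.failed_at])
    fixed.append(trace.expected)
    fixed.extend(program[trace.failed_at + 1:])
    return fixed


def solve(task, required_steps, draft, library, max_attempts=5):
    """Closed loop: run, diagnose, patch, and store the program once it validates."""
    program = library.lookup(task) or list(draft)
    for attempt in range(1, max_attempts + 1):
        trace = execute(program, required_steps)
        if trace.success:
            library.add(task, program)   # only validated programs enter the library
            return program, attempt
        program = patch(program, trace)
    return None, max_attempts


library = SkillLibrary()
required = ["reach", "grasp", "lift", "place"]

# First encounter: the draft is incomplete, so the loop needs several patches.
program, attempts = solve("stack_block", required, ["reach", "lift"], library)
print(program, attempts)

# Second encounter: the validated program is reused from the library.
program_again, attempts_again = solve("stack_block", required, [], library)
print(program_again, attempts_again)
```

The point of the toy is the second call: once a fix has been validated and stored, the same task is solved on the first attempt with no drafting or patching, which is the compounding effect the skill library is meant to deliver.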
Why this is worth paying attention to: the dominant path to general robot skills has been demonstration-heavy imitation learning, and more recently vision-language-action models trained on giant teleoperation datasets. An agent that instead writes control code, watches it fail on the robot, patches it, and files the working version into a shared library is a very different bet on where robot competence comes from. If that library really does persist across simulation and real-world settings and across different embodiments, as the abstract claims, the marginal cost of teaching the next task drops.
The honest caveats are the ones you would expect. This is a preprint, not a peer-reviewed result, and benchmark leaps on LIBERO-Pro, Robosuite, and BEHAVIOR-1K are a meaningful signal but not the same as sustained reliability in a home or a warehouse. The paper's own framing is that ASPIRE surpasses prior methods, not that it solves the tasks. What the reporting does not give you is a picture of how big the accumulated skill library actually gets in practice, what breaks first when tasks drift further from the training distribution, or which underlying model is doing the code generation.
Still, the direction is the interesting part. If the results hold up, small labs and startups that cannot afford giant teleoperation datasets have a much cheaper path to a working skill set: point a code-writing loop at a simulator, let it fail its way to a library, and carry that library across to the real robot.
Originally reported by paper
Original headline: NVIDIA's ASPIRE Robots Write and Debug Their Own Code, Hit 31% Zero-Shot vs. 4% Baseline