
DART adapts VLA robots to new cameras and arms with one demo

TL;DR

  • DART adapts Vision-Language-Action robot models to camera-pose changes or a different robot body using a single demonstration instead of many.
  • The method treats domain shift as weight-vector arithmetic, using subspace alignment across singular components to filter noise before adding domain-specific information.
  • In both simulated and real-world tests the authors report DART outperforms existing VLA adaptation methods in one-shot scenarios, including Panda-to-UR5e embodiment shifts.

A quiet paper on arXiv this week reframes what 'deploying a robot policy' actually costs. In a new preprint accepted to ECCV 2026, Taewook Kang and coauthors describe DART, short for Domain ARiThmetic, a method for adapting Vision-Language-Action models to a new camera pose or a different robot arm using just a single demonstration.

The framing matters because the current answer to 'my camera moved' or 'we're switching from a Panda to a UR5e' is to collect a fresh batch of demonstrations per task and retrain. DART, per the authors, treats domain shift as arithmetic on the model's weights: isolate the domain-specific direction, add it in, and skip the multi-demo retraining. To keep that addition clean, the paper performs subspace alignment between singular components in the weight vectors, which the authors describe as a way to filter out noisy components before combining.
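The abstract does not spell out the algorithm, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes a domain shift can be captured as the difference between two weight matrices, and that "filtering" means projecting that difference onto the top singular directions of a reference matrix. The function names, the choice of reference matrix and the rank parameter are all hypothetical.

```python
import numpy as np


def domain_vector(w_target, w_source):
    """Assumed definition: the domain direction is the weight difference
    between a model adapted to the target domain and the source model."""
    return w_target - w_source


def filter_to_subspace(delta, reference, rank):
    """Project `delta` onto the span of the top-`rank` left singular vectors
    of `reference`. Anything outside that span is treated as noise.
    (Illustrative stand-in for the paper's subspace alignment step.)"""
    u, _, _ = np.linalg.svd(reference, full_matrices=False)
    basis = u[:, :rank]                  # shape: (rows, rank), orthonormal columns
    return basis @ (basis.T @ delta)     # orthogonal projection of delta


def add_domain_vector(w, delta, scale=1.0):
    """Weight arithmetic: shift a weight matrix along the domain direction."""
    return w + scale * delta
```

For example, if the reference matrix's dominant left singular direction is the first axis, a rank-1 filter keeps only the first row of the domain vector and discards the rest before it is added to the policy's weights. How DART actually selects and aligns its subspaces is a question for the full paper.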

The claim is that in both simulated and real-world experiments the method outperforms existing VLA adaptation methods in one-shot scenarios across visual and embodiment shifts. Treat that as the authors' own report: the abstract gives no success rates or task suites, and 'one-shot' in a lab is not the same thing as 'one-shot' on a factory floor, where lighting, backgrounds and gripper wear all shift together. The abstract also does not describe the failure mode when the target embodiment has genuinely different kinematics, or how the method holds up outside the visual and embodiment shifts the authors chose to test.

If it generalizes, the interesting audience is the middle of the market. The teams that can already afford to burn weeks collecting demonstrations per cell are fine. The teams that get unblocked are integrators trying to move a policy from one customer's Panda to another's UR5e without a fresh data campaign, and academic labs whose robot does not match the one a foundation policy was trained on. That is where a working one-demonstration transfer would actually shorten deployment schedules.