ByteDance Seed Marries World Models With Robotic Value Estimation
TL;DR
- World Value Model (WVM) achieves state-of-the-art Value-Order Correlation results on standard robotic benchmarks.
- ByteDance Seed introduces Suboptimal-Value-Bench: 800 suboptimal trajectories with human-labeled annotations across multiple robot embodiments.
- WVM improves robot manipulation performance in both simulated and real-world settings when integrated into policy learning.
The central bet in robotics research is that training on large, diverse datasets will eventually produce capable, general-purpose manipulation policies. But most real-world data is mixed-quality: expert demonstrations are expensive, and typical datasets contain failed or mediocre attempts alongside the good ones. Generalist value models are one proposed answer; they assess data quality by estimating task progression, helping downstream policy learning extract more from imperfect data. A new paper from ByteDance Seed argues that these value models have been built on the wrong architectural foundation.
The critique is specific. Most existing robotic value models use Vision-Language Model (VLM) backbones, which are pretrained on static or temporally sparse visual observations. The paper argues that this leaves VLMs poorly equipped for the deep temporal understanding that accurate value estimation requires: grounding the current state in historical context and planning over future outcomes. World models, which are designed around temporal prediction and future planning, are proposed as the more natural foundation for learning generalizable value functions.
The resulting system, World Value Model (WVM), achieves state-of-the-art Value-Order Correlation results on standard benchmarks. The researchers also introduce Suboptimal-Value-Bench, a new multi-embodiment benchmark built around 800 suboptimal trajectories with high-fidelity, human-labeled frame annotations. Standard evaluation suites, the paper notes, contain only expert data; this benchmark is designed to test robustness on the mixed-quality data that real deployments involve. WVM maintains top performance there too, and when integrated into downstream policy learning, it improves manipulation performance across both simulated and real-world settings.
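The article does not define Value-Order Correlation, so the following is a minimal sketch under an assumption: that the metric is the rank correlation between a model's per-frame value predictions and the chronological order of those frames, which is how earlier generalist value-model work has used the term. The function names and the sample numbers are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """Return 1-based ranks of xs, giving tied entries their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant (a convention chosen here)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def value_order_correlation(values: Sequence[float]) -> float:
    """Assumed VOC: Spearman correlation between predicted values and frame order."""
    if len(values) < 2:
        raise ValueError("need at least two frames")
    frame_order = [float(t + 1) for t in range(len(values))]
    return pearson(average_ranks(values), frame_order)


# Hypothetical per-frame value predictions for two trajectories.
expert_like = [0.05, 0.20, 0.45, 0.70, 0.95]  # values rise steadily over time
suboptimal = [0.10, 0.40, 0.30, 0.35, 0.20]   # progress stalls, then regresses
```

Under this assumed definition, `value_order_correlation(expert_like)` reaches the maximum score because the predicted values rise in step with time, while `value_order_correlation(suboptimal)` scores lower. That also suggests why an expert-only evaluation suite is a limited test: a model that simply outputs values increasing with time scores well on such data whether or not it has understood the scene.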
The main caveat is that this is a single-team result from ByteDance Seed, and it has not yet been independently replicated. The paper does not provide a detailed accounting of WVM's computational cost versus VLM-based alternatives, the specific world model architecture underneath, or how broadly the real-world gains hold across different robot hardware. Those gaps matter for any lab considering adoption.
If the approach holds up under scrutiny, the clearest beneficiaries are teams doing imitation learning from human demonstrations, which by nature include suboptimal examples. Better value estimation from the same data means more signal without the cost of collecting additional expert trajectories.
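The article does not say how WVM's value estimates are fed into policy learning. As an illustration only, one common pattern in the wider literature is to weight each demonstration step by how much estimated progress it makes, so that stalled or regressing steps contribute less to an imitation loss. The sketch below assumes that pattern. The function name, the exponential weighting, the temperature, and the clipping threshold are all hypothetical choices and are not taken from the paper.

```python
import math
from typing import List, Sequence


def progress_weights(
    values: Sequence[float],
    temperature: float = 0.1,
    max_weight: float = 20.0,
) -> List[float]:
    """Weight each transition by the estimated progress it makes.

    values[t] is a value model's estimate of task progress at frame t.
    The weight for step t is exp((values[t + 1] - values[t]) / temperature),
    clipped at max_weight so a single step cannot dominate the loss.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    weights = []
    for current, nxt in zip(values, values[1:]):
        progress = nxt - current
        weights.append(min(math.exp(progress / temperature), max_weight))
    return weights


# Hypothetical value estimates: one step forward, one stall, one regression.
example_values = [0.0, 0.1, 0.1, 0.05]
example_weights = progress_weights(example_values)
```

In this example the first step, which makes progress, receives the largest weight, the stalled step receives a neutral weight of 1, and the regressing step is down-weighted. A lower temperature sharpens that separation and a higher one flattens it toward plain behavior cloning.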
Originally reported by huggingface.co
Original headline: ByteDance Seed: World Value Models Combine World Models With Value Estimation for Robotic Manipulation, Introduce Suboptimal-Value-Bench