Autodata: Meta FAIR Trains Agentic Data Scientist That Meta-Optimizes Synthetic Dataset Creation Across Code, Legal Reasoning, and Mathematics Tasks
Summary
Meta FAIR researchers introduce Autodata, a method that trains AI agents to act as data scientists that build and meta-optimize high-quality synthetic training and evaluation data. Its practical implementation is called Agentic Self-Instruct. In experiments on computer science research, legal reasoning, and mathematical object tasks, Autodata outperforms classical synthetic dataset creation, and meta-optimizing the data scientist agent itself raises performance further still, beyond what standard agentic creation achieves. The paper frames this as a mechanism for converting increased inference compute directly into higher-quality model training, which could reshape how labs scale post-training data pipelines.
Originally reported by huggingface.co