huggingface.co web signal June 24th 2026

Autodata: Meta FAIR Trains Agentic Data Scientist That Meta-Optimizes Synthetic Dataset Creation Across Code, Legal Reasoning, and Mathematics Tasks

meta agents synthetic data synthetic-data agentic-ai meta-learning

Summary

Meta FAIR researchers introduce Autodata, a method that trains AI agents to act as data scientists that build and meta-optimize high-quality synthetic training and evaluation data. Its practical implementation is called Agentic Self-Instruct. In experiments on computer science research, legal reasoning, and mathematical object tasks, Autodata outperforms classical synthetic dataset creation, and meta-optimizing the data scientist agent itself yields further gains over standard agentic data creation. The paper frames this as a way to convert additional inference compute directly into higher-quality model training, which could reshape how labs scale post-training data pipelines.

Originally reported by huggingface.co

