venturebeat.com web signal

Arbor AI Optimization Framework Outperforms Claude Code and Codex by 2.5× on Same Compute Budget via Hypothesis-Tree Architecture

anthropic coding tools open source agents ai-research coding-agents

Summary

Researchers from Renmin University of China and Microsoft Research published Arbor, an autonomous agent optimization framework that treats each improvement hypothesis as an isolated experiment in its own git worktree, so successful changes can be merged cleanly and failed ones pruned without entangling results. In benchmark comparisons published June 19, Arbor achieved 2.5× the average performance gain of Claude Code and Codex on the same compute budget. On held-out BrowseComp, it raised accuracy from a 45.3% baseline to 67.7%, while the competing systems stalled at 50–53%. The approach generalizes across model training, harness engineering, and data synthesis tasks, and it works with multiple LLM backends, including Claude Opus 4.6 and GPT-5.5.
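The summary does not describe Arbor's internals beyond the worktree-per-hypothesis idea. The Python sketch below illustrates that general pattern under stated assumptions. Each hypothesis runs on a private copy of the current state, which stands in for a git worktree. It is merged only if it beats the state it branched from. Otherwise it is pruned together with its refinements. `Hypothesis`, `explore`, `run_tree`, `toy_score`, and the example configuration are hypothetical names invented for illustration, not Arbor's API or its actual search policy.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


@dataclass
class Hypothesis:
    """A node in a hypothetical hypothesis tree (illustrative only, not Arbor's API)."""
    name: str
    change: Callable[[Config], None]  # edits an isolated copy of the config in place
    children: List["Hypothesis"] = field(default_factory=list)  # refinements of this idea


def explore(node: Hypothesis, base: Config,
            score: Callable[[Config], float], log: List[str]) -> Config:
    # Run the experiment on a private copy (standing in for a fresh git worktree),
    # so a failed hypothesis never touches the accepted state.
    trial = copy.deepcopy(base)
    node.change(trial)
    if score(trial) <= score(base):
        log.append(f"prune {node.name}")  # discard the copy and skip its refinements
        return base
    log.append(f"merge {node.name}")
    # Child hypotheses refine the merged result, one after another.
    for child in node.children:
        trial = explore(child, trial, score, log)
    return trial


def run_tree(roots: List[Hypothesis], base: Config,
             score: Callable[[Config], float]) -> Tuple[Config, List[str]]:
    """Explore each top-level hypothesis against the accumulated accepted state."""
    log: List[str] = []
    state = base
    for root in roots:
        state = explore(root, state, score, log)
    return state, log


# Assumed toy objective: higher is better, optimum at lr=0.01, batch=64.
def toy_score(cfg: Config) -> float:
    return -abs(cfg["lr"] - 0.01) - abs(cfg["batch"] - 64) / 100


BASE: Config = {"lr": 0.1, "batch": 32.0}
ROOTS = [
    Hypothesis("lower lr", lambda c: c.update(lr=0.03), [
        Hypothesis("lower lr further", lambda c: c.update(lr=0.01)),
        Hypothesis("overshoot lr", lambda c: c.update(lr=0.001)),
    ]),
    Hypothesis("shrink batch", lambda c: c.update(batch=16.0)),
    Hypothesis("grow batch", lambda c: c.update(batch=64.0)),
]

if __name__ == "__main__":
    final, log = run_tree(ROOTS, BASE, toy_score)
    print(final)
    print(log)
```

With this toy data, "lower lr" and its refinement "lower lr further" are merged, "overshoot lr" and "shrink batch" are pruned, and "grow batch" is merged. The run ends at `{"lr": 0.01, "batch": 64.0}`, and `BASE` itself is never modified. Copying before every trial is what keeps results unentangled. A real system would use separate worktrees or branches rather than in-memory copies.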