DeepReinforce Ornith-1.0 Learns to Write Its Own RL Scaffolds
TL;DR
- Ornith-1.0's 35B MoE scores 64.2 on Terminal-Bench 2.1, outpacing Qwen3.5-397B's 53.5 with far fewer parameters.
- The 397B flagship scores 82.4 on SWE-bench Verified, second only to Claude Opus 4.8's 87.6 among listed models.
- All four variants from 9B to 397B are MIT licensed and immediately downloadable from Hugging Face.
Most RL-trained coding models learn to generate solutions inside a harness that human engineers designed in advance. DeepReinforce's Hugging Face release of Ornith-1.0 describes a different setup: during RL training, "the scaffold co-evolves with the model's policy," meaning the model jointly learns to propose its own orchestration framework and then generate solutions using it. The four-model family (9B Dense, 31B Dense, 35B MoE, and 397B MoE) ships MIT-licensed on Hugging Face.
The parameter efficiency numbers are the most concrete argument for the approach. The 35B MoE variant scores 64.2 on Terminal-Bench 2.1, against 53.5 for Qwen3.5-397B, a model more than eleven times larger by nominal parameter count. On SWE-bench Verified, the 35B posts 75.6, within 6.8 points of the 397B flagship's 82.4.
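For readers who want to check the efficiency comparison, the arithmetic can be sketched in a few lines. The scores are the ones quoted above; the parameter counts are read off the model names (35B and 397B) and treated as nominal totals, which is an assumption, since neither active-parameter counts nor exact totals are given here.

```python
# Figures quoted in the article. Parameter counts (in billions) are taken
# from the model names and are an assumption about nominal total size.
ornith_35b = {"params_b": 35, "terminal_bench": 64.2}
qwen_397b = {"params_b": 397, "terminal_bench": 53.5}

# How much larger the Qwen model is by nominal parameter count.
param_ratio = qwen_397b["params_b"] / ornith_35b["params_b"]

# How many points the smaller Ornith model leads by on Terminal-Bench 2.1.
score_gap = ornith_35b["terminal_bench"] - qwen_397b["terminal_bench"]

print(f"Qwen3.5-397B is {param_ratio:.1f}x larger by nominal parameter count")
print(f"Ornith-1.0 35B leads by {score_gap:.1f} points on Terminal-Bench 2.1")
```

Under those assumptions the size ratio comes out to roughly 11.3x and the lead to 10.7 points. Both models are mixture-of-experts designs, so a comparison by active parameters per token could look different.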
At the top of the family, according to MarkTechPost's coverage, the 397B MoE reaches 77.5 on Terminal-Bench 2.1, ahead of Claude Opus 4.7 (70.3) and Qwen3.5-397B (53.5). On SWE-bench Verified, it scores 82.4, second only to Claude Opus 4.8's 87.6 among the models listed.
The caveat is that all scores appear to be self-reported. The reporting offers no independent verification and no account of how the self-written scaffolds perform on task types outside the training distribution. Benchmark wins on SWE-bench do not guarantee reliable behavior across the diversity of real production codebases.
At 21.2 GB in 4-bit quantization, the 35B variant fits on hardware many teams already have. The MIT license means these weights can go into commercial products and be fine-tuned for specialized domains without legal friction. For organizations currently paying for hosted API access to run coding agents, that combination is worth a serious look.
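As a rough sanity check on the 21.2 GB figure, the size of quantized weights can be estimated from parameter count and bit width. The sketch below assumes exactly 35 billion parameters and decimal gigabytes (1 GB = 1e9 bytes); both are assumptions, and the helper names are illustrative rather than part of any library.

```python
def weight_size_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    total_bits = params_billions * 1e9 * bits_per_param
    return total_bits / 8 / 1e9


def effective_bits_per_param(params_billions: float, size_gb: float) -> float:
    """Average bits stored per parameter implied by a given file size."""
    return size_gb * 1e9 * 8 / (params_billions * 1e9)


# Idealized case: every one of the assumed 35B parameters stored in exactly 4 bits.
ideal_gb = weight_size_gb(35, 4)

# Bits per parameter implied by the 21.2 GB figure quoted in the article.
implied_bits = effective_bits_per_param(35, 21.2)

print(f"Ideal 4-bit size: {ideal_gb:.1f} GB")
print(f"Implied bits per parameter at 21.2 GB: {implied_bits:.2f}")
```

Under these assumptions the idealized size is 17.5 GB, and the reported 21.2 GB implies a little under 5 bits per parameter on average. A gap like that is common in practice, since quantized checkpoints typically store scaling metadata and keep some layers at higher precision, though the article does not say which scheme is used. Memory at inference time will also be higher than the weight size alone, because the KV cache and activations need room too.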
Originally reported by huggingface.co
Original headline: DeepReinforce Releases Ornith-1.0 — MIT-Licensed Open-Source Coding Agent Family With Self-Written RL Scaffolds; 397B Hits 82.4 SWE-Bench Verified and 77.5 Terminal-Bench 2.1