huggingface.co web signal June 25th 2026

DeepReinforce Ornith-1.0 Learns to Write Its Own RL Scaffolds

TL;DR

Ornith-1.0's 35B MoE scores 64.2 on Terminal-Bench 2.1, outpacing Qwen3.5-397B's 53.5 with far fewer parameters.
The 397B flagship scores 82.4 on SWE-bench Verified, second only to Claude Opus 4.8's 87.6 among listed models.
All four variants from 9B to 397B are MIT licensed and immediately downloadable from Hugging Face.

Most RL-trained coding models learn to generate solutions inside a harness that human engineers designed in advance. DeepReinforce's Hugging Face release of Ornith-1.0 describes a different setup: during RL training, "the scaffold co-evolves with the model's policy," meaning the model jointly learns to propose its own orchestration framework and then generate solutions using it. The four-model family (9B Dense, 31B Dense, 35B MoE, and 397B MoE) ships MIT-licensed on Hugging Face.
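The release does not describe the training algorithm beyond the co-evolution claim. As a purely illustrative toy, the sketch below treats scaffold choice as part of the policy's action. One softmax head picks one of three hypothetical scaffolds. A second head, conditioned on that scaffold, picks a solution strategy. A single REINFORCE update from the task reward then adjusts both heads, so the scaffold preference and the solution preference evolve together. The scaffold names, success probabilities, and hyperparameters are invented for illustration. This is not DeepReinforce's method.

```python
import math
import random

# Hypothetical P(task solved | scaffold, solution strategy); invented for illustration.
SUCCESS = {
    "single_shot": [0.10, 0.30],
    "plan_then_code": [0.20, 0.60],
    "test_first": [0.65, 0.95],
}
SCAFFOLDS = list(SUCCESS)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    return rng.choices(range(len(probs)), weights=probs)[0]


def train(steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    scaffold_logits = [0.0] * len(SCAFFOLDS)
    # One solution head per scaffold: the solution policy is conditioned on the scaffold.
    solution_logits = {s: [0.0, 0.0] for s in SCAFFOLDS}
    baseline = 0.0  # running mean reward, used to reduce gradient variance
    for _ in range(steps):
        p_scaf = softmax(scaffold_logits)
        i = sample(p_scaf, rng)
        name = SCAFFOLDS[i]
        p_sol = softmax(solution_logits[name])
        j = sample(p_sol, rng)
        reward = 1.0 if rng.random() < SUCCESS[name][j] else 0.0
        advantage = reward - baseline
        baseline += 0.01 * (reward - baseline)
        # REINFORCE: d log softmax_chosen / d logit_k = 1[k == chosen] - p_k.
        # The same reward signal updates the scaffold head and the solution head.
        for k in range(len(scaffold_logits)):
            scaffold_logits[k] += lr * advantage * (float(k == i) - p_scaf[k])
        for k in range(len(p_sol)):
            solution_logits[name][k] += lr * advantage * (float(k == j) - p_sol[k])
    return scaffold_logits, solution_logits


if __name__ == "__main__":
    scaffold_logits, solution_logits = train()
    print({s: round(p, 3) for s, p in zip(SCAFFOLDS, softmax(scaffold_logits))})
```

In this toy the scaffold head only sees rewards produced by the current solution head, which is the sense in which the two "co-evolve". A real system would generate scaffold code and solutions with a language model rather than choose from fixed tables.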

The parameter efficiency numbers are the most concrete argument for the approach. The 35B MoE variant scores 64.2 on Terminal-Bench 2.1, against 53.5 for Qwen3.5-397B, a model with roughly 11 times the total parameter count. On SWE-bench Verified, the 35B posts 75.6, within 6.8 points of the family's own 397B flagship (82.4).

At the top of the family, according to MarkTechPost's coverage, the 397B MoE reaches 77.5 on Terminal-Bench 2.1, ahead of Claude Opus 4.7 (70.3) and Qwen3.5-397B (53.5). On SWE-bench Verified, it scores 82.4, second only to Claude Opus 4.8's 87.6 among the models listed.
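The comparisons above reduce to simple arithmetic. The minimal sketch below recomputes the headline gaps from the article's reported scores, which appear to be self-reported. The parameter ratio uses total parameter counts taken from the model names, because the article gives no active-parameter counts for the MoE variants. The dictionary and function names are hypothetical.

```python
# Scores as reported in the article (appear self-reported; not independently verified).
TERMINAL_BENCH_2_1 = {
    "Ornith-1.0 397B MoE": 77.5,
    "Ornith-1.0 35B MoE": 64.2,
    "Claude Opus 4.7": 70.3,
    "Qwen3.5-397B": 53.5,
}
SWE_BENCH_VERIFIED = {
    "Claude Opus 4.8": 87.6,
    "Ornith-1.0 397B MoE": 82.4,
    "Ornith-1.0 35B MoE": 75.6,
}
# Total parameter counts in billions, read off the model names (not active parameters).
TOTAL_PARAMS_B = {"Ornith-1.0 35B MoE": 35, "Qwen3.5-397B": 397}


def point_gap(scores, a, b):
    """Score of model a minus score of model b, in benchmark points."""
    return round(scores[a] - scores[b], 1)


def param_ratio(params, big, small):
    """How many times larger the total parameter count of `big` is than `small`."""
    return round(params[big] / params[small], 1)


if __name__ == "__main__":
    print(point_gap(TERMINAL_BENCH_2_1, "Ornith-1.0 35B MoE", "Qwen3.5-397B"))
    print(param_ratio(TOTAL_PARAMS_B, "Qwen3.5-397B", "Ornith-1.0 35B MoE"))
```

Running it prints 10.7 (Terminal-Bench points) and 11.3 (total-parameter ratio).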

The caveat is that all scores appear to be self-reported. The reporting offers no independent verification and no account of how the self-written scaffolds perform on task types outside the training distribution. Benchmark wins on SWE-bench are not the same as reliable behavior across the diversity of real production codebases.

At 21.2 GB in 4-bit quantization, the 35B variant fits on hardware many teams already have. The MIT license means these weights can go into commercial products and be fine-tuned for specialized domains without legal friction. For organizations currently paying for hosted API access to run coding agents, that combination is worth a serious look.
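A rough sanity check on the 21.2 GB figure, assuming decimal gigabytes: 35B weights stored at exactly 4 bits would occupy 17.5 GB, so the reported size implies about 4.85 effective bits per weight. Quantization scales and tensors kept at higher precision are common sources of such overhead, but the article does not say, so treat that explanation as an assumption. The 24 GB card below is a hypothetical example, and whatever headroom remains must also hold the KV cache and activations.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal GB, ignoring scales and higher-precision tensors."""
    return params_billion * bits_per_weight / 8


def effective_bits(model_gb: float, params_billion: float) -> float:
    """Average bits per weight implied by a file size (assumes decimal GB)."""
    return round(model_gb * 8 / params_billion, 2)


def headroom_gb(vram_gb: float, model_gb: float) -> float:
    """Memory left for KV cache and activations after loading the weights."""
    return round(vram_gb - model_gb, 1)


if __name__ == "__main__":
    print(weight_gb(35, 4))          # ideal 4-bit size of a 35B model
    print(effective_bits(21.2, 35))  # implied by the reported 21.2 GB
    print(headroom_gb(24, 21.2))     # hypothetical 24 GB GPU
```

This prints 17.5, 4.85, and 2.8. A thin margin like that is why context length and batch size, not just weight size, decide whether a given card is enough.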

Originally reported by huggingface.co

Original headline: DeepReinforce Releases Ornith-1.0 — MIT-Licensed Open-Source Coding Agent Family With Self-Written RL Scaffolds; 397B Hits 82.4 SWE-Bench Verified and 77.5 Terminal-Bench 2.1