huggingface.co via Reddit

Cohere Releases North Mini Code, Beats 120B Models

By Alexis Dufresne | Published June 9, 2026 at 17:03 UTC | Updated June 9, 2026 at 17:05 UTC

cohere, coding tools, open source, agents

Key insights

North Mini Code scores 80.2% pass@10 on SWE-Bench Verified with only 3B active parameters inside a 30B MoE architecture.
RLVR post-training added 7.9 absolute percentage points on Terminal-Bench v2 and 3.0 points on SWE-Bench over the SFT baseline.
Cohere trained the model on 70,000+ verifiable tasks across ~5,000 unique repositories and released it under Apache 2.0 with no commercial restrictions.

Why this matters

A coding agent beating 120B+ rivals at 3B active parameters shifts the cost calculus for teams choosing between self-hosting and API-based coding tools. The Apache 2.0 release lets commercial users deploy and fine-tune without license restrictions, putting direct pressure on proprietary coding assistants. The RLVR methodology, which added 7.9 absolute points on Terminal-Bench v2 from RL training alone, provides a concrete public reference for practitioners building their own coding agents.
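The cost argument above can be made concrete with a back-of-envelope sketch. It is not from Cohere's release: it assumes bf16 weights at 2 bytes per parameter and the common rule of thumb of about 2 FLOPs per active parameter per generated token. The function names are hypothetical. The point it illustrates is that, in a sparse MoE model, weight memory follows the total parameter count while per-token compute follows the active count.

```python
# Back-of-envelope sizing for a sparse MoE model.
# Assumptions (not from the article): bf16 weights at 2 bytes per parameter,
# and roughly 2 FLOPs per active parameter per token for a forward pass.

TOTAL_PARAMS = 30e9    # total parameters, as quoted in the article
ACTIVE_PARAMS = 3e9    # parameters active per token, as quoted in the article
BYTES_PER_PARAM = 2    # assumption: bf16/fp16 weights, no quantization


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return total_params * bytes_per_param / 1e9


def forward_gflops_per_token(active_params: float) -> float:
    """Approximate forward-pass compute per generated token, in GFLOPs."""
    return 2 * active_params / 1e9


if __name__ == "__main__":
    mem = weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM)
    flops = forward_gflops_per_token(ACTIVE_PARAMS)
    print(f"weights: ~{mem:.0f} GB, compute: ~{flops:.0f} GFLOPs per token")
```

Under these assumptions the weights need roughly 60 GB regardless of sparsity, while per-token compute is closer to that of a 3B dense model. KV cache, activations, and serving overhead are ignored here, so real deployments will need more headroom.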

Summary

Cohere released North Mini Code under Apache 2.0 on Hugging Face. The 30B MoE model has 3B active parameters and scores 80.2% pass@10 on SWE-Bench Verified and 61.0% pass@1 via the mini-SWE-Agent harness, outperforming open-source rivals of up to 123B parameters. Training combined two-stage SFT over 70,000+ verifiable tasks across ~5,000 repositories with asynchronous RLVR post-training. The RL stage alone added 7.9 absolute percentage points on Terminal-Bench v2 and 3.0 on SWE-Bench. In internal pairwise evaluation, the final model had a 66.1% win rate over the SFT-only checkpoint.

In short: Cohere ships an open-weight coding agent that outperforms models whose total parameter counts are roughly 40 times its 3B active parameters.

The model scores 33.4 on Artificial Analysis' Coding Index, topping Devstral 2 (123B), Mistral Small 4 (119B-A6B), and Nemotron 3 Super (120B-A12B).
Multi-harness SFT exposure, with benchmark harness data at 6% of the training mix, yielded a 10% OpenCode evaluation gain while holding SWE-Agent performance.
Apache 2.0 licensing removes the commercial barriers that most frontier-tier coding agents carry.
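The size comparisons in this summary can be checked with simple arithmetic. The sketch below uses only the parameter counts quoted in the article (in billions); the function and constant names are illustrative. It compares each rival's total parameter count against North Mini Code's active and total counts.

```python
# Figures in billions of parameters, as quoted in the article.
NORTH_MINI_ACTIVE_B = 3.0
NORTH_MINI_TOTAL_B = 30.0

RIVAL_TOTAL_B = {
    "Devstral 2": 123.0,
    "Mistral Small 4": 119.0,
    "Nemotron 3 Super": 120.0,
}


def size_ratios(rival_total_b: float) -> tuple[float, float]:
    """Return (rival total / North Mini active, rival total / North Mini total)."""
    return (
        rival_total_b / NORTH_MINI_ACTIVE_B,
        rival_total_b / NORTH_MINI_TOTAL_B,
    )


if __name__ == "__main__":
    for name, total_b in RIVAL_TOTAL_B.items():
        vs_active, vs_total = size_ratios(total_b)
        print(f"{name}: {vs_active:.1f}x active, {vs_total:.1f}x total")
```

On these numbers the rivals have about 40 to 41 times North Mini Code's active parameters and about 4 times its total parameters. The two framings measure different things, which is why a "40 times" claim and a "3 to 4 times its size" claim can both describe the same release.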

Potential risks and opportunities

Risks

The headline SWE-Bench Verified figure is pass@10, which may inflate perceived performance; the reported pass@1 via mini-SWE-Agent is 61.0%, so enterprise deployments that depend on single-attempt success could see substantially lower real-world results.
Mistral, Meta, and Google DeepMind can study the published CISPO training algorithm and RLVR methodology to close the efficiency gap on their own models within months.
Apache 2.0 weights are open to fine-tuning for malicious coding automation, creating reputational and regulatory exposure for Cohere if misuse surfaces publicly.

Opportunities

Cloud inference providers (Together AI, Fireworks AI, Modal) can host North Mini Code immediately, capturing developers migrating from pricier API-based coding assistants.
Enterprise AI tooling companies (Cursor, Codeium, Sourcegraph) can integrate the Apache 2.0 weights as an open-model backend, eliminating per-token API costs for coding features.
Cohere gains leverage in enterprise deals where data-residency requirements block cloud-hosted models, since self-hostable weights eliminate those compliance objections.

What we don't know yet

The headline SWE-Bench Verified score is pass@10. A 61.0% pass@1 is reported for North Mini Code via mini-SWE-Agent, but head-to-head pass@1 results against Devstral 2 and Mistral Small 4 are not, leaving single-attempt production comparisons uncertain.
Inference cost per task is undisclosed, so direct cost-efficiency comparison against API-based coding tools cannot be calculated from the published data.
Whether the ~5,000 repository training set overlaps with repositories used by competing models is unaddressed, which affects fair benchmark comparability.
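Since several points above turn on the difference between pass@10 and pass@1, a minimal sketch of how pass@k is commonly computed may help. The article does not say how Cohere computed its figures; the code below assumes the widely used unbiased estimator, in which n attempts are sampled per task, c of them pass, and pass@k is the probability that at least one of k randomly chosen attempts passes. The per-task counts in the example are hypothetical, not benchmark data.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n attempts sampled, c of them pass."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing attempts: any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, where each entry is (n attempts, c passes)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical per-task outcomes (n, c); not taken from any benchmark.
    tasks = [(10, 10), (10, 3), (10, 1), (10, 0)]
    print("pass@1 :", mean_pass_at_k(tasks, 1))
    print("pass@10:", mean_pass_at_k(tasks, 10))
```

A task solved in only 1 of 10 attempts counts fully toward pass@10 but contributes just 0.1 to pass@1, which is why the two metrics can diverge sharply on the same model.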

Originally reported by huggingface.co


Original headline: Cohere Releases North Mini Code 1.0 Open-Weight — 30B MoE, 3B Active, Scores 80.2% SWE-Bench Verified and Outperforms Models 3–4× Its Size