huggingface.co via Reddit

Cohere Releases North Mini Code, Beats 120B Models


Key insights

  • North Mini Code scores 80.2% pass@10 on SWE-Bench Verified with only 3B active parameters inside a 30B MoE architecture.
  • RLVR post-training added 7.9 absolute percentage points on Terminal-Bench v2 and 3.0 points on SWE-Bench over the SFT baseline.
  • Cohere trained on 70,000+ verifiable tasks across ~5,000 unique repositories and released the model under Apache 2.0 with no commercial restrictions.

Why this matters

A coding agent beating 120B+ rivals at 3B active parameters shifts the cost calculus for teams choosing between self-hosting and API-based coding tools. The Apache 2.0 release lets commercial users deploy and fine-tune without license restrictions, putting direct pressure on proprietary coding assistants. The RLVR methodology, which added 7.9 absolute points on Terminal-Bench v2 from RL training alone, provides a concrete public reference for practitioners building their own coding agents.

Summary

Cohere released North Mini Code under Apache 2.0 on Hugging Face. It is a 30B MoE model with 3B active parameters that scores 80.2% pass@10 on SWE-Bench Verified and 61.0% pass@1 via the mini-SWE-Agent harness, outperforming open-source rivals of up to 123B parameters. Training combined two-stage SFT over 70,000+ verifiable tasks across ~5,000 repositories with asynchronous RLVR post-training. The RL stage alone added 7.9 absolute percentage points on Terminal-Bench v2 and 3.0 on SWE-Bench. In internal pairwise evaluation, the final model achieved a 66.1% win rate over the SFT-only checkpoint.

In short, Cohere has shipped an open-weight coding agent that outperforms rivals whose total parameter counts are around 40 times its 3B active parameters.

  • Scores 33.4 on Artificial Analysis' Coding Index, topping Devstral 2 (123B), Mistral Small 4 (119B-A6B), and Nemotron 3 Super (120B-A12B).
  • Multi-harness SFT exposure, with benchmark harness data at 6% of the training mix, yielded a 10% OpenCode evaluation gain while holding SWE-Agent performance.

Apache 2.0 licensing removes the commercial barriers that most frontier-tier coding agents carry.
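
The parameter-count comparison in this summary can be checked with simple arithmetic. The sketch below uses only the sizes quoted above, in billions. It assumes Devstral 2 is a dense model (all 123B parameters active), because no active-parameter figure is given for it. The names `MODELS`, `NORTH_MINI_CODE`, and `ratios` are illustrative and do not come from Cohere's release.

```python
# Parameter counts in billions, as quoted in the summary.
# Assumption: Devstral 2 is treated as dense (total == active), since the
# summary lists only a single 123B figure for it.
MODELS = {
    "Devstral 2": {"total": 123, "active": 123},
    "Mistral Small 4": {"total": 119, "active": 6},
    "Nemotron 3 Super": {"total": 120, "active": 12},
}
NORTH_MINI_CODE = {"total": 30, "active": 3}


def ratios(rival, baseline=NORTH_MINI_CODE):
    """Return (rival total / baseline active, rival active / baseline active)."""
    return (
        rival["total"] / baseline["active"],
        rival["active"] / baseline["active"],
    )


if __name__ == "__main__":
    for name, params in MODELS.items():
        total_ratio, active_ratio = ratios(params)
        print(f"{name}: {total_ratio:.1f}x total, {active_ratio:.1f}x active")
```

Under these assumptions, the three rivals have roughly 40 to 41 times as many total parameters as North Mini Code has active ones. Comparing active parameters only, the two MoE rivals come out at 2x and 4x, so the "40 times" framing depends on which count is used.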

Potential risks and opportunities

Risks

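  • The SWE-Bench Verified headline score is pass@10, which may inflate perceived performance; the reported pass@1 via the mini-SWE-Agent harness is 61.0%, so enterprise deployments that depend on single-attempt success could see substantially lower real-world results.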
  • Mistral, Meta, and Google DeepMind can study the published CISPO training algorithm and RLVR methodology to close the efficiency gap on their own models within months.
  • Apache 2.0 weights are open to fine-tuning for malicious coding automation, creating reputational and regulatory exposure for Cohere if misuse surfaces publicly.

Opportunities

  • Cloud inference providers (Together AI, Fireworks AI, Modal) can host North Mini Code immediately, capturing developers migrating from pricier API-based coding assistants.
  • Enterprise AI tooling companies (Cursor, Codeium, Sourcegraph) can integrate the Apache 2.0 weights as an open-model backend, eliminating per-token API costs for coding features.
  • Cohere gains leverage in enterprise deals where data-residency requirements block cloud-hosted models, since self-hostable weights eliminate those compliance objections.

What we don't know yet

  • The headline SWE-Bench Verified score is pass@10; head-to-head pass@1 results against Devstral 2 and Mistral Small 4 are not reported, leaving single-attempt production comparisons uncertain.
  • Inference cost per task is undisclosed, so direct cost-efficiency comparison against API-based coding tools cannot be calculated from the published data.
  • Whether the ~5,000 repository training set overlaps with repositories used by competing models is unaddressed, which affects fair benchmark comparability.
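
The distinction between pass@10 and pass@1 raised above is easier to see with the metric written out. The sketch below uses the widely used unbiased pass@k estimator from code-generation evaluation. The source does not say how Cohere computed its pass@10 figure, so this is an assumption about the metric's general form, not a reproduction of their evaluation. The function names and the sample data are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n samples drawn, c of them passed.

    Computes 1 - C(n - c, k) / C(n, k), the probability that a random
    size-k subset of the n samples contains at least one passing sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over tasks; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical data: three tasks, ten attempts each.
    # One task always passes, one passes once in ten tries, one never passes.
    results = [(10, 10), (10, 1), (10, 0)]
    print(f"pass@1  = {benchmark_pass_at_k(results, 1):.3f}")
    print(f"pass@10 = {benchmark_pass_at_k(results, 10):.3f}")
```

On this made-up data, the task that succeeds only once in ten attempts counts fully toward pass@10 but contributes just 0.1 toward pass@1. That is why a pass@10 headline can sit well above the single-attempt rate a production deployment would see.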