huggingface.co via Reddit

JetBrains Open-Sources Mellum 2 MoE Coding Model

By Alexis Dufresne. Published June 1, 2026 at 14:08 UTC. Updated June 1, 2026 at 14:10 UTC.

Key insights

Mellum 2 ships six open-weight variants under Apache 2.0, including Base, Instruct, and Thinking models for different inference needs.
The Thinking variant scores 69.9% on LiveCodeBench v6 but trails Qwen3.5 4B on AIME despite having 12B total parameters.
The architecture uses 64 experts, 8 of which are activated per token, and a 131,072-token context window; a technical report published on arXiv accompanies the release.

Why this matters

By releasing a full MoE model family under Apache 2.0, JetBrains gives practitioners a production-viable 12B coding model under permissive licensing terms, raising the baseline for open-weight developer AI tooling. The simultaneous release of Thinking, Instruct, and SFT variants signals that inference-time reasoning is now a standard offering in the open-weight coding model space, not a premium differentiator. The benchmark data showing a 12B MoE model trailing Qwen3.5 4B on AIME gives a concrete calibration point to teams weighing whether MoE active-parameter efficiency translates into task-specific parity with smaller dense models.

Summary

JetBrains published the Mellum 2 family on Hugging Face: a Mixture-of-Experts coding model with 12B total parameters and 2.5B active per token, released under Apache 2.0. Six variants ship simultaneously: Base, Base-Pretrain, Instruct, Instruct-SFT, Thinking, and Thinking-SFT. The Thinking variant is trained with RLVR (Reinforcement Learning with Verifiable Rewards) and outputs its reasoning in blocks before its final answer, targeting complex debugging, agentic workflows, and multi-step planning. Essentially: JetBrains extends its open-weight developer AI strategy with a full-spectrum release.

- Thinking variant scores 69.9% on LiveCodeBench v6 and 58.4% on AIME 2025+2026
- Despite 12B total parameters, the model trails Qwen3.5 4B on AIME (Qwen3.5 4B: 68.3%)
- 64-expert MoE with 8 activated per token; context window is 131,072 tokens

A Mellum 2 technical report on arXiv (2605.31268) accompanies the release, grounding this as both a product and a research contribution.
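To make "64 experts with 8 activated per token" concrete, here is a minimal sketch of generic top-k expert routing in plain Python. Only the expert count (64) and the number activated per token (8) come from the release; the softmax-over-selected-experts weighting, the random router logits, and the function name `route_token` are illustrative assumptions, not a description of how Mellum 2's router actually works.

```python
import math
import random

NUM_EXPERTS = 64  # from the article
TOP_K = 8         # from the article


def route_token(router_logits, top_k=TOP_K):
    """Select top_k experts for one token; return {expert_index: weight}.

    Generic top-k gating (an assumption, not Mellum 2's documented router):
    keep the top_k highest-scoring experts and normalize their scores with
    a softmax restricted to that selected set.
    """
    # Indices of the top_k largest router logits.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Numerically stable softmax over the chosen logits only.
    peak = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical router scores for a single token.
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    weights = route_token(logits)
    print(len(weights), "of", NUM_EXPERTS, "experts active for this token")
```

In a layer of this kind, only the selected experts run for the token and their outputs are combined using these weights, which is why the active parameter count per token is much smaller than the total.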

Potential risks and opportunities

Risks

Benchmark gaps against Qwen3.5 4B on AIME, despite Mellum 2 having 12B total parameters, could slow adoption by teams prioritizing math-heavy or reasoning-intensive agentic coding workflows
Apache 2.0 terms allow competitors such as GitHub Copilot, Cursor, or Codeium to fine-tune and productize Mellum 2 directly against JetBrains IDE products with minimal licensing obligations
With only 8 downloads for the Thinking variant at time of publication, community infrastructure for quantized builds and local deployment tooling may lag and limit near-term adoption

Opportunities

Inference providers (Fireworks AI, Together AI, Lepton AI) can serve Mellum 2 at lower per-token cost than equivalent dense 12B models, because only 2.5B parameters are active per token in each forward pass
Coding agent builders targeting complex debugging or multi-step planning workflows gain a purpose-built Apache 2.0 reasoning model with a 131,072-token context window and verifiable-reward training
JetBrains can use the arXiv technical report (2605.31268) to position Mellum 2 in academic and enterprise procurement conversations where published research credibility carries weight
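The per-token cost argument for inference providers rests on simple arithmetic, sketched below. The 12B total and 2.5B active figures come from the release; the "about 2 FLOPs per active parameter per token" rule of thumb and the 2-bytes-per-parameter (16-bit weights) figure are assumptions for illustration, and the helper names are hypothetical.

```python
TOTAL_PARAMS = 12e9    # from the article
ACTIVE_PARAMS = 2.5e9  # from the article


def approx_flops_per_token(params):
    # Rule of thumb (assumption): ~2 FLOPs per parameter used per token
    # in a forward pass, ignoring attention and routing overhead.
    return 2 * params


def weights_memory_gb(params, bytes_per_param=2):
    # Assumes 16-bit weights (2 bytes each); quantized builds would differ.
    return params * bytes_per_param / 1e9


frac = ACTIVE_PARAMS / TOTAL_PARAMS
ratio = approx_flops_per_token(TOTAL_PARAMS) / approx_flops_per_token(ACTIVE_PARAMS)

print(f"active fraction per token: {frac:.1%}")           # 20.8%
print(f"dense 12B vs. MoE compute per token: {ratio:.1f}x")  # 4.8x
print(f"weights held in memory: {weights_memory_gb(TOTAL_PARAMS):.1f} GB")  # 24.0 GB
```

Under these assumptions the compute saving applies per token, while all 12B parameters still have to be resident in memory, so the cost advantage is mainly in throughput rather than hardware footprint.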

What we don't know yet

How the SFT variants (Instruct-SFT, Thinking-SFT) differ from their non-SFT counterparts in training data or RLVR stages is not specified in the model card
Coding benchmark comparisons for the Instruct variant versus the Thinking variant are not published, making latency-versus-accuracy trade-off analysis difficult
Whether JetBrains plans to integrate Mellum 2 into IDE products such as AI Assistant or Junie, and on what timeline, is not addressed in the release

Originally reported by huggingface.co

Original headline: JetBrains Ships Mellum 2: 12B MoE Coding Model With Thinking, Instruct, and SFT Variants Released Open-Weight