JetBrains Open-Sources Mellum 2 MoE Coding Model
Key insights
- Mellum 2 ships six open-weight variants under Apache 2.0, including Base, Instruct, and Thinking models for different inference needs.
- The Thinking variant scores 69.9% on LiveCodeBench v6 but trails Qwen3.5 4B on AIME despite having 12B total parameters.
- The architecture uses 64 experts with 8 activated per token and a 131,072-token context window, backed by a published arXiv technical report.
Why this matters
JetBrains releasing a full MoE model family under Apache 2.0 gives practitioners a production-viable 12B coding model with permissive licensing terms, raising the baseline for open-weight developer AI tooling. The simultaneous release of Thinking, Instruct, and SFT variants signals that inference-time reasoning is now a standard offering in the open-weight coding model space, not a premium differentiator. Benchmark data showing a 12B MoE model trailing Qwen3.5 4B on AIME is a concrete calibration point for teams weighing whether MoE active-parameter efficiency translates into task-specific capability parity against smaller dense models.
Summary
JetBrains published the Mellum 2 family on Hugging Face: a Mixture-of-Experts coding model at 12B total parameters with 2.5B active per token, released under Apache 2.0.
Six variants ship simultaneously: Base, Base-Pretrain, Instruct, Instruct-SFT, Thinking, and Thinking-SFT. The Thinking variant trains with RLVR (Reinforcement Learning with Verifiable Rewards) and outputs reasoning in blocks before its final answer, targeting complex debugging, agentic workflows, and multi-step planning.
Essentially: JetBrains extends its open-weight developer AI strategy with a full-spectrum release.
- Thinking variant scores 69.9% on LiveCodeBench v6 and 58.4% on AIME 2025+2026
- Despite 12B total parameters, the model trails Qwen3.5 4B on AIME (68.3% for Qwen3.5 4B versus 58.4% for Mellum 2 Thinking)
- 64-expert MoE with 8 activated per token; context window is 131,072 tokens
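The "64 experts, 8 activated per token" figure describes top-k expert routing. The sketch below shows the generic technique only: the function names, the softmax-then-renormalize gating, and the scalar toy experts are assumptions for illustration, and the source does not say how Mellum 2's router is actually implemented.

```python
import math
import random

NUM_EXPERTS = 64  # from the article
TOP_K = 8         # from the article


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token.

    Returns (expert_index, weight) pairs whose weights are renormalized
    to sum to 1. Renormalizing over the selected experts is an assumption;
    real routers differ in this detail.
    """
    if top_k > len(router_logits):
        raise ValueError("top_k cannot exceed the number of experts")
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_forward(x, router_logits, experts, top_k=TOP_K):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in route_token(router_logits, top_k))


# Hypothetical router scores for a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
print(f"{len(selected)} of {NUM_EXPERTS} experts run for this token")
```

Only the selected experts are evaluated in `moe_forward`, which is why per-token compute tracks active rather than total parameters.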
A Mellum 2 technical report on arXiv (2605.31268) accompanies the release, grounding this as both a product and a research contribution.
Potential risks and opportunities
Risks
- Benchmark gaps against Qwen3.5 4B on AIME despite Mellum 2 having 12B total parameters could slow adoption by teams prioritizing math-heavy or reasoning-intensive agentic coding workflows
- Apache 2.0 terms allow competitors such as GitHub Copilot, Cursor, or Codeium to fine-tune and productize Mellum 2 directly against JetBrains IDE products, subject only to the license's attribution and notice requirements
- With only 8 downloads for the Thinking variant at time of publication, community infrastructure for quantized builds and local deployment tooling may lag and limit near-term adoption
Opportunities
- Inference providers (Fireworks AI, Together AI, Lepton AI) can serve Mellum 2 at lower per-token cost than equivalent dense 12B models by routing only 2.5B active parameters per forward pass
- Coding agent builders targeting complex debugging or multi-step planning workflows gain a purpose-built Apache 2.0 reasoning model with a 131,072-token context window and verifiable-reward training
- JetBrains can use the arXiv technical report (2605.31268) to position Mellum 2 in academic and enterprise procurement conversations where published research credibility carries weight
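The inference-cost point rests on simple arithmetic over the figures reported in the release. The following is a rough back-of-the-envelope sketch: the bytes-per-parameter values are common precisions chosen for illustration, not published deployment numbers, and the estimate covers weights only (no KV cache or activations).

```python
# Figures taken from the article.
TOTAL_PARAMS = 12e9
ACTIVE_PARAMS = 2.5e9
NUM_EXPERTS = 64
ACTIVE_EXPERTS = 8

# Share of parameters touched per token, and share of experts that run.
active_param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS    # about 0.208
active_expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS   # 0.125


def weight_memory_gb(params, bytes_per_param):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


print(f"active parameters per token: {active_param_fraction:.1%}")
print(f"active experts per token:    {active_expert_fraction:.1%}")
# Assumed precisions: 2 bytes (bf16/fp16) and 0.5 bytes (4-bit quantization).
print(f"weights at 2 bytes/param:    {weight_memory_gb(TOTAL_PARAMS, 2):.1f} GB")
print(f"weights at 0.5 bytes/param:  {weight_memory_gb(TOTAL_PARAMS, 0.5):.1f} GB")
```

The cost advantage applies to compute per token, not memory: all 12B parameters still have to be resident to serve the model. The active-parameter share is also higher than the active-expert share, presumably because non-expert components such as attention and embeddings run for every token.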
What we don't know yet
- How the SFT variants (Instruct-SFT, Thinking-SFT) differ from their non-SFT counterparts in training data or RLVR stages is not specified in the model card
- Coding benchmark comparisons for the Instruct variant versus the Thinking variant are not published, making latency-versus-accuracy trade-off analysis difficult
- Whether JetBrains plans to integrate Mellum 2 into IDE products such as AI Assistant or Junie, and on what timeline, is not addressed in the release
Originally reported by huggingface.co
Original headline: JetBrains Ships Mellum 2: 12B MoE Coding Model With Thinking, Instruct, and SFT Variants Released Open-Weight