huggingface.co via Reddit

JetBrains Open-Sources Mellum 2 MoE Coding Model


Key insights

  • Mellum 2 ships six open-weight variants under Apache 2.0, including Base, Instruct, and Thinking models for different inference needs.
  • The Thinking variant scores 69.9% on LiveCodeBench v6 but trails Qwen3.5 4B on AIME despite having 12B total parameters.
  • The architecture uses 64 experts with 8 activated per token and a 131,072-token context window, and is documented in a published arXiv technical report.

Why this matters

By releasing a full MoE model family under Apache 2.0, JetBrains gives practitioners a production-viable 12B coding model under permissive licensing terms, raising the baseline for open-weight developer AI tooling. The simultaneous release of Thinking, Instruct, and SFT variants signals that inference-time reasoning is now a standard offering in the open-weight coding model space, not a premium differentiator. Benchmark data showing a 12B MoE model trailing Qwen3.5 4B on AIME gives teams a concrete calibration point when weighing whether MoE active-parameter efficiency translates into task-specific parity with smaller dense models.

Summary

JetBrains published the Mellum 2 family on Hugging Face: a Mixture-of-Experts coding model at 12B total parameters with 2.5B active per token, released under Apache 2.0. Six variants ship simultaneously: Base, Base-Pretrain, Instruct, Instruct-SFT, Thinking, and Thinking-SFT. The Thinking variant trains with RLVR (Reinforcement Learning with Verifiable Rewards) and outputs reasoning in blocks before its final answer, targeting complex debugging, agentic workflows, and multi-step planning. In short, JetBrains extends its open-weight developer AI strategy with a full-spectrum release.

  • Thinking variant scores 69.9% on LiveCodeBench v6 and 58.4% on AIME 2025+2026
  • Despite 12B total parameters, the model trails Qwen3.5 4B on AIME (Qwen3.5 4B: 68.3)
  • 64-expert MoE with 8 activated per token; context window is 131,072 tokens

A Mellum 2 technical report on arXiv (2605.31268) accompanies the release, grounding this as both a product and a research contribution.
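To make "64 experts with 8 activated per token" concrete, here is a minimal sketch of top-k expert routing in plain Python. Only the expert count and the top-k value come from the release; the softmax-over-selected-logits weighting, the random router scores, and the function name `route_token` are illustrative assumptions, not details confirmed by the model card or the technical report.

```python
import math
import random

NUM_EXPERTS = 64  # from the release
TOP_K = 8         # experts activated per token, from the release


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Assumption: mixing weights are a softmax over the selected logits only.
    Mellum 2's actual router may normalize differently.
    """
    # Indices of the top_k largest router scores.
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i],
                    reverse=True)[:top_k]
    # Softmax over the chosen scores (subtract the max for numerical stability).
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Hypothetical router scores for a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

assignment = route_token(logits)
active_fraction = TOP_K / NUM_EXPERTS
print(f"experts used: {len(assignment)} of {NUM_EXPERTS} ({active_fraction:.1%})")
```

Each token touches 8 of 64 experts (12.5%). That is lower than the 2.5B / 12B parameter ratio (about 21%), which is consistent with some parameters, such as attention and embeddings, being shared across all tokens rather than routed, although the release summary does not give that breakdown.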

Potential risks and opportunities

Risks

  • The benchmark gap against Qwen3.5 4B on AIME, despite Mellum 2 having 12B total parameters, could slow adoption by teams prioritizing math-heavy or reasoning-intensive agentic coding workflows
  • Apache 2.0 terms allow competitors such as GitHub Copilot, Cursor, or Codeium to fine-tune Mellum 2 and productize it in direct competition with JetBrains IDE products
  • With only 8 downloads for the Thinking variant at time of publication, community infrastructure for quantized builds and local deployment tooling may lag and limit near-term adoption
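The point about quantized builds matters because an MoE model generally needs all of its experts resident in memory, even though only a few run per token. The sketch below estimates weight storage for 12B parameters at several common precisions. The bytes-per-parameter values are generic assumptions rather than figures from the model card, and the estimate ignores KV cache, activations, and quantization metadata.

```python
TOTAL_PARAMS = 12e9  # total parameter count from the release

# Bytes per parameter for common storage formats (generic values, not from the model card).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(total_params, bytes_per_param):
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return total_params * bytes_per_param / 1e9


estimates = {name: weight_gb(TOTAL_PARAMS, b) for name, b in BYTES_PER_PARAM.items()}
for name, gb in estimates.items():
    print(f"{name}: ~{gb:.0f} GB of weights")
```

Under these assumptions, 16-bit weights need roughly 24 GB and a 4-bit build roughly 6 GB, which is why the availability of community quantizations affects how many local setups can run the model at all.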

Opportunities

  • Inference providers (Fireworks AI, Together AI, Lepton AI) can serve Mellum 2 at lower per-token compute cost than a comparable dense 12B model, because only 2.5B parameters are active per token
  • Coding agent builders targeting complex debugging or multi-step planning workflows gain a purpose-built Apache 2.0 reasoning model with a 131,072-token context window and verifiable-reward training
  • JetBrains can use the arXiv technical report (2605.31268) to position Mellum 2 in academic and enterprise procurement conversations where published research credibility carries weight
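The serving-cost opportunity above can be approximated with back-of-envelope arithmetic. This sketch uses the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass. That rule is an assumption, not a figure from the release, and it ignores attention cost over long contexts, routing overhead, memory bandwidth, and batching effects, so actual provider pricing may differ substantially.

```python
TOTAL_PARAMS = 12e9    # total parameters, from the release
ACTIVE_PARAMS = 2.5e9  # active parameters per token, from the release


def forward_flops_per_token(params):
    """Rough forward-pass cost: ~2 FLOPs (multiply + add) per parameter used."""
    return 2 * params


moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense 12B model

ratio = moe_flops / dense_flops
print(f"MoE compute per token: {ratio:.1%} of a dense 12B model")
print(f"Approximate compute reduction: {dense_flops / moe_flops:.1f}x")
```

By this estimate the MoE model does about 21% of the per-token compute of a dense 12B model, a reduction of roughly 4.8x. Memory requirements do not shrink by the same factor, since all 12B parameters still have to be stored.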

What we don't know yet

  • How the SFT variants (Instruct-SFT, Thinking-SFT) differ from their non-SFT counterparts in training data or RLVR stages is not specified in the model card
  • Coding benchmark comparisons for the Instruct variant versus the Thinking variant are not published, making latency-versus-accuracy trade-off analysis difficult
  • Whether JetBrains plans to integrate Mellum 2 into IDE products such as AI Assistant or Junie, and on what timeline, is not addressed in the release