huggingface.co web signal

Moonshot AI releases Kimi K2.7 Code, an open 1T MoE coder

TL;DR

  • Moonshot AI released Kimi K2.7 Code on Hugging Face, a coding-focused MoE model with 1T total parameters and 32B activated per token.
  • Moonshot claims the model uses about 30% fewer thinking tokens than Kimi K2.6 while improving on it across the published coding and agentic benchmarks.
  • It ships under a Modified MIT License with a 256K context window, native INT4 quantization, and recommended deployment via vLLM, SGLang, or KTransformers.

Moonshot AI has put a new coding-focused model up on Hugging Face called Kimi K2.7 Code, and the spec sheet is the part worth dwelling on. It is a mixture-of-experts model with 1T total parameters and 32B activated per token, 384 experts with 8 selected per token, and a 256K context length. It ships under a Modified MIT License.
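To make the sparsity figures concrete, here is a minimal sketch of generic top-k expert routing in Python. Only the counts (384 experts, 8 selected, 1T total, 32B active) come from the model card. The gating function, the softmax normalisation over the selected experts, and the names route_token and active_fraction are illustrative assumptions, since the source does not describe how K2.7 Code's router actually works.

```python
import math
import random

NUM_EXPERTS = 384      # from the model card
TOP_K = 8              # experts selected per token, from the model card
TOTAL_PARAMS = 1.0e12  # 1T total parameters
ACTIVE_PARAMS = 32e9   # 32B activated per token


def route_token(router_logits, k=TOP_K):
    """Generic top-k gating (an assumption, not Moonshot's documented router).

    Returns (expert_index, weight) pairs for the k highest-scoring experts,
    with weights softmax-normalised over just those k experts.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(active=ACTIVE_PARAMS, total=TOTAL_PARAMS):
    """Share of all parameters touched for a single token."""
    return active / total


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route_token(logits)
    print(len(chosen), "of", NUM_EXPERTS, "experts selected")
    print(f"active parameter fraction: {active_fraction():.1%}")
```

The point of the sketch is the ratio: each token touches 8 of 384 experts and roughly 3.2% of the total parameters, which is why per-token compute is far below what the 1T headline suggests even though all the weights still have to be stored.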

The headline pitch on the model card is efficiency rather than raw capability. Moonshot claims K2.7 Code reduces thinking-token usage by approximately 30% compared to Kimi K2.6, while improving on its predecessor across the benchmarks Moonshot publishes. On Moonshot's own Kimi Code Bench v2 the score rises from 50.9 to 62.0, on Program Bench from 48.3 to 53.6, and on MCP Mark Verified from 72.8 to 81.1.
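A quick way to compare those jumps is to put them on the same footing. The sketch below uses only the three score pairs quoted above and computes the absolute and relative gain for each. The helper name gains is made up for illustration.

```python
# Scores as reported on the model card: (Kimi K2.6, Kimi K2.7 Code).
REPORTED = {
    "Kimi Code Bench v2": (50.9, 62.0),
    "Program Bench": (48.3, 53.6),
    "MCP Mark Verified": (72.8, 81.1),
}


def gains(before, after):
    """Return (absolute gain in points, relative gain as a fraction)."""
    absolute = after - before
    return absolute, absolute / before


if __name__ == "__main__":
    for name, (old, new) in REPORTED.items():
        points, relative = gains(old, new)
        print(f"{name}: +{points:.1f} points ({relative:.1%} relative)")
```

The largest gain, 11.1 points, lands on the vendor's own benchmark, which is worth keeping in mind when weighing the numbers.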

Against the closed frontier the picture is less flattering. By Moonshot's own table, K2.7 Code still trails GPT-5.5 and Claude Opus 4.8 on most of the listed benchmarks. It edges past Claude Opus 4.8 on MCP Mark Verified (81.1 vs 76.4), but sits behind on Kimi Code Bench v2, Program Bench, MLS Bench Lite, and MCP Atlas. Take those comparisons as reported, not settled, since these are vendor-published numbers rather than independent evals.

The bigger caveat is what the model card does not address. There is no detail on what 'Modified MIT License' actually modifies, which matters a lot for anyone planning to build a product on top. There is no disclosure of how the 30% thinking-token reduction was achieved, and no guidance on how much hardware you need to serve a 1T-parameter MoE even with the native INT4 quantization Moonshot lists. The recommended inference path runs through vLLM, SGLang, or KTransformers, which puts this squarely in the self-hosting-with-real-infra category.
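In the absence of official hardware guidance, a back-of-envelope lower bound on weight memory is easy to sketch. The calculation below assumes every one of the 1T parameters is stored at 4 bits, which is an idealisation: the source does not say which tensors are quantized, and the estimate ignores quantization scales, KV cache, activations, and runtime overhead. The 80 GB accelerator size is a hypothetical figure chosen for illustration, not something the model card specifies.

```python
import math

TOTAL_PARAMS = 1.0e12   # 1T parameters, from the model card
BITS_PER_WEIGHT = 4     # INT4; assumes ALL weights are 4-bit (idealised)
DEVICE_BYTES = 80e9     # hypothetical 80 GB accelerator (assumption)


def weight_bytes(params=TOTAL_PARAMS, bits=BITS_PER_WEIGHT):
    """Raw storage for the weights alone, ignoring scales and metadata."""
    return params * bits / 8


def min_devices(total_bytes, device_bytes=DEVICE_BYTES):
    """Fewest devices whose combined memory could hold the weights.

    A floor only: real deployments also need room for KV cache,
    activations, and framework overhead.
    """
    return math.ceil(total_bytes / device_bytes)


if __name__ == "__main__":
    size = weight_bytes()
    print(f"weights alone: about {size / 1e9:.0f} GB")
    print(f"at least {min_devices(size)} devices of {DEVICE_BYTES / 1e9:.0f} GB")
```

Under those assumptions the weights alone come to about 500 GB, so a single-GPU deployment is off the table, and a 256K context window adds KV-cache memory on top of that floor.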

If the open-weights tier keeps narrowing the gap to the closed frontier on coding tasks, the people who benefit first are teams that can't or won't route code through an external API: regulated shops, large infra groups with spare GPUs, and agent-framework builders who care more about a 30% token cut than another point on a leaderboard.
