Cursor Composer 2.5 rivals GPT-5.5 at 10% of the cost
Key insights
- Cursor allocated 85% of total compute to post-training rather than the base Kimi K2.5 model, making RL data strategy the core differentiator.
- Composer 2.5 scores 79.8% on SWE-Bench Multilingual, matching Opus 4.7 and GPT-5.5 at roughly one-tenth their token cost.
- The model is built on Moonshot AI's open-source Kimi K2.5 checkpoint, showing open-source bases can anchor competitive commercial products.
Why this matters
Cursor has demonstrated a workable recipe for matching closed frontier models on coding tasks: invest heavily in synthetic RL post-training on an open-source base. The performance advantage of proprietary models like GPT-5.5 and Opus 4.7 is therefore no longer secured by architecture or scale alone. For AI founders and infrastructure teams, the $0.50/$2.50 per million token pricing resets expectations for what agentic coding pipelines should cost at production scale, putting pressure on OpenAI and Anthropic to justify premium pricing on coding-specific workloads. The 85% share of compute devoted to post-training signals a broader strategic shift: open-source checkpoints are becoming credible starting points for commercial-grade vertical AI products, which changes the build-vs-buy calculus for any team weighing fine-tuning against API access.
Summary
Cursor's new Composer 2.5 model matches frontier-tier coding performance at a fraction of the price, landing as the new default in the Cursor editor as of May 18.
Built on Moonshot AI's open-source Kimi K2.5 checkpoint, Composer 2.5 is less a base model story and more a post-training story: Cursor directed 85% of total compute toward its own reinforcement learning pipeline, generating 25 times more synthetic coding tasks than its predecessor. The result is a model that scores 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1, putting it alongside Anthropic's Opus 4.7 and OpenAI's GPT-5.5 on agentic coding tasks.
Essentially, Cursor, building on Moonshot AI's checkpoint, has demonstrated that aggressive post-training on an open-source base can close the gap with closed frontier models at dramatically lower inference cost.
- Pricing sits at $0.50 input / $2.50 output per million tokens on the standard tier, roughly one-tenth the cost of the models it benchmarks against.
- Composer 2.5 replaces Composer 2 as the default, with double usage credits offered during the first week of rollout.
- The 25x synthetic RL data multiplier points to compute allocation strategy, not just model scale, as the key differentiator.
The broader implication is that the cost curve for frontier-competitive coding agents is compressing faster than the closed-model labs have priced for.
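To make the pricing concrete, here is a minimal sketch of the arithmetic, assuming a hypothetical agent workload of 40 million input tokens and 4 million output tokens. The function name `composer_cost` and the workload sizes are illustrative assumptions, not figures from Cursor. The 10x multiplier is simply the article's "roughly one-tenth" claim inverted, not the actual list price of any other model.

```python
from decimal import Decimal

# Prices quoted in the article: USD per one million tokens, standard tier.
INPUT_PRICE_PER_M = Decimal("0.50")
OUTPUT_PRICE_PER_M = Decimal("2.50")
TOKENS_PER_M = Decimal(1_000_000)


def composer_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of a workload at the article's quoted prices."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_M * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) / TOKENS_PER_M * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload (an assumption for illustration only).
    run_cost = composer_cost(40_000_000, 4_000_000)

    # Assumption: "roughly one-tenth the cost" implies a ~10x comparison figure.
    # This is NOT a published price for GPT-5.5 or Opus 4.7.
    implied_frontier_cost = run_cost * 10

    print(f"Composer 2.5 at quoted prices: ${run_cost:.2f}")
    print(f"Implied ~10x comparison cost:  ${implied_frontier_cost:.2f}")
```

Because output tokens cost five times as much as input tokens at these rates, the real bill for an agentic pipeline depends heavily on its input-to-output ratio, so the workload mix matters as much as the headline price.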
Potential risks and opportunities
Risks
- If Moonshot AI modifies Kimi K2.5 licensing terms or restricts commercial use once downstream adoption grows, Cursor's default model stack faces a forced migration mid-product cycle.
- Anthropic and OpenAI could respond with targeted price cuts or usage credit programs aimed at retaining coding-heavy enterprise customers within 60-90 days.
- Benchmark-to-production gaps on multi-language or large-codebase tasks could erode user trust quickly, given that Composer 2.5 is now the default for all Cursor users, not an opt-in experiment.
Opportunities
- Other developer tooling companies (Replit, Sourcegraph, JetBrains) can now credibly license or fine-tune open-source checkpoints with heavy RL post-training to build competitive coding agents without frontier API dependency.
- RL data synthesis and pipeline vendors serving AI labs gain a high-visibility proof point for the ROI of synthetic training data at scale, potentially accelerating enterprise deals.
- Moonshot AI's Kimi K2.5 gains significant commercial credibility as a base model for downstream products, strengthening its positioning against Meta's Llama series in the open-weight coding model market.
What we don't know yet
- Whether Cursor's synthetic RL pipeline and training data methodology will be published or remain proprietary, limiting reproducibility for third parties.
- How Moonshot AI's Kimi K2.5 licensing terms govern Cursor's commercial deployment and whether revenue-sharing or usage restrictions apply at scale.
- Whether the SWE-Bench Multilingual and CursorBench v3.1 scores hold across the range of programming languages found in production codebases and on real-world long-horizon tasks beyond benchmark conditions.
Originally reported by Cursor
Original headline: Cursor Ships Composer 2.5: Agentic Coding Model Matches Opus 4.7 and GPT-5.5 at One-Tenth the Cost, Built on Kimi K2.5 With 25× More Synthetic RL Training Data