scmp.com via Reddit

DeepSeek V4 Pricing Drives 99% API Cuts in China AI

4 sources tracking this story

Key insights

  • Xiaomi's Sliding Window Attention optimization reduces KV cache data transfer to one-seventh of prior levels, making the 99% cut structurally viable rather than a loss-leader.
  • Decrypt quantifies the US-China frontier inference gap at 34x for comparable performance, up from the 15-30x range cited in earlier estimates.
  • Xiaomi MiMo head Fuli Luo confirmed the inference engine runs near full capacity and breaks even at new prices, directly rebutting a subsidy narrative.

Why this matters

DeepSeek V4's inference floor of $0.0036 per million cached input tokens is presented not as a promotional discount but as a structurally sustainable price point, enabled by infrastructure advances including a hierarchical KV cache and a dual attention architecture. Xiaomi cut MiMo-V2.5 prices by 99% despite a 43.1% decline in Q1 profit, and its MiMo team says the service runs near break-even at the new prices, which undercuts the dumping thesis. More than half of MiMo's API usage now comes from overseas developers, so the Chinese pricing floor is already functioning as a global benchmark. MiniMax's pivot to hybrid subscription billing marks the first visible strategic fork: at least one major Chinese lab is choosing not to follow per-token prices all the way down.
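To make the scale of these figures concrete, here is a minimal arithmetic sketch. It uses only the one price quoted above ($0.0036 per million cached input tokens); the function names are hypothetical, and the old price passed to `price_after_cut` is a placeholder, since pre-cut prices and uncached or output-token rates are not given here.

```python
# Illustrative arithmetic only. The rate below is the cached-input figure
# quoted in the text; everything else is a placeholder for illustration.
CACHED_INPUT_USD_PER_MILLION = 0.0036


def cached_input_cost(tokens, usd_per_million=CACHED_INPUT_USD_PER_MILLION):
    """Cost in USD of `tokens` cached input tokens at a per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


def price_after_cut(old_price, cut_fraction):
    """Price remaining after a fractional cut (0.99 means a 99% cut)."""
    return old_price * (1.0 - cut_fraction)


if __name__ == "__main__":
    # One billion cached input tokens at the quoted floor.
    print(f"${cached_input_cost(1_000_000_000):.2f}")
    # A 99% cut leaves 1% of the old price (1.0 is a placeholder old price).
    print(f"{price_after_cut(1.0, 0.99):.4f}")
```

At the quoted rate, one billion cached input tokens cost about $3.60, and a 99% cut leaves one-hundredth of whatever the earlier price was.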

Summary

DeepSeek's V4 has set off a pricing cascade across China's AI market. Xiaomi cut API costs for MiMo-V2.5 by 99 percent, and the model shot to sixth place on OpenRouter. The scale of adoption is striking: MiMo-V2.5 processed 1.7 trillion tokens in one week, with growth exceeding 999 percent. MiniMax went in the opposite direction, launching M3 with hybrid billing that pairs token-based fees with subscriptions from US$7.24 to US$69.28 per month. In short, Xiaomi and MiniMax are making opposite bets: one on volume, the other on pricing diversification.

  • MiMo-V2.5 reached sixth place on OpenRouter after a 99% cut, processing 1.7 trillion tokens in one week.
  • MiniMax M3 tiers subscriptions from US$7.24 to US$69.28 monthly rather than racing to zero on per-token rates.
  • Cloud providers face friction too, as competitive pressure extends beyond developer APIs.

China's AI market is splitting between volume plays and hybrid monetization experiments in response to DeepSeek V4.

Potential risks and opportunities

Risks

  • MiniMax's M3 hybrid model (US$7.24 to US$69.28/month) faces an adoption cliff if developers stay on pure usage-based alternatives as DeepSeek V4 pricing continues downward.
  • Chinese cloud providers face noted but unquantified friction as DeepSeek V4 pricing compresses inference margins across the domestic market.
  • Xiaomi's 99% price cut strategy may prove unsustainable if its 1.7 trillion tokens per week do not convert to durable paying customers beyond the OpenRouter leaderboard.

Opportunities

  • OpenRouter and similar API aggregator platforms benefit directly from Chinese model competition, as MiMo-V2.5 reaching sixth place shows aggressive pricing drives developer experimentation and platform traffic.
  • Developers and AI startups building on Chinese APIs gain access to dramatically lower inference costs, enabling product experiments previously uneconomical at higher token prices.
  • MiniMax's hybrid billing structure could serve as a monetization template for mid-tier AI providers globally who need predictable revenue without competing at the lowest per-token price point.

What we don't know yet

  • DeepSeek V4's specific per-token pricing is not disclosed in the article, making the exact cost floor driving the competitive cascade unquantifiable.
  • It is unclear whether Xiaomi's 99% price cut for MiMo-V2.5 is sustainable at 1.7-trillion-token weekly volumes or is being subsidized to drive OpenRouter rankings.
  • No specific cloud providers are named as affected, leaving unclear which players face the most acute margin compression from DeepSeek V4 pricing.

What others are reporting

Coverage cluster as of 24 hours after publication

  1. Decrypt

    Technical breakdown of why cuts are economically sustainable — KV cache hierarchy and DeepSeek's dual attention — plus Fuli Luo on-record quote and 34x gap quantification.

    Our production inference engine is running at near full capacity, and we can still essentially break even. — Fuli Luo, Xiaomi MiMo head
  2. Caixin Global

    Financial framing: documents Xiaomi's 43.1% Q1 profit decline alongside the cut, reports 30% paying-subscriber share, majority overseas, and 111% OpenRouter daily token surge.

    The aggressive discounts threaten to reignite a price war in China's hyper-competitive AI sector.
  3. Provides the SWA 1/7 KV Cache reduction detail, a full domestic competitor pricing table, and notes Xiaomi leadership previously criticized price wars while accepting free token giveaways.

    The package quota has skyrocketed by 5 to 8 times, and the lowest tier also has 500 million Tokens.