anthropic.com web signal

Anthropic: DeepSeek, Moonshot, MiniMax distilled Claude at scale

TL;DR

  • Anthropic accuses DeepSeek, Moonshot, and MiniMax of distilling Claude through roughly 24,000 fraudulent accounts and more than 16 million exchanges.
  • MiniMax drove about 13 million of those exchanges, focused on agentic coding and tool orchestration, and reportedly pivoted within 24 hours of new Claude releases.
  • Anthropic does not sell Claude commercially in China, so by its account every one of the roughly 24,000 accounts violated its terms of service.

A report Anthropic published this winter is one of the more concrete looks we have at how model copying works in practice. The company says three Chinese AI labs, DeepSeek, Moonshot, and MiniMax, ran what it calls industrial-scale distillation campaigns, generating more than 16 million exchanges with Claude across roughly 24,000 fraudulent accounts. Anthropic laid the details out on its own news site.

Distillation itself is not the problem. Anthropic describes it as 'a widely used and legitimate training method' where a weaker model learns from a stronger one's outputs. The accusation is that competitors used it 'in a fraction of the time, and at a fraction of the cost' to acquire capabilities the originator paid to develop. By Anthropic's breakdown, MiniMax's campaign was by far the largest at roughly 13 million exchanges, centered on agentic coding and tool orchestration, and MiniMax reportedly pivoted within 24 hours whenever Anthropic shipped new models. Moonshot's campaign ran about 3.4 million exchanges, focused on agentic reasoning and computer-use agent development. DeepSeek's was the smallest at around 150,000 exchanges, leaning on reasoning, rubric-based grading, and chain-of-thought training data, including what the report calls 'censorship-safe alternatives to politically sensitive queries.'
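As a quick sanity check, the per-lab figures above can be summed and compared against the headline total. The sketch below uses only the rounded counts quoted in this piece, so the shares it prints are approximate; the variable and function names are illustrative.

```python
# Rounded exchange counts as quoted from Anthropic's breakdown (approximate).
exchanges = {
    "MiniMax": 13_000_000,
    "Moonshot": 3_400_000,
    "DeepSeek": 150_000,
}

# Sum of the three rounded per-lab figures.
total = sum(exchanges.values())


def share(lab: str) -> float:
    """Fraction of the summed exchanges attributed to one lab."""
    return exchanges[lab] / total


if __name__ == "__main__":
    print(f"total: {total:,}")
    for lab in exchanges:
        print(f"{lab}: {share(lab):.1%}")
```

On these rounded inputs the three campaigns add up to about 16.55 million exchanges, which is consistent with 'more than 16 million', and MiniMax alone accounts for a bit under four fifths of the total.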

Why this matters beyond a corporate accusation: Anthropic does not provide commercial access to Claude in China, so by the company's account every one of those 24,000 accounts violated its terms of service. The detection methods it describes fall into two groups: traffic classifiers that look for high-repetition prompts and narrow capability targeting, and behavioral fingerprinting that ties accounts together through shared payment methods, IP ranges, and request timing. Controls of this kind are likely to tighten across the API tier most useful to small developers, especially the educational and research accounts Anthropic flags as common abuse vectors.
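To make the account-linking idea concrete, here is a toy sketch of how accounts that share any signal could be grouped into clusters. Anthropic has not published its implementation, so this is not its method: the union-find approach, the function name link_accounts, and the signal labels ("payment", "ip_range") are all assumptions chosen for illustration. Request timing is left out because it would first need bucketing into comparable values.

```python
from collections import defaultdict


def link_accounts(accounts):
    """Group account IDs that share any signal value (toy union-find sketch).

    `accounts` maps account_id -> set of (signal_type, value) pairs, such as
    ("payment", "card-1"). All names here are hypothetical illustrations.
    """
    parent = {account_id: account_id for account_id in accounts}

    def find(x):
        # Walk up to the cluster root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # The first account seen with a signal anchors it; any later account
    # carrying the same signal is merged into that anchor's cluster.
    first_seen = {}
    for account_id, signals in accounts.items():
        for signal in signals:
            if signal in first_seen:
                union(first_seen[signal], account_id)
            else:
                first_seen[signal] = account_id

    clusters = defaultdict(set)
    for account_id in accounts:
        clusters[find(account_id)].add(account_id)
    # Sort members and clusters so the result is deterministic.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


if __name__ == "__main__":
    demo = {
        "acct-a": {("payment", "card-1"), ("ip_range", "198.51.100.0/24")},
        "acct-b": {("payment", "card-1")},
        "acct-c": {("ip_range", "203.0.113.0/24")},
        "acct-d": {("ip_range", "203.0.113.0/24"), ("payment", "card-2")},
        "acct-e": {("payment", "card-3")},
    }
    print(link_accounts(demo))
```

Linking is transitive in this sketch: if one account shares a card with a second, and the second shares an IP range with a third, all three land in one cluster. That property is also why a real system would presumably weight or filter signals, since a shared university network alone could otherwise merge unrelated research accounts.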

The honest caveats are that this is Anthropic's account, single-sourced, and the accused labs have not had their say in this report. What the post does not give you is how confident the attribution to specific labs is beyond behavioral signatures, what enforcement beyond account bans is planned, or how much of those competitors' shipped models can actually be traced back to distilled Claude outputs. Take the specifics as reported, not settled.

The forward-looking part is more interesting than the finger-pointing. If behavioral fingerprinting at this scale works, the playbook spreads industry-wide, and the cost of running the 'hydra cluster' proxy networks Anthropic describes goes up for everyone trying it. That probably narrows the gap pure distillation can close, and pushes the actual race back toward the people doing the original training work.
