scmp.com web signal June 30th 2026

Meituan Trains 1.6T LongCat-2.0 End-to-End on Chinese Chips

TL;DR

Meituan deployed LongCat-2.0 anonymously as 'Owl Alpha' on OpenRouter, reaching top-3 rankings before revealing the model identity or training origin.
The full training pipeline ran across roughly 50,000 Atlas-950 accelerators, coordinated by Huawei's HCCL, the chip-to-chip communication library that mirrors Nvidia's NCCL.
LongCat-2.0 scores 59.5 on the SWE-bench Pro coding benchmark versus GPT-5.5's 58.6, but trails Claude Opus 4.8 on broader agent benchmarks including FORTE and BrowseComp.

Editor's note

Meituan's LongCat-2.0 is the first confirmed Chinese frontier model to complete pre-training end-to-end on domestic ASIC hardware, a threshold no other Chinese lab has crossed: DeepSeek V4-Pro used domestic chips only for inference. The stealth deployment as 'Owl Alpha' on OpenRouter before any public announcement means real developer workloads validated the model before Meituan disclosed the training stack, adding independent weight to the capability claim. Bernstein data showing Nvidia still held roughly 40% of China's AI chip market in 2025 makes the verified Atlas-950 training run a direct market-share challenge. The open question for China's domestic chips has shifted from whether the silicon can sustain a frontier training run to whether it can do so reliably at scale, and Meituan's announcement moves that second question toward settled.

A Chinese food delivery company just trained a 1.6-trillion-parameter language model without touching an Nvidia chip. The South China Morning Post reported on June 30 that Meituan released LongCat-2.0, a model with a 1-million-token context window, trained end-to-end on large clusters of domestic AI ASIC superpods. The bit worth dwelling on is 'end-to-end', meaning both pre-training and inference ran on home-grown silicon.

That distinction matters because the previous Chinese frontier benchmark, DeepSeek's V4-Pro from April 2026, used domestic chips only for the lighter inference step. Pre-training is the computationally intensive part, where you stream huge volumes of tokens through tens of thousands of accelerators and need the cluster interconnect to stay up throughout. To make that work without Nvidia's NCCL, Meituan integrated Huawei's Collective Communication Library (HCCL), the chip-to-chip plumbing that keeps a training run stable at scale.
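
For readers who want a feel for what that plumbing does, here is a toy sketch of all-reduce, the collective operation that libraries such as NCCL and HCCL provide so that every accelerator ends a training step holding the same summed gradients. It simulates the well-known ring variant in plain Python on a single machine. It is not HCCL's or NCCL's API, and nothing in the reporting says which algorithm Meituan's cluster actually uses; the function name `ring_all_reduce` and its list-of-lists input are assumptions made purely for illustration.

```python
def ring_all_reduce(grads):
    """Toy, single-process simulation of ring all-reduce (illustrative only).

    grads: one list of numbers per simulated worker, all the same length,
    with that length divisible by the number of workers.
    Returns one list per worker, each holding the element-wise sum.
    """
    n = len(grads)
    size = len(grads[0])
    if any(len(g) != size for g in grads) or size % n != 0:
        raise ValueError("equal-length inputs, divisible by worker count, required")
    chunk = size // n
    bufs = [list(g) for g in grads]  # copy so the caller's data is untouched

    def span(c):
        # Slice covering chunk number c of a worker's buffer.
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: in each of n-1 steps, every worker passes one
    # chunk to its right-hand neighbour, which adds it to its own copy.
    for step in range(n - 1):
        # Snapshot all outgoing chunks first to mimic simultaneous sends.
        sends = [((r - step) % n, bufs[r][span((r - step) % n)]) for r in range(n)]
        for r, (c, data) in enumerate(sends):
            dst = (r + 1) % n
            for i, value in enumerate(data):
                bufs[dst][c * chunk + i] += value

    # Worker r now holds the fully summed chunk (r + 1) % n.
    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, bufs[r][span((r + 1 - step) % n)]) for r in range(n)]
        for r, (c, data) in enumerate(sends):
            bufs[(r + 1) % n][span(c)] = data

    return bufs
```

Calling `ring_all_reduce` with three simulated workers leaves each of them holding the same element-wise sum after four neighbour-to-neighbour exchanges. On a real cluster, each of those exchanges crosses the interconnect, which is why a single dropped link can stall the whole run.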

According to SCMP, LongCat-2.0 lands on par with DeepSeek V4-Pro on benchmarks. Take that as Meituan's claim rather than settled fact, since there is no independent leaderboard run yet and the article does not say which evals it is referring to. 'On par' headlines have a long history of softening under closer inspection, and a vendor announcement is not a third-party eval.

What the reporting does not give you is the harder economic question: how the wall-clock time, accelerator count and energy bill compare with an equivalent Nvidia run. SCMP cites tens of thousands of ASICs in the cluster but does not put a number on cost or duration, and it does not name which domestic chip line did the work, which is the detail that would tell you whether this generalises to other labs.

The forward-looking read is straightforward. If a non-AI-native company like Meituan can complete a trillion-parameter training run on domestic silicon, the export-controls story changes shape. The question stops being whether China can serve frontier models locally, and starts being how quickly Huawei's training stack becomes the default for everyone else inside the firewall.

What others are reporting

Coverage cluster as of 24 hours after publication

SiliconAngle

Cites Bernstein research on Nvidia's ~40% China chip market share and frames the Atlas-950 training run as a concrete near-term threat to that position.

The model's training origin means it will run reliably and likely perform well on domestically available chips in China, while reducing dependence on Nvidia-specific software.

Geopolitechs

Reveals the 'Owl Alpha' stealth strategy in detail: Meituan deployed anonymously on OpenRouter, accumulated genuine developer adoption, then disclosed identity — framing it as deliberate invisible capability-building from an unexpected actor.

A company better known outside China for food delivery...is now saying it doesn't need Nvidia GPUs to run a trillion-parameter model.

BeInCrypto

Most technically specific on the hardware stack: explains HCCL as the domestic analogue to Nvidia's NCCL and puts the training cluster's accelerator count at 50,000.

CryptoBriefing

Leads with the export-controls policy frame: the successful training run is presented as evidence that US chip restrictions carry limited long-term effectiveness against Chinese AI development.

The entire training pipeline relied on Chinese-manufactured hardware.

Cryptopolitan

Adds competitive context by comparing Meituan to MiniMax on long-context model strategy, and flags that domestic-chip optimization may handicap performance on Nvidia hardware still dominant outside China.

Originally reported by scmp.com

Original headline: Meituan Open-Sources LongCat-2.0 — 1.6T-Parameter MoE Becomes First Chinese Frontier Model Trained End-to-End on Domestic AI Chips, Matching DeepSeek V4-Pro on Benchmarks