scmp.com web signal

Baichuan-M4 Tops HealthBench Hard; MicroPort Toumai Wins EU CE Mark

china ai healthcare openai china-ai medical-ai benchmarks robotics

TL;DR

  • Baichuan-M4 led the HealthBench Hard subset by 15.9 points over GPT-5.5 and posted the benchmark's lowest hallucination rate, 3.3%.
  • MicroPort MedBot's Toumai became the first remote surgical robot to receive the EU CE mark, giving it formal access to EU hospital markets.
  • In March, a London surgeon used Toumai to perform a prostate removal on a patient in Gibraltar, approximately 2,400 kilometers away, in the UK's first robotic telesurgery.

China's medical AI sector produced two distinct milestones in the same week. Baichuan-M4, a clinical-grade AI model, topped OpenAI's HealthBench evaluation, beating the second-best model, GPT-5.5, by 15.9 points on the benchmark's Hard subset, according to the South China Morning Post. The model also posted the benchmark's lowest hallucination rate, 3.3%, and led every HealthBench knowledge metric. HealthBench tests 5,000 realistic multi-turn clinical conversations against 48,562 rubric criteria written by 262 human doctors, giving it stronger clinical grounding than most generic leaderboards.
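To make the rubric idea concrete, here is a minimal sketch of how a rubric-graded score could be computed. The reporting does not describe HealthBench's scoring rule, so everything below is an assumption for illustration: the `Criterion` class, the point values, the example criteria, and the rule that a response earns the points of the criteria it meets, divided by the maximum positive points available and clipped to the range 0 to 1. The only figures taken from the article are the conversation and criteria counts.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str  # hypothetical rubric text, not taken from HealthBench
    points: int       # assumed: positive = desired behaviour, negative = penalty
    met: bool         # whether a grader judged that the response met it


def rubric_score(criteria: list[Criterion]) -> float:
    """Assumed rule: points earned / maximum positive points, clipped to [0, 1]."""
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        return 0.0
    earned = sum(c.points for c in criteria if c.met)
    return min(1.0, max(0.0, earned / max_points))


# Hypothetical rubric for a single conversation.
example = [
    Criterion("Advises urgent care for chest pain", 10, True),
    Criterion("Asks how long symptoms have lasted", 5, False),
    Criterion("States a fabricated drug dosage", -8, False),
]
score = rubric_score(example)  # 10 earned out of 15 possible

# Figures from the article: 48,562 criteria across 5,000 conversations.
avg_criteria_per_conversation = 48_562 / 5_000
print(f"score={score:.3f}, avg criteria per conversation={avg_criteria_per_conversation:.1f}")
```

On the article's figures, each conversation is judged against roughly ten criteria on average, which is what separates this kind of evaluation from a single right-or-wrong answer key.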

The second development is in robotics. Shanghai MicroPort MedBot's Toumai remote surgical robot received the EU CE mark, which the company described in a Hong Kong stock exchange filing as making it the first remote surgical robot to obtain that certification. The Toumai system integrates 5G technology and enables laparoscopic procedures in urology, general surgery, thoracic surgery, and gynecology. In March, a London surgeon performed a remote prostate removal on a cancer patient in Gibraltar, approximately 2,400 kilometers away, in what was described as the UK's first robotic telesurgery. MicroPort reports commercial deployments in nearly 40 countries, with over 15,000 multi-specialty procedures completed globally and more than 700 remote surgeries across 20 countries.

The CE mark gives Toumai formal access to EU hospital markets, a meaningful step for a company already operating commercially in nearly 40 countries. By leading every HealthBench knowledge metric and topping the Hard subset by 15.9 points, Baichuan-M4 places a Chinese clinical model ahead of GPT-5.5 and the rest of the field on OpenAI's own evaluation.

Some caveats apply. A HealthBench lead is a benchmark result, not a clinical trial, and the reporting provides no peer-reviewed evidence of real-world deployment reliability or of performance outside Chinese clinical contexts. For Toumai, regulatory clearance opens the door but does not settle questions about physician adoption and competition with entrenched Western surgical robotics platforms. The reporting also gives no EU rollout timelines, pricing details, or evidence that the remote surgery use cases demonstrated so far will translate at scale.