
SP-Mind automates spatial proteomics from natural-language queries

TL;DR

  • SP-Mind converts natural-language queries into end-to-end spatial proteomics workflows, from raw multiplexed tissue imaging to downstream phenotype discovery.
  • The authors introduce SP-Bench, an evaluation suite of 102 tasks across 18 distinct categories spanning diverse tissue types.
  • The paper, accepted to ICML 2026, reports state-of-the-art performance compared to existing open-source biomedical agent baselines.

Spatial proteomics is one of those fields where the analysis stack lags behind the imaging. You take a tissue slice, image many protein markers at once, and then have to assemble a custom pipeline to get from a stack of multiplexed images to an actual biological claim. A new paper from Yucheng Yuan and colleagues, posted on arXiv and accepted to ICML 2026, proposes putting that whole pipeline under a single autonomous agent.

SP-Mind, as the authors describe it, converts natural-language queries into end-to-end analytical workflows. The pitch is straightforward: a biologist types what they want to know, and the agent walks the pipeline from raw multiplexed tissue imaging to downstream phenotype discovery, drawing on expert-curated biological analysis skills and specialized computational tools. To test it, the team introduces SP-Bench, a suite of 102 tasks across 18 distinct categories spanning diverse tissue types, and reports state-of-the-art performance compared to existing open-source biomedical agent baselines.

The angle worth holding onto is what this would change if it generalizes. A working autonomous agent that handles the choreography of spatial proteomics analysis would let biologists ask questions in plain language rather than stitching scripts together by hand. The companion benchmark may matter more than the agent itself, because shared evaluation is what lets the next group's claim be checked against this one.

The honest caveat is the comparison set. The paper benchmarks against open-source biomedical agents, which is a narrow slice of what a motivated user could throw at the problem today; the abstract does not claim a comparison to closed-source frontier models with general tool use, nor to expert pathologists running the analysis by hand. The reporting also does not give you the failure-mode profile across those 102 tasks, which is where you would want to look before trusting it on a clinical sample. Take the state-of-the-art claim as a starting line, not a ceiling. The interesting thing to watch is whether SP-Bench gets adopted by other groups, because that, more than this particular leaderboard placement, is what would actually move spatial proteomics tooling forward.
