arxiv.org web signal

ModelSMC recasts LLM-driven model discovery as probabilistic inference

TL;DR

  • The authors introduce ModelSMC, an algorithm based on Sequential Monte Carlo sampling that treats candidate scientific models as weighted particles.
  • They recast LLM-based model discovery as sampling from an unknown distribution over mechanistic models capable of explaining observational data.
  • The abstract reports that experiments on real-world scientific systems, which it does not name, yield models with interpretable mechanisms and improved posterior predictive checks.

A new preprint tries to tidy up a weak spot in the recent wave of LLM-driven scientific discovery work: the algorithms themselves are usually hand-tuned loops, with no explicit statistical account of why the loop should converge on a good model. In a paper posted to arXiv in February 2026 and revised in June, Stefan Wahl, Raphaela Schenk, Ali Farnoud, Jakob H. Macke, and Daniel Gedon propose recasting the whole setup as probabilistic inference, which the abstract describes as sampling from an unknown distribution over mechanistic models capable of explaining the data.

The concrete instantiation they offer is ModelSMC, which sits on top of Sequential Monte Carlo sampling. In their formulation the candidate models are particles, an LLM does the work of proposing and refining those particles, and likelihood-based criteria weight them. The authors argue that model proposal, refinement, and selection all live inside a single inference framework rather than existing as separate ad hoc stages of a pipeline.
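To make the particle picture concrete, here is a minimal sketch of a generic tempered SMC sampler. It is not the authors' implementation: the particles are parameter vectors of a fixed toy line model rather than full candidate models, the hypothetical `propose` function is a random perturbation standing in for the LLM proposal and refinement step, and the Gaussian likelihood, prior, and tempering schedule are all assumptions chosen for illustration.

```python
import math
import random


def predict(coeffs, x):
    # Toy stand-in for a mechanistic model: a line y = a + b * x.
    a, b = coeffs
    return a + b * x


def log_likelihood(coeffs, xs, ys, sigma=0.5):
    # Gaussian log-likelihood of the data under one candidate (assumed noise model).
    return sum(
        -0.5 * ((y - predict(coeffs, x)) / sigma) ** 2
        - math.log(sigma * math.sqrt(2.0 * math.pi))
        for x, y in zip(xs, ys)
    )


def log_prior(coeffs, scale=3.0):
    # Independent zero-mean Gaussian prior, up to an additive constant.
    return sum(-0.5 * (c / scale) ** 2 for c in coeffs)


def propose(coeffs, rng, step=0.2):
    # Hypothetical placeholder for the LLM refinement step:
    # a symmetric random perturbation of the current candidate.
    return [c + rng.gauss(0.0, step) for c in coeffs]


def normalize(log_w):
    # Turn log-weights into normalized weights, subtracting the max for stability.
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def model_smc_sketch(xs, ys, n_particles=300, n_steps=25, seed=0):
    rng = random.Random(seed)
    # Initial particles are drawn from the prior.
    particles = [[rng.gauss(0.0, 3.0), rng.gauss(0.0, 3.0)] for _ in range(n_particles)]
    loglik = [log_likelihood(p, xs, ys) for p in particles]
    # Quadratic tempering schedule from 0 (prior) to 1 (full posterior).
    betas = [(t / n_steps) ** 2 for t in range(n_steps + 1)]
    for prev, beta in zip(betas, betas[1:]):
        # 1. Reweight each particle by its likelihood raised to the increment in beta.
        weights = normalize([(beta - prev) * ll for ll in loglik])
        # 2. Resample in proportion to the weights (multinomial resampling).
        idx = rng.choices(range(n_particles), weights=weights, k=n_particles)
        particles = [list(particles[i]) for i in idx]
        loglik = [loglik[i] for i in idx]
        # 3. Move each particle with one Metropolis step targeting the tempered posterior.
        for i in range(n_particles):
            cand = propose(particles[i], rng)
            cand_ll = log_likelihood(cand, xs, ys)
            log_accept = (beta * cand_ll + log_prior(cand)) - (
                beta * loglik[i] + log_prior(particles[i])
            )
            if rng.random() < math.exp(min(0.0, log_accept)):
                particles[i], loglik[i] = cand, cand_ll
    # Particles are equally weighted after the final resampling step.
    return particles


if __name__ == "__main__":
    data_rng = random.Random(1)
    xs = [-2.0 + 0.1 * k for k in range(40)]
    ys = [1.0 + 2.0 * x + data_rng.gauss(0.0, 0.5) for x in xs]
    final = model_smc_sketch(xs, ys)
    print(sum(p[0] for p in final) / len(final), sum(p[1] for p in final) / len(final))
```

In the setting the paper describes, each particle would be a candidate model and `propose` would be an LLM call; how the authors handle proposal probabilities for such a step is not stated in the abstract, so the symmetric Metropolis move here should be read only as a placeholder.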

On results, the abstract is careful. It says experiments on real-world scientific systems produce models with interpretable mechanisms and improve posterior predictive checks. What the abstract does not give you is which scientific systems, which LLM was doing the proposing, or how ModelSMC compares numerically against the heuristic agentic pipelines it is positioned against. For a methods paper those are significant gaps, and they are worth flagging before treating this as a settled improvement.
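For readers unfamiliar with the term, a posterior predictive check simulates replicated datasets from posterior draws and asks whether a chosen summary statistic of the observed data looks typical among them. The sketch below is a generic illustration, not the evaluation used in the paper: the line model, the noise level `sigma`, and the function name `posterior_predictive_pvalue` are all assumptions.

```python
import random


def posterior_predictive_pvalue(samples, xs, ys, statistic, sigma=0.5, seed=0):
    """Fraction of replicated datasets whose statistic is >= the observed one.

    `samples` is a list of (a, b) posterior draws for an assumed toy model
    y = a + b * x with Gaussian noise of standard deviation `sigma`.
    """
    rng = random.Random(seed)
    observed = statistic(ys)
    exceed = 0
    for a, b in samples:
        # Simulate one replicated dataset from this posterior draw.
        y_rep = [a + b * x + rng.gauss(0.0, sigma) for x in xs]
        if statistic(y_rep) >= observed:
            exceed += 1
    return exceed / len(samples)


if __name__ == "__main__":
    data_rng = random.Random(1)
    xs = [0.1 * k for k in range(20)]
    ys = [1.0 + 2.0 * x + data_rng.gauss(0.0, 0.5) for x in xs]
    # Pretend posterior draws concentrated at the data-generating parameters.
    draws = [(1.0, 2.0)] * 500
    print(posterior_predictive_pvalue(draws, xs, ys, statistic=max))
```

A value near 0 or 1 signals that the model fails to reproduce that feature of the data, while values away from the extremes indicate no detected misfit for that particular statistic.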

The reason a practitioner in scientific ML should still keep an eye on this: if the probabilistic-inference framing generalises beyond whatever experiments the full paper includes, it gives a shared vocabulary for comparing LLM-based model discovery methods that today mostly get benchmarked case by case, and it pulls uncertainty quantification into the loop rather than bolting it on afterwards.
