huggingface.co web signal

PoseShield lifts SMPL self-collision fix rate to 95.8%

TL;DR

  • PoseShield learns a collision constraint directly in SMPL pose space rather than mesh space, and formulates collision resolution as constrained optimisation.
  • On the new Humans with Collisions benchmark, the method raises success rate from 0.446 (best baseline, COAP) to 0.958.
  • The same learned constraint works as a generator-agnostic post-hoc corrector for motion sequences with no retraining of the underlying generator.

Self-collision is the unglamorous failure mode of modern human motion pipelines: a state-of-the-art generator produces a plausible-looking sequence and then a forearm passes cleanly through the ribcage. A new paper, PoseShield, posted to arXiv on 29 June 2026, goes after that problem by moving the fix out of mesh space and into SMPL pose space, where the parameters being optimised actually live.

The setup is straightforward. The authors formulate collision resolution as a constrained optimisation problem: given a self-intersecting pose, find the nearest collision-free one. To make that solvable with standard gradient-based methods such as SLSQP or augmented Lagrangian solvers, they need a constraint function that is signed (positive for collision-free poses, negative for self-intersecting ones) and whose gradient does not vanish near the boundary. Their move is to train a neural field with Eikonal regularisation, so that it approximates a signed distance function to the collision boundary in pose space. A signed distance function has a unit-norm gradient wherever it is differentiable, which, the paper argues, keeps the Linear Independence Constraint Qualification (LICQ) approximately satisfied.
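To make the formulation concrete, here is a minimal sketch of the "nearest collision-free pose" projection using SciPy's SLSQP solver. It is not the authors' implementation: the learned neural field is replaced by a hypothetical analytic stand-in `g` (signed distance to a unit ball of "colliding" poses in a toy 3-D pose space), and the names `g`, `g_grad` and `nearest_collision_free` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def g(theta):
    """Toy stand-in for the learned constraint (an assumption, not the paper's model).

    Signed distance to a unit ball of "colliding" poses centred at the origin:
    positive means collision-free, negative means self-intersecting.
    """
    return np.linalg.norm(theta) - 1.0


def g_grad(theta):
    # Gradient of the toy constraint. It has unit norm away from the origin,
    # which is the Eikonal property |grad g| = 1.
    return theta / np.linalg.norm(theta)


def nearest_collision_free(theta0):
    """Find the pose closest to theta0 (in L2) that satisfies g(theta) >= 0."""
    objective = lambda th: 0.5 * np.sum((th - theta0) ** 2)
    objective_grad = lambda th: th - theta0
    constraint = {"type": "ineq", "fun": g, "jac": g_grad}
    result = minimize(
        objective,
        theta0,
        jac=objective_grad,
        constraints=[constraint],
        method="SLSQP",
    )
    return result.x


# A toy "colliding" pose: its norm is 0.5, so g(theta0) = -0.5.
theta0 = np.array([0.3, 0.0, 0.4])
theta_star = nearest_collision_free(theta0)
```

For this toy geometry the exact projection is `theta0` rescaled to unit norm, (0.6, 0, 0.8), so the solver's answer can be checked by hand. The non-vanishing gradient is what lets the solver take a useful first step: at `theta0` the objective gradient is zero, and only the constraint gradient tells it which way to move.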

The headline number: on a newly curated Humans with Collisions (HwC) benchmark the method reaches a 0.958 success rate, up from 0.446 for the strongest baseline, COAP, and it does so with smaller pose deviations. The same learned constraint can then be applied to motion sequences as a post-hoc corrector that does not require retraining the underlying generator, which is the part that matters for anyone already invested in a particular text-to-motion stack.
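The generator-agnostic claim is easiest to see in code: a post-hoc corrector only needs the generator's output poses, never its weights. The sketch below is an illustration under stated assumptions, not the paper's pipeline. The constraint `g` is a hypothetical analytic stand-in for the learned field, frames are corrected independently, and the helper names `project` and `correct_sequence` are invented for this example.

```python
import numpy as np
from scipy.optimize import minimize


def g(theta):
    # Hypothetical stand-in constraint: signed distance to a unit ball of
    # "colliding" poses in a toy 3-D pose space (positive = collision-free).
    return np.linalg.norm(theta) - 1.0


def project(theta0):
    # Nearest pose to theta0 with g(theta) >= 0, solved with SLSQP.
    result = minimize(
        lambda th: 0.5 * np.sum((th - theta0) ** 2),
        theta0,
        constraints=[{"type": "ineq", "fun": g}],
        method="SLSQP",
    )
    return result.x


def correct_sequence(frames):
    """Fix only the frames that violate the constraint; leave the rest untouched."""
    corrected = frames.copy()
    for t, theta in enumerate(frames):
        if g(theta) < 0.0:
            corrected[t] = project(theta)
    return corrected


# Toy "generated motion": a straight line that cuts through the colliding region.
frames = np.stack(
    [np.linspace(-2.0, 2.0, 9), np.full(9, 0.5), np.zeros(9)], axis=1
)
corrected = correct_sequence(frames)

colliding_before = int(np.sum([g(f) < 0.0 for f in frames]))
colliding_after = int(np.sum([g(f) < -1e-4 for f in corrected]))
```

Correcting each frame in isolation keeps the sketch short but can introduce frame-to-frame jitter; whether and how PoseShield handles temporal coherence is not covered in the summary above.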

The honest caveat is that the benchmark is the authors' own, and the comparison set (Torch-mesh-isect, a classifier baseline inspired by N-Penetrate, COAP, and VolumetricSMPL) does not cover every production pipeline. The sections of the paper reviewed here also give no per-frame runtime numbers, so it is unclear whether the method is viable for real-time avatars rather than offline cleanup. Finally, the Eikonal property holds only approximately: the authors prove a bound on the volume of the region where it fails.

If the success-rate gap holds on third-party motion generators, the practical upside is a clean separation of concerns: keep your favourite generator, bolt PoseShield on at the end, and stop shipping limbs that intersect torsos.