MeshFlow Generates Triangle Meshes at 18x Inference Speedup
TL;DR
- MeshFlow generates a 3D mesh in 0.877 seconds, versus 16.27 seconds for MeshGPT (18.55x faster) and 28.93 seconds for MeshXL.
- The 124M parameter model matches or beats quality scores from MeshXL, a 1.3B parameter autoregressive model, on ShapeNet benchmarks.
- A modified Diffusion Transformer enforcing face permutation and vertex rotation invariances is the core architectural contribution.
Generating a valid 3D triangle mesh quickly has been harder than it sounds. Most competitive methods have relied on autoregressive generation, predicting one token at a time, which produces coherent meshes but at inference times measured in tens of seconds per shape. A paper accepted to SIGGRAPH 2026, from researchers at Stanford, Cornell Tech, the University of Texas at Austin, and City University of Hong Kong, proposes a different framing: treat the mesh as a triangle soup and build a flow matching model that respects the geometry's natural symmetries from the start.
A triangle soup is an unordered collection of triangular faces. The key observation in MeshFlow is that two symmetries are always present: the N triangles can be reordered in any way (face permutation), and the three vertices within each triangle can be cyclically rotated, and neither operation changes the underlying shape. Prior autoregressive approaches resolved this ambiguity by imposing an arbitrary ordering, which introduces a training signal that fights the symmetry structure of the data. MeshFlow instead modifies the Diffusion Transformer (DiT) architecture to respect both symmetries at the face and vertex level, and pairs it with a nested optimal transport coupling strategy during training. The coupling aligns noise to data more efficiently, producing straighter flow trajectories and faster convergence.
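To make the two symmetries concrete, here is a minimal Python sketch. It is not MeshFlow's architecture, only an illustration of the equivalence the article describes: the helper names (`canonical_soup`, `scramble`) and the tetrahedron data are assumptions made up for this example. It shows that shuffling faces and cyclically rotating vertices yields the same shape, while swapping two vertices (a non-cyclic reordering) does not, because it flips the face orientation.

```python
import math
import random

def canonical_soup(faces):
    """Map every equivalent triangle soup to one canonical representative."""
    canon = []
    for face in faces:
        # The three cyclic rotations describe the same oriented triangle;
        # keep the lexicographically smallest one.
        rotations = [face[i:] + face[:i] for i in range(3)]
        canon.append(min(rotations))
    # Face order carries no meaning, so sort it away.
    return sorted(canon)

def scramble(faces, rng):
    """Apply a random cyclic rotation to each face, then shuffle the faces."""
    out = []
    for face in faces:
        k = rng.randrange(3)
        out.append(face[k:] + face[:k])
    rng.shuffle(out)
    return out

# A tetrahedron as a triangle soup: each face is a tuple of three (x, y, z) vertices.
A, B, C, D = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
tetra = [(A, C, B), (A, B, D), (B, C, D), (A, D, C)]

scrambled = scramble(tetra, random.Random(0))
same_shape = canonical_soup(scrambled) == canonical_soup(tetra)

# Swapping two vertices is NOT a cyclic rotation: it reverses the winding.
flipped = [(f[1], f[0], f[2]) for f in tetra]
flipped_same = canonical_soup(flipped) == canonical_soup(tetra)

# Number of distinct orderings that encode this one shape: N! * 3^N.
n_faces = len(tetra)
equivalent_representations = math.factorial(n_faces) * 3 ** n_faces

print(same_shape, flipped_same, equivalent_representations)
```

Even for four faces there are 1,944 orderings of the same mesh, which gives a sense of how much redundancy an arbitrary fixed ordering asks a model to absorb.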
According to results on the project page and the accompanying paper, MeshFlow at 124M parameters generates a mesh in 0.877 seconds. MeshGPT takes 16.27 seconds; MeshXL, a 1.3B parameter autoregressive model, takes 28.93 seconds. The reported 18.55x speedup matches the MeshGPT comparison (16.27 / 0.877); measured against MeshXL's time, the gap works out to roughly 33x. On ShapeNet quality benchmarks, MeshFlow achieves the best 1-nearest-neighbor accuracy score in 3 of 4 tested object categories. An optional post-processing step reduces self-intersection rates by roughly 56% at about 23 milliseconds of additional overhead.
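The ratios behind these figures can be checked directly. The snippet below is plain arithmetic on the timings and parameter counts quoted in this article; the variable names are illustrative and nothing here comes from the paper's code.

```python
# Inference times in seconds, as quoted in the article.
meshflow_s = 0.877
meshgpt_s = 16.27
meshxl_s = 28.93

speedup_vs_meshgpt = meshgpt_s / meshflow_s   # about 18.55
speedup_vs_meshxl = meshxl_s / meshflow_s     # about 32.99

# Parameter counts: MeshXL at 1.3B versus MeshFlow at 124M.
param_ratio = 1.3e9 / 124e6                   # about 10.5

print(f"vs MeshGPT: {speedup_vs_meshgpt:.2f}x")
print(f"vs MeshXL:  {speedup_vs_meshxl:.2f}x")
print(f"MeshXL / MeshFlow parameters: {param_ratio:.1f}x")
```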
The honest caveats are specific. The nested optimal transport coupling relies on the Hungarian algorithm, which runs in O(n³) time, a hard constraint on scalability. The authors acknowledge that it will limit the method at large face counts and propose approximate OT techniques as future work. The paper also notes generation artifacts (missing and overlapping faces) and attributes them to limited training compute rather than a flaw in the method; that claim is plausible but untested at larger scale. Conditional generation (text-to-mesh or image-to-mesh) is explicitly out of scope here.
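For readers unfamiliar with assignment-based coupling, the sketch below shows the general idea on a toy scale. It is an assumption-laden illustration, not the paper's nested scheme: it pairs random "noise" faces with random "data" faces so that the total squared distance is minimal, using exhaustive search as a stand-in for the Hungarian algorithm. Exhaustive search is factorial-time and only viable for a handful of faces; in practice a polynomial-time assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` would find an assignment with the same minimum cost. All function and variable names are hypothetical.

```python
import itertools
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two flattened faces."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def coupling_cost(noise, data, perm):
    """Total transport cost when noise[i] is paired with data[perm[i]]."""
    return sum(sq_dist(noise[i], data[j]) for i, j in enumerate(perm))

def optimal_coupling(noise, data):
    """Brute-force minimum-cost assignment; O(n!) so only for tiny n."""
    n = len(noise)
    return min(
        itertools.permutations(range(n)),
        key=lambda perm: coupling_cost(noise, data, perm),
    )

rng = random.Random(0)
n_faces, dim = 5, 9  # each face flattened to 3 vertices x 3 coordinates
noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_faces)]
data = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_faces)]

identity = tuple(range(n_faces))          # arbitrary pairing, no coupling
best = optimal_coupling(noise, data)      # minimum-cost pairing

arbitrary_cost = coupling_cost(noise, data, identity)
best_cost = coupling_cost(noise, data, best)
print(f"arbitrary pairing cost: {arbitrary_cost:.3f}")
print(f"optimal pairing cost:   {best_cost:.3f}")
```

A lower total pairing cost means each noise sample travels a shorter distance to its target, which is the intuition behind the straighter trajectories mentioned earlier. The cubic cost of solving this assignment exactly is what the scalability caveat refers to.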
The more transferable contribution may be the architectural principle. A flow matching model that encodes the symmetry group of its data representation rather than working around it produced comparable quality to MeshXL, a 1.3B parameter model, at 124M parameters and a fraction of the inference cost. If that principle carries over to other structured 3D generation tasks, the geometry of the symmetry group becomes a design input rather than an afterthought.
Originally reported by huggingface.co
Original headline: MeshFlow: Equivariant Optimal-Transport Flow Matching Generates 3D Triangle Meshes at 18× Speedup — Accepted to SIGGRAPH 2026