Flux-GS shrinks 3D Gaussian Splatting for Snapdragon mobiles

TL;DR

  • Flux-GS replaces third-order spherical harmonics (48 parameters per Gaussian) with first-order SH via a Monte Carlo Specular Energy Aggregator, skipping distillation and pre-training.
  • Per-Gaussian memory footprint drops by 61% and 26% versus prior 3DGS variants, with rendering targeted at the Snapdragon 8 Gen 3 GPU through WebGL inside a browser.
  • A Multi-view Alpha-based Densification and Pruning strategy using 6 sampled cameras controls primitive counts; total training runs 30k iterations with only the first 3k at full third-order SH.

A new paper out of UTS, Baidu, and Adelaide University takes aim at a specific bottleneck that has kept high-fidelity 3D scene rendering off mobile devices: the spherical harmonics (SH) coefficients that 3D Gaussian Splatting uses to model view-dependent lighting. The team's method, Flux-GS, is posted on Hugging Face under the title "Monte Carlo Energy Aggregation for Mobile 3D Gaussian Splatting." It compresses the standard third-order SH representation, which the authors say costs 48 floating-point parameters per Gaussian primitive, down to first-order SH while trying to preserve the look of specular highlights.
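The 48-parameter figure follows from how SH coefficients are counted: an expansion up to degree L has (L+1)^2 coefficients per color channel. Below is a minimal sketch of that arithmetic, assuming RGB color. The per-Gaussian totals additionally assume the standard 3DGS attribute layout (3 position, 3 scale, 4 rotation, 1 opacity), which is an assumption made here for illustration and may not be the baseline the paper measures against.

```python
def sh_params(degree: int, channels: int = 3) -> int:
    """Number of SH coefficients for all bands up to `degree`, over all color channels."""
    return (degree + 1) ** 2 * channels


# Assumed non-SH attributes of a vanilla 3DGS primitive:
# position (3) + scale (3) + rotation quaternion (4) + opacity (1).
GEOMETRY_PARAMS = 3 + 3 + 4 + 1

full = GEOMETRY_PARAMS + sh_params(3)      # third-order SH: 11 + 48
compact = GEOMETRY_PARAMS + sh_params(1)   # first-order SH: 11 + 12
reduction = 1 - compact / full             # fraction of parameters removed

print(sh_params(3), sh_params(1), full, compact, round(reduction, 2))
```

Under these assumptions the per-Gaussian parameter count falls from 59 to 23, a reduction of roughly 61%. That is consistent with the larger of the two reported figures, though the text summarized here does not state which baseline each percentage refers to.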

The mechanism is a Monte Carlo Specular Energy Aggregator. Rather than discarding the high-order radiance information, the authors sample K=2048 directions on the unit sphere, extract the directional moments of the third-order specular residual, and project that energy into a compact latent that a lightweight MLP maps to first-order SH. An Attribute-Conditioned SH Enhancement module then adds a residual offset based on each Gaussian's geometry. Because the offset is precomputed once and baked into the explicit Gaussian parameters before rendering, the authors say it adds zero computational overhead at inference. Finally, a Multi-view Alpha-based Densification and Pruning strategy using 6 sampled cameras controls the primitive count, replacing the single-view gradient-based densification of vanilla 3DGS.
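To make the sampling step concrete, here is a generic sketch of Monte Carlo directional moments over K=2048 uniformly sampled directions. It is not the authors' implementation: the function names, the choice of moments up to second order, the 10-number latent, and the toy specular lobe standing in for the high-order SH residual are all assumptions made for illustration.

```python
import numpy as np


def sample_unit_sphere(k: int, rng: np.random.Generator) -> np.ndarray:
    """Draw k directions uniformly on the unit sphere by normalizing Gaussian samples."""
    v = rng.normal(size=(k, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def directional_moments(residual: np.ndarray, dirs: np.ndarray):
    """Monte Carlo estimates of the 0th, 1st and 2nd directional moments.

    residual: (K,) scalar residual radiance at each sampled direction
    dirs:     (K, 3) unit directions
    """
    m0 = residual.mean()                                   # average residual energy
    m1 = (residual[:, None] * dirs).mean(axis=0)           # energy-weighted mean direction
    m2 = np.einsum("k,ki,kj->ij", residual, dirs, dirs) / len(dirs)  # angular spread
    return m0, m1, m2


def toy_specular_lobe(dirs: np.ndarray, axis: np.ndarray, sharpness: float = 32.0) -> np.ndarray:
    """Hypothetical stand-in for the specular residual: a clamped cosine-power lobe."""
    return np.clip(dirs @ axis, 0.0, None) ** sharpness


rng = np.random.default_rng(0)
dirs = sample_unit_sphere(2048, rng)       # K = 2048, as reported in the paper
axis = np.array([0.0, 0.0, 1.0])           # assumed highlight direction
m0, m1, m2 = directional_moments(toy_specular_lobe(dirs, axis), dirs)

# Pack the moments into one compact vector (1 + 3 + 6 unique entries of the symmetric m2).
latent = np.concatenate([[m0], m1, m2[np.triu_indices(3)]])
```

In this sketch the first moment points along the highlight direction and the second moment captures how tightly the lobe is concentrated; a small MLP could take a vector like `latent` as input, though the paper's actual latent and network are not described in this summary.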

Why this matters if you are not training Gaussian Splatting models yourself: the demo target is a Qualcomm Snapdragon 8 Gen 3 phone, and the rendering pipeline runs in WebGL inside a browser rather than a CUDA-locked desktop app. The paper reports per-Gaussian memory footprint reductions of 61% and 26% relative to prior 3DGS variants, and the team explicitly avoids the expensive teacher-student distillation that competing lightweight methods like Mobile-GS require. Training runs 30k iterations, with only the first 3k using full third-order SH before the Monte Carlo compression kicks in.
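The training schedule described above can be written as a small helper. This is a sketch only: the hard switch at exactly iteration 3,000, the 0-based iteration index, and the names `TOTAL_ITERS`, `FULL_SH_ITERS`, and `active_sh_degree` are assumptions, since the summary does not say how the transition is implemented.

```python
TOTAL_ITERS = 30_000     # total training iterations reported
FULL_SH_ITERS = 3_000    # initial iterations that use full third-order SH


def active_sh_degree(iteration: int) -> int:
    """SH degree optimized at a 0-based iteration, assuming a hard switch."""
    if not 0 <= iteration < TOTAL_ITERS:
        raise ValueError(f"iteration must be in [0, {TOTAL_ITERS})")
    return 3 if iteration < FULL_SH_ITERS else 1


# Share of training spent at the expensive third-order representation.
full_share = FULL_SH_ITERS / TOTAL_ITERS
```

Under this reading, only a tenth of the iterations carry the full 48-coefficient color representation; the remaining 27k work with the compact first-order form.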

The honest caveats are the ones the authors flag themselves. First-order SH cannot model genuine mirror-like reflections the way third-order SH can, so highly glossy materials suffer. The initial 3k training iterations still need peak memory comparable to standard 3DGS, so the training-side story is less mobile-friendly than the inference one. And the multi-view pruning can mistakenly cull micro-structures that are visible only from a narrow angle the stratified sampling missed. What the abstract and body text do not give you is a single headline FPS number or a final storage figure in megabytes; those sit in the result tables.
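The pruning failure mode is easy to reproduce in a toy setting. The sketch below uses a hypothetical criterion (prune a Gaussian if its maximum alpha contribution across the sampled views falls below a threshold). The paper's actual rule is not given in this summary, so the `prune_mask` function, the threshold, and the data are illustrative assumptions.

```python
import numpy as np


def prune_mask(alpha: np.ndarray, sampled_views: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Return True for Gaussians that would be pruned.

    alpha:         (V, N) hypothetical per-view alpha contribution of each Gaussian
    sampled_views: indices of the views actually inspected (6 in the paper)
    """
    return alpha[sampled_views].max(axis=0) < threshold


alpha = np.zeros((12, 3))    # 12 candidate views, 3 Gaussians
alpha[:, 0] = 0.5            # Gaussian 0: visible from every view
alpha[7, 1] = 0.9            # Gaussian 1: a micro-structure visible only from view 7
                             # Gaussian 2: never visible, a genuine pruning target

missed = prune_mask(alpha, np.array([0, 2, 4, 6, 8, 10]))   # view 7 not sampled
caught = prune_mask(alpha, np.array([0, 2, 4, 7, 8, 10]))   # view 7 sampled
```

With view 7 left out of the sample, Gaussian 1 is indistinguishable from the truly invisible Gaussian 2 and both are pruned; including view 7 preserves it.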

If the WebGL deployment story holds up outside the controlled benchmark, the downstream beneficiaries are the people who want to ship navigable 3D captures, virtual try-on, or eventually volumetric video on phones without forcing users to install anything. The authors themselves point at 4D Gaussian Splatting on mobile as the natural next step, and that is the direction worth watching.