MrFlow reports 10x training-free speedup on FLUX.1, Qwen-Image

TL;DR

  • MrFlow claims 10x end-to-end acceleration on FLUX.1-dev and Qwen-Image while keeping the OneIG metric within 1% of the baseline.
  • Stacking the method with timestep distillation reportedly raises the total speedup to as much as 25x on the same models.
  • The pipeline generates structure at low resolution, then super-resolves in pixel space with a lightweight pretrained GAN, requiring no training.

A new preprint on arXiv, MrFlow, argues that you can cut the wall-clock cost of running FLUX.1-dev and Qwen-Image by roughly an order of magnitude without training anything and without writing a custom kernel. The authors report a 10x end-to-end acceleration with the OneIG image-quality metric staying within 1% of the baseline, and up to 25x when the technique is stacked with timestep distillation.

The mechanism, as the authors describe it, is a staged sampler rather than a clever attention trick. MrFlow "first rapidly generates the main structure at low resolution, then performs super-resolution in the pixel space using a lightweight pretrained GAN-based model," followed by a low-strength noise injection to let the diffusion model resample high frequencies and refine detail. Because the expensive sampling steps happen on a smaller token grid, the paper frames the win as a "quadratic token reduction and reduced step requirement of low-resolution sampling." The authors emphasise that the pipeline needs "no training or runtime dynamic identification whatsoever," which is what makes it interesting as a drop-in.
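To get a feel for the "quadratic token reduction" part, here is a back-of-the-envelope sketch, not anything taken from the paper. It assumes a latent transformer in which each image token covers a 16x16 pixel patch (an 8x VAE downsample followed by 2x2 patching, as in FLUX-style models), and it picks 512 px as a purely hypothetical low-resolution stage; the preprint's actual stage resolutions are not stated here. The function names are made up for illustration.

```python
# Rough token arithmetic for a low-resolution-first sampler.
# Assumptions (not from the paper): 16x16 pixels per image token,
# square images, and a hypothetical 512 px low-resolution stage.

def image_tokens(height: int, width: int, pixels_per_token: int = 16) -> int:
    """Count image tokens when each token covers a square pixel patch."""
    return (height // pixels_per_token) * (width // pixels_per_token)


def relative_cost(full_res: int, low_res: int, pixels_per_token: int = 16) -> dict:
    """Compare per-step cost of full-resolution vs low-resolution sampling."""
    full = image_tokens(full_res, full_res, pixels_per_token)
    low = image_tokens(low_res, low_res, pixels_per_token)
    token_ratio = full / low
    return {
        "full_tokens": full,
        "low_tokens": low,
        # Cost of layers that scale linearly with token count (e.g. MLPs).
        "linear_ratio": token_ratio,
        # Cost of self-attention, which scales with the square of token count.
        "attention_ratio": token_ratio ** 2,
    }


summary = relative_cost(full_res=1024, low_res=512)
print(summary)
```

Under these assumptions, halving the side length cuts the token count by 4x and the attention cost per step by 16x. That is only the per-step picture for image tokens: it ignores text tokens, the cost of the GAN upscaler, and the refinement steps at full resolution, so it should not be read as a prediction of the end-to-end figure.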

Why this matters if you're not writing samplers yourself: most of the diffusion speedups that have actually landed in production over the last year have required either a distilled checkpoint, a custom attention kernel, or both. A training-free method that composes with distillation, rather than competing with it, is unusually easy to adopt if the numbers hold, and it lowers the bar for running FLUX.1-dev and Qwen-Image on modest hardware.

The honest caveat is that the only source so far is the authors' own arXiv preprint; nobody has independently reproduced the results yet. The reported quality gap is an aggregate score, and what the paper does not give you is a breakdown on the kinds of prompts where a low-resolution-first pipeline is most likely to fail, such as small legible text, tight faces, or detail-dense compositions. It is also unclear from the abstract whether the 25x figure transfers to community fine-tunes of the tested models or only to the base checkpoints.

The direction worth watching is whether inference providers and open-source pipelines start folding this in alongside their existing distilled variants, because if MrFlow really is orthogonal to distillation, the two together are the interesting product, not either one alone.