paper web signal July 2nd 2026

MrFlow reports 10x training-free speedup on FLUX.1, Qwen-Image

TL;DR

MrFlow claims 10x end-to-end acceleration on FLUX.1-dev and Qwen-Image while keeping the OneIG metric within 1% of the baseline.
Stacking the method with timestep distillation reportedly raises the total speedup to as much as 25x on the same models.
The pipeline generates structure at low resolution, then super-resolves in pixel space with a lightweight pretrained GAN, requiring no training.

A new arXiv preprint, MrFlow, argues that you can cut the wall-clock cost of running FLUX.1-dev and Qwen-Image by roughly an order of magnitude without training anything and without writing a custom kernel. The claim is a 10x end-to-end acceleration with the OneIG image-quality metric staying within 1% of the baseline, and up to 25x when the technique is stacked with timestep distillation.

The mechanism, as the authors describe it, is a staged sampler rather than a clever attention trick. MrFlow "first rapidly generates the main structure at low resolution, then performs super-resolution in the pixel space using a lightweight pretrained GAN-based model," followed by a low-strength noise injection to let the diffusion model resample high frequencies and refine detail. Because the expensive sampling steps happen on a smaller token grid, the paper frames the win as a "quadratic token reduction and reduced step requirement of low-resolution sampling." The authors emphasise that the pipeline needs "no training or runtime dynamic identification whatsoever," which is what makes it interesting as a drop-in.
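To make the "quadratic token reduction" concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the paper: the 1024 and 512 pixel resolutions are hypothetical, and the 16-pixel token footprint is an assumption (an 8x VAE downsample followed by 2x2 patching, as in FLUX-style models). The point is only that halving the image side cuts the token count by 4x, and the pairwise attention work by roughly 16x.

```python
def token_count(height: int, width: int, pixels_per_token: int = 16) -> int:
    """Tokens the transformer sees for one image.

    Assumption (not from the paper): each token covers a square of
    pixels_per_token x pixels_per_token pixels.
    """
    return (height // pixels_per_token) * (width // pixels_per_token)


# Hypothetical resolutions, chosen only for illustration.
full_tokens = token_count(1024, 1024)  # target resolution
low_tokens = token_count(512, 512)     # low-resolution structure pass

# Token count scales with the square of the side length.
token_ratio = full_tokens / low_tokens

# Self-attention compares every token with every other token, so its
# cost scales with the square of the token count.
attention_ratio = (full_tokens ** 2) / (low_tokens ** 2)

print(f"tokens: {full_tokens} -> {low_tokens} ({token_ratio:.0f}x fewer)")
print(f"attention pairs: {attention_ratio:.0f}x fewer per step")
```

This counts only the per-step saving on the low-resolution pass. The end-to-end figure the paper reports also depends on how many steps run at each stage and on the cost of the GAN upscaler and the refinement pass, none of which this sketch models.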

Why this matters if you're not writing samplers yourself: most of the diffusion speedups that have actually landed in production over the last year have required either a distilled checkpoint, a custom attention kernel, or both. A training-free method that composes with distillation, rather than competing with it, is unusually easy to adopt if the numbers hold, and it lowers the bar for running FLUX.1-dev and Qwen-Image on modest hardware.

The honest caveat is that these results come from a single arXiv preprint and have not yet been independently reproduced. The reported quality gap is an aggregate score, and what the paper does not give you is a breakdown on the kinds of prompts where a low-resolution-first pipeline is most likely to fail, such as small legible text, tight faces, or detail-dense compositions. It is also unclear from the abstract whether the 25x figure transfers to community fine-tunes of the tested models or only to the base checkpoints.

The direction worth watching is whether inference providers and open-source pipelines start folding this in alongside their existing distilled variants. If MrFlow really is orthogonal to distillation, the two together are the interesting product, not either alone.

Originally reported by paper


Original headline: MrFlow: Training-Free 10x–25x Speedup for FLUX.1 and Qwen-Image, No Custom Kernels