ICML desk-rejects 497 papers over LLM review violations
TL;DR
- ICML 2026 desk-rejected 497 papers, around 2% of submissions, after catching 398 reciprocal reviewers who broke the conference's LLM-use policy.
- Detection used hidden instructions embedded in submitted PDFs that prompted any LLM to emit two phrases drawn from a 170,000-phrase dictionary.
- Of 506 caught Policy A reviewers, 51 had over half their reviews flagged, and every flagged review was manually verified before action.
ICML's program chairs have just published a headcount of how widely reviewers at a top machine learning venue lean on LLMs to do the work, and the numbers are concrete in a way the surrounding discussion has not been. According to a post on the ICML blog, the 2026 conference desk-rejected 497 papers, roughly 2% of submissions, after detecting that 398 reciprocal reviewers had violated the policy on LLM use during reviewing.
The enforcement mechanism is what makes this interesting. Reviewers at ICML 2026 picked one of two stances when they signed up: Policy A, no LLM use at all beyond spell-checkers, or Policy B, limited LLM help to understand a paper or polish writing but not to judge quality or draft the review. The 506 caught reviewers all chose Policy A, then used an LLM anyway. Detection used a watermarking technique from Rao, Kumar, Lakkaraju and Shah: every submission PDF had hidden instructions inserted that told any LLM processing it to emit two phrases pulled from a 170,000-phrase dictionary. Pre-deadline tests reportedly hit success rates over 80% with frontier LLMs. Every flagged review was manually verified, and a March 26 update gave a family-wise error rate of 0.0001.
Why this matters if you submit to ML conferences is the coupling. ICML's peer-review ethics make a reviewer's violation grounds for desk-rejecting the papers that reviewer co-authored, so an author's fate is tied to every co-author who agreed to review. About 1% of all reviews, 795 in total, were flagged, and 51 reviewers had more than half their reviews caught, so the misconduct is concentrated rather than spread evenly.
The honest caveats are the ones the post does not address. It tells us almost nothing about Policy B reviewers, about appeals, or about what happens to co-authors of a desk-rejected paper who themselves did nothing wrong. The watermark also only catches reviewers who paste the raw PDF into an LLM, so anyone who strips the text first or uses a different flow stays invisible. The 2% figure is probably a floor, not a ceiling.
The forward-looking question is whether NeurIPS, ICLR, ACL or CVPR adopt the same setup. If they copy the watermarking scheme and the co-author-liability rule, the equilibrium for ML peer review shifts from quietly suspected to provably enforceable. That is the part to watch over the next submission cycle.
Shared on Bluesky by 1 AI expert
Originally reported by blog.icml.cc
Read the original article →Original headline: On Violations of LLM Review Policies – ICML Blog