openreview.net web signal

ICML 2026 Bans LLM Authorship, Watermarks Submitted Papers

TL;DR

  • ICML 2026 runs July 6-11 in Seoul, with full papers due January 28 and LLMs explicitly barred from authorship.
  • Reviewers pick Policy A (no LLM use) or Policy B (limited LLM use); prompt injection attempts trigger automatic desk rejection.
  • March enforcement caught 506 reviewers and desk-rejected 497 papers via hidden PDF watermarks drawn from a 170,000-phrase dictionary.

The most interesting thing about the ICML 2026 conference is not the venue or the deadline. It is that the organizers have decided the arrival of large language models is a peer review problem worth actively engineering against, and they are willing to publish the receipts.

The call for papers sets the ground rules for the July 6 to 11 event in Seoul: full papers are due January 28, 2026, capped at eight pages of main text with unlimited references and appendices, and LLMs are not eligible for authorship. Papers that attempt prompt injection are desk-rejected. Reviewers choose between two policies before assignment. Policy A prohibits LLM use in reviewing entirely. Policy B allows LLMs to help understand background material and polish text, but not to evaluate strengths and weaknesses, suggest outlines, or write the review.

Then in March the organizers announced enforcement. They desk-rejected 497 papers and caught 506 unique reviewers who had used LLMs after selecting Policy A. The detection method was a watermark: every submission PDF carried hidden instructions telling any LLM that processed it to emit two specific phrases drawn from a dictionary of 170,000. The organizers report success rates above 80 percent against frontier models in testing and a family-wise error rate of 0.0001 after manual verification. The 51 reviewers who used LLMs in more than half of their assignments had all their reviews removed.
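To make the mechanism concrete, here is a minimal Python sketch of how a two-phrase watermark check could work. It is an illustration under stated assumptions, not the organizers' implementation: the announcement does not describe how phrases were assigned to papers or how reviews were matched, so the hash-based assignment, the `secret` key, and every function name below are hypothetical.

```python
import hashlib


def assign_phrases(paper_id: str, dictionary: list[str], secret: str) -> tuple[str, str]:
    """Pick two distinct phrases for a paper (hypothetical assignment scheme).

    Hashing the paper ID with a private key makes the choice reproducible
    for the organizers but hard to guess for anyone else.
    """
    digest = hashlib.sha256(f"{secret}:{paper_id}".encode()).digest()
    n = len(dictionary)
    i = int.from_bytes(digest[:8], "big") % n
    # Draw the second index from the n - 1 remaining slots, then shift it
    # past i so the two phrases are always different.
    j = int.from_bytes(digest[8:16], "big") % (n - 1)
    if j >= i:
        j += 1
    return dictionary[i], dictionary[j]


def review_is_flagged(review_text: str, phrases: tuple[str, str]) -> bool:
    """Flag a review only if it contains BOTH watermark phrases."""
    text = review_text.lower()
    return all(phrase.lower() in text for phrase in phrases)


def should_remove_all_reviews(flagged: int, assigned: int) -> bool:
    """Mirror the reported rule: LLM use in more than half of assignments."""
    return assigned > 0 and flagged * 2 > assigned
```

Requiring both phrases, rather than one, is what keeps accidental matches rare: an honest reviewer would have to use two specific dictionary entries in the same review by chance. In the reported process, flagged reviews were also manually verified before any action was taken, a step this sketch leaves out.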

Why this matters if you are not submitting to ICML: this is the clearest publicly run experiment yet in whether a major venue can detect and enforce AI-use policies at scale, and the numbers give everyone else a baseline. The organizers themselves concede the watermark is "not a difficult measure to circumvent" and would mostly catch copy-paste behavior, so the roughly 1 percent of reviews that were caught is a floor, not a ceiling.

The honest caveat is that the announcement does not tell us how many of the desk-rejected papers were reinstated on appeal, or how detection performed across different LLM families. Nor does the reporting say whether NeurIPS and ICLR will copy the watermark approach or negotiate something different. What ICML has shown is that the enforcement question is no longer hypothetical, and other program chairs now have to decide whether to run their own version.
