news.bloomberglaw.com web signal

Nearly 400 Newspapers Sue OpenAI and Microsoft for Scraping

TL;DR

  • A publisher coalition owning nearly 400 newspapers filed suit against OpenAI and Microsoft on June 24, 2026.
  • The complaint alleges defendants scraped articles to train ChatGPT and Copilot, giving publishers 'not a cent.'
  • OpenAI invoked fair use, saying its models are 'trained on publicly available data.'

The suit lands with unusual weight because of its scale. A coalition of publishers collectively owning and operating nearly 400 newspapers filed in the U.S. District Court for the Southern District of New York on June 24, 2026, targeting both OpenAI and Microsoft. According to Bloomberg Law, the complaint alleges the defendants "systematically and secretly crawled" publisher websites, copied articles without permission or compensation, and used that content to train ChatGPT and Microsoft Copilot.

The financial argument at the heart of the case is pointed: publishers say they spent billions protecting their content through paywalls, only to have those investments rendered worthless by AI systems that reproduced articles in response to user queries. The complaint also alleges that defendants removed copyright management information from copied works -- a claim that reaches beyond ordinary copyright infringement into the Digital Millennium Copyright Act, whose Section 1202 separately prohibits removing or altering such information.

Attorney Matthew Platkin called this "the largest legal effort led by local and regional newspapers," explicitly distinguishing it from prior AI litigation, most of which has involved larger national media brands. That distinction matters: regional newspapers operate on thinner margins and arguably have more to lose if their content becomes freely available training material with no compensation flowing back.

OpenAI spokesperson Drew Pusateri responded with what has become the standard industry defense: "Our models empower innovation, are trained on publicly available data, and are grounded in fair use." That framing -- "publicly available" -- is likely where the case turns. What the reporting does not resolve is how a court will define that phrase when the content sits behind paywalls, or what dollar figure the publishers are actually seeking in statutory damages.

For those watching the AI and media landscape, the more immediate question is whether a coalition this large accelerates licensing negotiations that were already underway at some outlets, or whether it forces a broader reckoning on how AI companies are permitted to source training data from the open web going forward.
