404media.co via Reddit June 11th 2026

Cornell Links AI's Elias Thorne Pattern to WildChat Data

Key insights

Cornell researchers analyzed 20,000 AI-generated stories and found the same 11 words appearing in over 88% of outputs across ChatGPT, Gemini, and Claude.
The pattern traces to WildChat, a GPT-3.5 dataset in which only 166 of 1 million conversations (about 0.017%) feature the Elias lighthouse story.
Amazon now hosts multiple 'Elias Thorne' books including alt-medicine cancer handbooks, some flagged for containing dangerous misinformation.

Why this matters

Safety alignment, the process designed to make AI outputs more acceptable, appears to create structural narrative convergence across every model trained on shared upstream data, which would make this an industry-wide problem rather than one isolated to a single lab. Elias Thorne has already escaped into Amazon book listings and YouTube content farms, some containing dangerous health and foraging misinformation, which shows how a training artifact can propagate into real consumer harm with no current platform mechanism to intercept it. For practitioners building on foundation models, the finding, first spotted by software engineer Daniel May in early 2026, suggests that alignment and fine-tuning pipelines can amplify statistically rare patterns from shared upstream datasets, and that the result can be detected from the outside before any lab identifies it internally.

Summary

ChatGPT, Gemini, and Claude keep writing the same story: a man named Elias Thorne, a lighthouse keeper carrying a quiet coastal tragedy. Cornell researchers Sil Hamilton and David Mimno analyzed 20,000 AI-generated stories and found the same 11 words (names like Elias, Mara, and Elara, paired with occupations like lighthouse keeper and clockmaker) appearing in over 88% of outputs across competing models. The trail leads back to WildChat, a 1-million-conversation dataset built from OpenAI's GPT-3.5. Only 166 of those conversations contain the Elias lighthouse pattern, but alignment training appears to funnel models toward that safe slice. "It isn't that Elias stories are frequent, but that they're just so safe," Hamilton said. In short, a GPT-3.5-era dataset appears to have seeded a narrative pattern that models from competing labs then replicated.

Amazon now hosts multiple "Elias Thorne" books, including alt-medicine cancer handbooks and foraging guides flagged for dangerous misinformation.
YouTube AI content farms generate Elias Thorne story variations at scale.
Hamilton described the spread across model generations: "It's like a virus."

The character has acquired a commercial life independent of any single model, and some of those commercial outputs carry health misinformation with no current mechanism to catch them.

Potential risks and opportunities

Risks

AI-generated health and foraging books attributed to 'Elias Thorne' on Amazon carry dangerous misinformation, with no current industry standard for identifying or removing synthetic-author content at scale.
Developers using WildChat or other GPT-3.5-derived datasets in fine-tuning pipelines risk unknowingly embedding the same narrative bottleneck into new models, propagating the artifact further.
If alignment processes at OpenAI, Google, and Anthropic are converging on the same safe-content signatures from shared upstream data, correlated failure modes or biases could surface simultaneously across the entire model class.

Opportunities

Dataset audit tooling that detects cross-model narrative convergence signatures could help AI labs identify hidden homogenization artifacts before model release.
Amazon and YouTube face mounting pressure to build attribution and quality-control mechanisms for AI-generated content, creating a market opening for AI content provenance vendors.
Independent researchers who apply statistical pattern-detection to AI outputs at scale, as Hamilton, Mimno, and Daniel May did, point toward a viable AI output monitoring niche for safety and IP-integrity use cases.
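
As a rough illustration of the kind of output monitoring described above, the Python sketch below measures what share of a batch of stories contains at least one marker term. It is not the Cornell method. The marker list holds only the five terms named in this summary (the preprint's full 11-word list is not reproduced here), and the function names and toy stories are hypothetical.

```python
import re

# Marker terms named in this summary. The preprint's full 11-word list is
# not reproduced here, so this set is an illustrative assumption.
MARKERS = ["elias", "mara", "elara", "lighthouse keeper", "clockmaker"]


def marker_pattern(markers):
    """Compile one case-insensitive regex matching any marker as a whole word or phrase."""
    alternatives = "|".join(re.escape(m) for m in markers)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def marker_coverage(stories, markers=MARKERS):
    """Return the fraction of stories that contain at least one marker term."""
    if not stories:
        return 0.0
    pattern = marker_pattern(markers)
    hits = sum(1 for story in stories if pattern.search(story))
    return hits / len(stories)


def per_model_coverage(stories_by_model, markers=MARKERS):
    """Map each model name to its marker coverage, for cross-model comparison."""
    return {
        model: marker_coverage(stories, markers)
        for model, stories in stories_by_model.items()
    }


# Hypothetical toy inputs, invented for this sketch (not data from the study).
TOY_STORIES = [
    "Elias Thorne trimmed the lamp at dusk.",
    "The clockmaker wound the last spring.",
    "A dragon slept under the bank vault.",
    "Tamara sang to the empty hall.",
]

if __name__ == "__main__":
    print(marker_coverage(TOY_STORIES))
```

Whole-word matching is a deliberate choice here: "Tamara" does not count as a hit for "Mara", so two of the four toy stories match. Running the same function over outputs from several models, as `per_model_coverage` does, is one simple way to check whether a marker set recurs across them.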

What we don't know yet

Whether Amazon has reviewed or removed the AI-generated 'Elias Thorne' health and foraging guides that researchers flagged for dangerous misinformation.
Whether OpenAI, Google, or Anthropic have acknowledged the WildChat narrative homogenization pattern and whether any plan to address it is in progress.
Which specific alignment technique is responsible for the hypothesized safe-content bottleneck, a question Hamilton and colleagues say they plan to explore in future studies.

Shared on Bluesky by 14 AI experts (top 5 by trust)

Mark Riedl @markriedl.bsky.social: AI generated stories, aka the Elias Thorne literary universe www.404media.co/elias-thorne...
David Mimno @dmimno.bsky.social amplified

Sil Hamilton @srhm.ca

@404media.co wrote about our new preprint on tell-tale signs of AI-generated stories! Cc @dmimno.bsky.social Paper: arxiv.org/abs/2605.26492 Article: www.404media.co/elias-thorne...
Meredith Broussard @merbroussard.bsky.social amplified

Sam Cole @samleecole.bsky.social

something strange is haunting large language models... a character that's broken out of the LLMs and now authors books, sells albums, and shows up in fake news sites www.404media.co/elias-thorne...
404 Media @404media.co: LLMs including ChatGPT, Gemini and Claude are obsessed with telling stories about lighthouse keepers and clockmakers, and one character name…
Mike Masnick @masnick.com amplified

@annaleen.bsky.social

Chatbots' obsession with a fictional character named "Elias Thorne" gives new meaning to Virginia Woolf's novel "To the Lighthouse." Amazing story of a hallucination shared by multiple chatbots from @404media.co www.404m…

Originally reported by 404media.co

Original headline: Cornell Study Explains Why All AI Chatbots Keep Generating the Same 'Elias Thorne' Lighthouse Keeper Story: Safety Training Creates Narrative Bottleneck