intheweights.com web signal

In the Weights Scores How Strongly AI Models Know Who You Are

hallucinations training-data llm-behavior community

TL;DR

  • Joey Flynn and Thomas Dimson, both former OpenAI employees, built the site, which launched in June 2026.
  • The tool queries models including GPT-5.5, Claude Opus 4.8, Gemini, Grok, and Llama, scoring recognition up to a maximum of 996.
  • Appearing in a 1-billion-parameter model like Meta's Llama signals especially high relevance, because smaller models compress knowledge more aggressively.

Type a name into In the Weights and you get back what previously required a developer and spare time to approximate: how confidently multiple AI models recall a specific person from their training data alone, without relying on web search or any external tools. The site queries several models, clusters the results, and assigns a "strength score" from zero to 996, a ceiling the creators reserve for names like Mozart, Shakespeare, and Taylor Swift. That calibration is itself a statement about what the tool measures.

The site was built by Joey Flynn and Thomas Dimson, both former OpenAI employees, and launched in June 2026 with a retro pixel-art design that belies a pointed question: whose existence got baked into AI systems? The Decoder covered the tool with hands-on tests, reporting strength scores of 175 and 262 for two of its writers. The tool queries frontier models and smaller open-weight systems side by side, including GPT-5.5, Claude Opus 4.8, Gemini, Grok, GLM, Qwen3 8B, and the Llama series.

The clearest practical finding from early use concerns smaller models. According to the creators, it is harder to appear in a smaller model's results at all, so anyone who shows up in Meta's Llama at one billion parameters counts as highly relevant, because a model that size compresses knowledge far more aggressively than a frontier-scale system. Being present in a tiny model clears a more demanding bar than being present in a large one.

The honest caveats come from the creators themselves: models can hallucinate biographical details, typos drag down scores, and common names often produce worse results. A high score is not evidence that the model's account of you is accurate, only that it is confident. What the reporting does not give you is a clear account of how the cluster of raw model responses converts into a single number, or how the tool performs for names from contexts where training data coverage has historically been thin.
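Because the scoring method is not documented, the following is only a toy sketch of how per-model recognition could be folded into one number on a 0 to 996 scale. Everything in it is an assumption made for illustration: the `ModelResult` fields, the model labels and parameter counts, and the weighting rule that lets smaller models count for more. It is not the site's formula.

```python
import math
from dataclasses import dataclass

MAX_SCORE = 996  # ceiling reported for the tool; the formula below is invented


@dataclass
class ModelResult:
    model: str              # hypothetical label, not a real model identifier
    params_billions: float  # illustrative size, not a published figure
    recognized: bool        # assumed: did this model recall the person at all?


def size_weight(params_billions: float) -> float:
    """Assumed weighting: smaller models get a larger weight."""
    return 1.0 / (1.0 + math.log10(params_billions))


def strength_score(results: list[ModelResult]) -> int:
    """Toy aggregate: weighted share of models that recognized the name."""
    if not results:
        return 0
    weights = [size_weight(r.params_billions) for r in results]
    hit = sum(w for w, r in zip(weights, results) if r.recognized)
    return round(MAX_SCORE * hit / sum(weights))


def example(small: bool, mid: bool, large: bool) -> int:
    """Score one name against three hypothetical models of different sizes."""
    return strength_score([
        ModelResult("small-1b", 1.0, small),
        ModelResult("mid-8b", 8.0, mid),
        ModelResult("large-1000b", 1000.0, large),
    ])
```

Under these assumptions, a name recognized only by the smallest model scores higher than one recognized only by the largest, which mirrors the creators' point about small models. Note what the sketch leaves out: it counts recognition, not accuracy, so it has the same blind spot to hallucinated details that the creators describe.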

The audience with the clearest stake is anyone whose professional reach is starting to depend on AI-mediated discoverability: researchers, journalists, founders, clinicians. As more information-seeking happens through AI rather than search, a model's ability to recall you accurately starts to function differently from a search-engine index entry, and tools like this make that difference visible for the first time.
