zenodo.org via Reddit

Antonio Berardi maps 33 recurring LLM output distortions

hallucinations llm-behavior prompt-engineering ai-quality-auditing

Key insights

  • Berardi's 33-class taxonomy names recurring LLM distortions like sycophantic compression and hedge parasitism that resist standard prompt fixes.
  • The taxonomy claims cross-model applicability across ChatGPT, Claude, and Gemini, giving audit teams a shared vocabulary for systematic quality review.
  • Patterns propagate through context windows and compound over long conversations, making them structurally distinct from one-off LLM errors.

Why this matters

LLM evaluation frameworks currently lack standardized vocabulary for behavioral failure modes, causing teams to independently rediscover and inconsistently name the same patterns. A 33-class cross-model taxonomy gives red-teamers, product teams, and enterprise buyers a shared reference for auditing outputs at scale. If the taxonomy gains adoption, it could shape how LLM benchmarks and procurement requirements define acceptable model behavior across major providers.

Summary

Antonio Berardi published 'Heuristic Parasites V2' on Zenodo, a 33-class taxonomy of LLM output distortions that propagate through context and resist standard prompt fixes. The named patterns, described as spanning ChatGPT, Claude, and Gemini, include sycophantic compression, hedge parasitism, false equivalence framing, and metacognitive confabulation. Berardi frames the work as shared vocabulary for systematic LLM quality auditing.

Essentially, the paper (Antonio Berardi, Zenodo) provides the first cross-model naming scheme for persistent LLM behavioral distortions:

  • 33 named classes, each described as recurring across major models.
  • Patterns compound through context windows over long conversations, making them structurally distinct from one-off errors.
  • The paper spread through r/PromptEngineering and r/OpenAI within hours of publication.

Practitioners gain vocabulary to anchor LLM audits rather than describing failures ad hoc.

Potential risks and opportunities

Risks

  • If widely adopted without empirical validation, engineering teams may build evaluation pipelines around distortion classes that fail to generalize across model versions or task domains
  • Competing taxonomies from other researchers could fragment the practitioner community before Berardi's taxonomy gains institutional traction, leaving no single cross-team auditing standard
  • Model providers (OpenAI, Anthropic, Google) could selectively apply the taxonomy to benchmark competitor outputs while minimizing scrutiny of their own models' distortion rates

Opportunities

  • LLM evaluation tooling vendors (Arize AI, Weights & Biases, Braintrust) could integrate the 33-class taxonomy into structured output auditing dashboards within weeks of the paper's circulation
  • Enterprise AI governance teams at regulated firms in finance, legal, and healthcare gain a concrete framework to embed in LLM procurement requirements and compliance documentation
  • Academic researchers can operationalize the taxonomy as a labeled benchmark dataset, driving citation momentum and validating Zenodo as a credible fast-publication channel for applied AI research

What we don't know yet

  • Whether any of the 33 classes have been formally validated against held-out model outputs, or whether classification remains subjective and unreproducible as of May 2026
  • Peer review status: the paper is hosted on Zenodo as a preprint, with no indication of journal submission and no confirmed independent replication
  • How the taxonomy accounts for version-specific model behavior, given that updates across GPT-4o, Claude, and Gemini can materially alter distortion prevalence
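
One way to probe the first open question, whether classification is reproducible, is to have two reviewers label the same outputs and measure their agreement. The sketch below computes Cohen's kappa, a standard chance-corrected agreement statistic. It is not taken from Berardi's paper: the function name, the two annotators, and the six labels are hypothetical illustration data, and only the four class names mentioned in the summary are used.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same class by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled independently
    # according to their own class frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical class for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six model outputs (illustration only, not real data).
annotator_a = [
    "sycophantic compression",
    "hedge parasitism",
    "hedge parasitism",
    "false equivalence framing",
    "metacognitive confabulation",
    "sycophantic compression",
]
annotator_b = [
    "sycophantic compression",
    "hedge parasitism",
    "false equivalence framing",
    "false equivalence framing",
    "metacognitive confabulation",
    "hedge parasitism",
]

kappa = cohens_kappa(annotator_a, annotator_b)
print(round(kappa, 3))  # 0.556
```

A kappa near 1 would suggest that reviewers apply the class definitions consistently, while a value near 0 would suggest agreement no better than chance. What threshold counts as acceptable is a judgment call for the audit team, not something the source specifies.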