ibtimes.sg via Reddit

Forum AI: Top Chatbots Fail 90% of Election Queries

openai anthropic google xai hallucinations ai ethics ai-elections political-bias

Key insights

  • Forum AI found ChatGPT, Gemini, Claude, and Grok failed on accuracy, bias, or source quality in 90% of 3,100+ election questions.
  • Failures included political bias and reliance on state-controlled media sources, not just factual inaccuracy.
  • Forum AI is now advocating for mandatory accuracy audits of AI systems deployed for civic and electoral information.

Why this matters

A 90% failure rate across four major commercial AI systems on a structured benchmark of more than 3,100 questions signals that none of the leading foundation model deployments has solved civic reliability. Enterprise customers building election-adjacent products on these APIs now have documented liability exposure. The state-controlled media citation finding is especially significant: it suggests retrieval and source-ranking pipelines are not filtering for editorial independence, which matters far beyond elections to any high-stakes information domain. Regulatory pressure for mandatory AI accuracy audits in civic contexts is now backed by empirical data, making legislative action in the EU AI Act review cycle and U.S. state-level election integrity bills more likely to gain traction.

Summary

Forum AI tested ChatGPT, Gemini, Claude, and Grok across more than 3,100 election-related questions and found all four failed on accuracy, bias, or source quality in 90% of cases. The failures weren't marginal or edge-case; they were systematic, spanning political bias and a documented pattern of citing state-controlled media as authoritative sources.

The study is distinct from the same-week Demos/Scotland research, which focused on a single constituency race. Forum AI cast a wider net, testing cross-national election question sets, which makes the 90% failure rate harder to dismiss as a narrow or local artifact.

Essentially, all four vendors (OpenAI, Google, Anthropic, xAI) are implicated in a study showing their flagship products cannot be trusted for civic information at scale:

  • Political bias was flagged across multiple models, not isolated to one vendor
  • Source selection failures included reliance on state-controlled outlets, raising disinformation concerns
  • Forum AI is now calling for mandatory accuracy audits of AI systems used in civic and electoral contexts

With election cycles running continuously across dozens of countries, the question of whether AI chatbots are safe civic information tools has moved from hypothetical to empirical.

Potential risks and opportunities

Risks

  • OpenAI, Google, Anthropic, and xAI face increased regulatory scrutiny in EU and UK election-integrity reviews if the Forum AI findings are cited in pending AI Act implementation guidance before the next major election cycle
  • Civic-tech platforms and voter information apps that integrated any of these four APIs without independent accuracy layers now carry reputational and legal exposure if their outputs are traced back to a documented 90% failure benchmark
  • State-controlled media outlets cited as sources could weaponize the study as validation that Western AI systems amplify their narratives, complicating diplomatic and content-moderation responses for the named vendors

Opportunities

  • Election-integrity and civic AI auditing firms (e.g., Partnership on AI affiliates, AI Forensics in the EU) gain direct commercial leverage as Forum AI's call for mandatory audits moves toward policy
  • RAG and source-verification infrastructure vendors building editorial-independence filters (Contextual AI, Vectara) have a concrete enterprise pitch to AI teams at news organizations and civic platforms that cannot afford to be cited in the next iteration of this study
  • Niche LLM providers or fine-tuning shops that can demonstrate verifiably lower bias and higher source quality on election benchmarks have a rare differentiation window against the four major models named in the study

What we don't know yet

  • Whether Forum AI's methodology and full question set have been independently reviewed or peer-reviewed, which would determine how much weight regulators can place on the 90% figure
  • Which specific state-controlled media outlets were cited by which models, and whether those citations were geographically targeted or appeared across all tested locales
  • Whether any of the four vendors (OpenAI, Google, Anthropic, xAI) have responded with planned remediation timelines or disputed the study's methodology as of May 2026