Bartoš et al. find AI education gains shrink under bias controls

TL;DR

  • Bartoš and colleagues pool 1,840 effect sizes across 67 prior meta-analyses on AI in education, then re-analyse them as a single body of evidence.
  • They find severe publication bias and extreme between-study heterogeneity; after correcting for bias, reported effects on learning shrink substantially.
  • Bias-adjusted prediction intervals encompass both negative and positive outcomes, and the authors say current evidence does not support sweeping policy claims.

A new PsyArXiv preprint from a team at the University of Amsterdam and the Czech Academy of Sciences takes a hard look at the growing literature on AI in education, and the headline finding is not flattering. Pooling 1,840 effect sizes from 67 prior meta-analyses, the authors report severe publication bias and extreme between-study heterogeneity in the existing evidence base. Once they adjust for that bias, the reported effects of AI tools on learning shrink substantially, and the prediction intervals span both negative and positive outcomes.
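To see how a pooled effect can look positive while the prediction interval still crosses zero, here is a minimal sketch of a random-effects prediction interval. The numbers are hypothetical and are not taken from the preprint, the function name `prediction_interval` is illustrative, and the sketch uses a normal quantile for simplicity where the usual formula uses a t quantile. It is not the authors' bias-adjustment method, only an illustration of why large between-study heterogeneity widens the range of plausible effects in a new setting.

```python
from math import sqrt
from statistics import NormalDist


def prediction_interval(mu, se_mu, tau, level=0.95):
    """Approximate prediction interval for the true effect in a new study.

    mu    : pooled mean effect (e.g. a standardized mean difference)
    se_mu : standard error of the pooled mean
    tau   : between-study standard deviation (heterogeneity)
    Assumption: normal quantile instead of the usual t quantile.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(tau ** 2 + se_mu ** 2)
    return mu - half_width, mu + half_width


def confidence_interval(mu, se_mu, level=0.95):
    """Confidence interval for the pooled mean itself (ignores tau)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mu - z * se_mu, mu + z * se_mu


# Hypothetical inputs, chosen only to illustrate the pattern.
mu, se_mu, tau = 0.20, 0.05, 0.40

ci_low, ci_high = confidence_interval(mu, se_mu)
pi_low, pi_high = prediction_interval(mu, se_mu, tau)

print(f"95% CI for the mean effect: ({ci_low:.2f}, {ci_high:.2f})")
print(f"95% prediction interval:    ({pi_low:.2f}, {pi_high:.2f})")
```

With these made-up inputs the confidence interval for the average effect excludes zero, but the prediction interval does not: the average can be positive while a given classroom or tool could plausibly see a negative effect.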

The paper, hosted on OSF, is by František Bartoš, Oliwia Z. Bujak, Patrícia Martinková, and Eric-Jan Wagenmakers. It extends a 2025 analysis by members of the same team which, after correcting for publication bias, found no evidence that ChatGPT improved students' learning performance, learning perception, or higher-order thinking. The new preprint moves one level up the evidence pyramid: instead of re-analysing primary studies, it re-analyses the meta-analyses themselves, and the same pattern survives.

For anyone making policy or procurement decisions about AI in classrooms, this is the kind of evidence that should slow things down. The authors are explicit that current evidence does not support sweeping policy claims about AI benefits in education. That is a different, and more defensible, statement than 'AI does not help students.' It is a statement about the reliability of the underlying literature, which is what most district-level and ministry-level guidance ends up being built on.

The honest caveat is that this is a preprint, not yet peer-reviewed, and 'effects shrink after bias adjustment' is itself a statistical claim that other meta-analysts will want to interrogate. The reporting also gives no breakdown by tool, subject, or age group showing which interventions might still survive the bias correction. But the direction of travel is the part worth watching: the more carefully the AI-in-education literature gets re-examined, the smaller the headline numbers seem to get, and that is worth knowing before the next budget cycle of pilots and contracts.
