Audit: Meta and TikTok DSA Research APIs miss half the feed
TL;DR
- An audit from the University of St. Gallen found approved researchers can pull only about 75% of TikTok and 50% of Instagram posts shown to users.
- Of the posts that come through, only 17% of TikTok and 42% of Instagram metadata parameters reach researchers, with moderation context systematically stripped.
- The paper concludes current research-access implementations fall short of the DSA's intended oversight function because removed and downranked content rarely survives in researcher datasets.
Researchers at the University of St. Gallen reconstructed what TikTok and Instagram actually served in user-facing feeds during two elections, then compared it against what Article 40(12) of the EU's Digital Services Act lets approved researchers pull through the platforms' official Research APIs. The gap is the story. According to the arXiv preprint, researchers can access only around 75% of the posts served by TikTok's For You feed and about 50% of those served by Instagram's Explore feed. For the posts that do come through, only 17% of TikTok metadata parameters and 42% of Instagram metadata parameters survive the trip.
The method matters. Luka Bekavac and Simon Mayer ran sockpuppet accounts configured to interact primarily with political content during the 2024 U.S. presidential election on TikTok and the 2025 German federal election on Instagram, then treated the user-visible feed as ground truth. Against that baseline, the official research pipes look like sieves. The paper reports that under the historically applied 25,000-follower cutoff that governed Meta Content Library access for much of the study period, 49.35% of user-visible posts came from accounts entirely excluded from researcher access. Ephemeral content like live video came in at 0% accessible. Between 17.7% and 23.3% of TikTok posts were no longer accessible within weeks.
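The comparison described above boils down to set arithmetic: take the posts seen in the feed as the denominator and count how many also come back from the research interface. The following is a minimal sketch of that idea, not the authors' code. The `FeedPost` type, both function names, the toy data, and the reading of the cutoff as "strictly fewer than 25,000 followers" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedPost:
    """One post observed in the user-visible feed (hypothetical schema)."""
    post_id: str
    author_followers: int


def post_coverage(feed, api_ids):
    """Share of user-visible posts that also appear in the API results."""
    if not feed:
        return 0.0
    found = sum(1 for post in feed if post.post_id in api_ids)
    return found / len(feed)


def excluded_by_follower_cutoff(feed, cutoff=25_000):
    """Share of user-visible posts whose author is below the follower cutoff.

    Assumption: accounts with strictly fewer followers than the cutoff
    are excluded from researcher access.
    """
    if not feed:
        return 0.0
    excluded = sum(1 for post in feed if post.author_followers < cutoff)
    return excluded / len(feed)


# Toy data, invented for illustration only.
feed = [
    FeedPost("a1", 120_000),
    FeedPost("a2", 8_000),
    FeedPost("a3", 40_000),
    FeedPost("a4", 900),
]
api_ids = {"a1", "a3"}  # post IDs the research interface returned

print(post_coverage(feed, api_ids))       # 0.5
print(excluded_by_follower_cutoff(feed))  # 0.5
```

The feed is the denominator in both functions, which is what treating the user-visible feed as ground truth amounts to: a post the API never returns still counts against coverage.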
The framing the authors land on is survivorship bias. Content that gets removed, downranked, or otherwise moderated, which is often the content most relevant to systemic-risk research, is the content least likely to be preserved in a researcher's dataset. The paper concludes that current research-access implementations fall short of the DSA's intended oversight function. The operational limits compound the problem. Meta Content Library caps researchers at 1,000 queries per rolling seven-day window, and the TikTok Research API allows up to 1,000 requests and a maximum of 100,000 records per day.
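To get a feel for what those quotas mean in practice, the arithmetic can be sketched as below. The three constants come from the figures quoted above; the function names and the `records_per_request` page size are assumptions for illustration, and the sketch ignores retries, failed requests, and any other limits the platforms may apply.

```python
import math

# Quotas as quoted in the article.
TIKTOK_REQUESTS_PER_DAY = 1_000
TIKTOK_RECORDS_PER_DAY = 100_000
MCL_QUERIES_PER_WINDOW = 1_000  # per rolling seven-day window


def tiktok_days_needed(records, records_per_request=100):
    """Minimum number of days to pull `records` from the TikTok Research API.

    `records_per_request` is an assumed page size. The daily throughput is
    whichever binds first: the record cap or requests times page size.
    """
    per_day = min(
        TIKTOK_RECORDS_PER_DAY,
        TIKTOK_REQUESTS_PER_DAY * records_per_request,
    )
    return math.ceil(records / per_day)


def mcl_windows_needed(queries):
    """Number of seven-day quota windows consumed by `queries` queries."""
    return math.ceil(queries / MCL_QUERIES_PER_WINDOW)


print(tiktok_days_needed(1_000_000))                          # 10
print(tiktok_days_needed(1_000_000, records_per_request=20))  # 50
print(mcl_windows_needed(2_500))                              # 3
```

At 1,000 requests per day, the 100,000-record cap is only reachable if each request returns 100 records on average; smaller pages make the request quota the binding limit.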
The honest caveat is that this is one paper covering two platforms and two election windows. It does not tell you whether the gaps look the same outside election periods, whether forthcoming delegated acts under the DSA will close them, or how this access regime should sit alongside privacy obligations. But the direction is what regulators and platform safety researchers should be watching. If the dataset under the most ambitious platform transparency regime in the world is missing much of the moderated content, every downstream claim about systemic risk inherits that bias.
Originally reported by arxiv.org
Original headline: Auditing Meta and TikTok Research API Data Access under Article 40(12) of the Digital Services Act