arxiv.org web signal

Six LLMs Assigned Feminine Animal Characters in Just 2.2% of Stories

TL;DR

  • Across 23,800 AI-generated stories, feminine animal characters appeared in just 2.2% of stories, versus 40.6% for masculine characters.
  • Models avoided assigning any gender 19% of the time on average; gender-neutral 'it/its' pronouns appeared in 38.2% of stories.
  • The authors argue AI neutrality can erase marginalized identities rather than protect them, challenging a common alignment assumption.

When an LLM writes a story about a rabbit teaching a class or a fox running a business, what gender does the animal get? A new paper presented at FAccT 2026 by Imani Finkley, Yuanxi Li, and Melanie Walsh finds the answer reveals something counterintuitive about how AI handles representation: models mostly dodge the question, and when they commit, the results tilt sharply masculine.

The researchers generated 23,800 stories with six leading LLMs. The prompts used seven anthropomorphic animal characters whose gender was left unspecified, and varied the narrative setting (four in total) and the model temperature. Across those stories, models skipped gender altogether about 19% of the time on average and used gender-neutral language like "it" or "its" in 38.2% of stories. When a gender was assigned, masculine characters appeared in 40.6% of stories and feminine characters in just 2.2%.

That disparity is the paper's central provocation. The authors argue that "models that prioritize neutrality to address social bias may actually contribute to the erasure of marginalized perspectives and identities." The instinct to stay neutral sidesteps stereotyping, but it also sidesteps representation. A model that hedges on gender more than half the time (about 19% with no gender plus 38.2% with "it" or "its", roughly 57% combined) and defaults to masculine when it doesn't is not producing balanced output; it is treating feminine presence as the exception.
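The scale of the gap is easier to see as a ratio and as approximate story counts. The sketch below only redoes arithmetic on the figures quoted above. It assumes the four percentages are mutually exclusive shares of the same 23,800 stories (they sum to 100%), which the summary implies but does not state, and the variable names are illustrative rather than taken from the paper.

```python
# Figures quoted in this summary; the 19% value is an average, so counts are approximate.
TOTAL_STORIES = 23_800
shares_pct = {
    "no_gender": 19.0,
    "neutral_it": 38.2,
    "masculine": 40.6,
    "feminine": 2.2,
}

# Assumption: the four categories are mutually exclusive shares of one total.
hedged_pct = shares_pct["no_gender"] + shares_pct["neutral_it"]
masc_to_fem_ratio = shares_pct["masculine"] / shares_pct["feminine"]

# Approximate number of stories per category, rounded to whole stories.
approx_counts = {
    label: round(TOTAL_STORIES * pct / 100) for label, pct in shares_pct.items()
}

print(f"stories with no committed gender: {hedged_pct:.1f}%")
print(f"masculine-to-feminine ratio: {masc_to_fem_ratio:.1f} to 1")
print(approx_counts)
```

Under that assumption, masculine characters outnumber feminine ones by roughly 18 to 1, and feminine characters account for only about 520 of the 23,800 stories.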

The honest caveats: the paper does not name which six LLMs were studied, so it is hard to know whether the pattern is broadly shared or concentrated in a few systems. The anthropomorphic animal context is also specific — whether the same neutrality-as-erasure dynamic holds for human characters or for identity dimensions beyond gender is something the study does not address.

The framing the authors propose is the constructive thread to follow: distributing "social possibilities across imagined subjects" more equitably, rather than defaulting to neutrality. What that looks like in practice — prompting changes, fine-tuning, or revised evaluation criteria — is left for follow-on work. But the 23,800-story dataset and the audit methodology give practitioners something concrete to build on.
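For readers who want to try a similar audit on their own model outputs, a rough starting point might look like the sketch below. This is not the paper's method, which this summary does not describe. The pronoun lists, the category labels, and the `classify_story` helper are all hypothetical, and a simple pronoun tally will miscount cases such as expletive "it" ("it was raining") or pronouns that refer to secondary characters.

```python
import re
from collections import Counter

# Hypothetical pronoun lists; the paper's actual coding scheme is not given here.
PRONOUNS = {
    "masculine": {"he", "him", "his", "himself"},
    "feminine": {"she", "her", "hers", "herself"},
    "neutral_it": {"it", "its", "itself"},
}


def classify_story(text: str) -> str:
    """Label a story by its most frequent pronoun category (naive heuristic)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for label, words in PRONOUNS.items():
        counts[label] = sum(1 for token in tokens if token in words)
    if not any(counts.values()):
        return "no_gender"
    # Ties resolve in the order the categories are listed in PRONOUNS.
    label, _ = counts.most_common(1)[0]
    return label


def tally(stories: list[str]) -> Counter:
    """Count how many stories fall into each category."""
    return Counter(classify_story(story) for story in stories)
```

Calling `tally` on a list of generated stories returns per-category counts that can be compared against the shares reported above. A serious audit would need coreference resolution or human annotation to attach each pronoun to the intended character.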
