What AO3 and Fanfiction Can Teach AI Safety About Governing Human Messiness

Community Article Published March 24, 2026


AI safety discourse has a bad habit of treating governance problems as if they arrived fully formed with the launch of ChatGPT in November 2022. In reality, some of the most difficult problems in AI governance, and in online content moderation generally, are older than the current LLM renaissance: how to handle upsetting or hateful but lawful expression, how to distinguish abuse from disagreement, how to preserve user agency without collapsing into negligence, how to write rules that can actually be enforced, and how to keep a platform legitimate when everyone is convinced their edge case is the only edge case. One of the best places to study those problems is not a frontier AI lab but the fanfiction website Archive of Our Own, together with the broader legal and policy ecosystem built by its parent organization, the Organization for Transformative Works. AO3 and the OTW are a fascinating example of how to promote maximum freedom of expression while still complying with the law and broadly shared safety norms.

Growing up as a hardcore nerd means I am quite familiar with concepts like derivative fiction, and as a result I take fandom communities seriously as governance environments. That does not mean fandom memes and jokes are a substitute for national AI policy, or that AO3 is some magic blueprint. It does mean we now have a large, adversarially stressed archive of transformative works that has already spent years solving institutional problems AI companies describe as novel. And, more awkwardly for the people who still smirk when fanfiction comes up, the modern language model ecosystem has spent years learning from the same narrative substrate. What I am about to say might be an uncomfortable truth for some, but AO3 matters to AI safety as a clear and legitimate governance case study. A second uncomfortable truth is that fanfiction matters to AI not just metaphorically but technically as well.

I am not drawing a glib conclusion like “LLMs are secretly fanfic machines,” which is the kind of joke that becomes less funny the closer you get to the training pipeline, and neither OpenAI nor Anthropic has ever published an arXiv paper saying “we trained on AO3” (I’ll get to that in a minute). The real point about language models and literature is more interesting. Since GPT-3 came onto the scene in 2020, high-performing language models have depended on enormous quantities of long-form narrative text: novels, web fiction, dialogue-heavy prose, community writing, poetry, and web-scale crawls that sweep up exactly the kinds of texts serious people like to pretend are peripheral. In the GPT-3 era, that dependence was mostly implicit. In the open-model era, it became explicit.

OpenAI’s GPT-3 paper is the right place to start for a content provenance analysis, because it shows both what the company disclosed and what it did not. GPT-3 was trained on a mixture dominated by filtered Common Crawl, plus WebText2, Books1, Books2, and Wikipedia; Common Crawl alone made up the majority of the training mixture by weighted token count. OpenAI did not identify AO3 by name, and it did not publish a tidy source-by-source list of the specific domains inside Common Crawl. But OTW later stated that once it learned AO3 had been included in Common Crawl, the dataset used to train systems “such as ChatGPT,” it deployed technical measures in December 2022 to stop Common Crawl’s crawler from collecting data from the archive again. Even where fanfiction was not explicitly acknowledged in model cards, it was already part of the narrative ecology being pulled into general-purpose language modeling.
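The technical measure in question was, in all likelihood, the unglamorous robots.txt exclusion protocol: Common Crawl’s crawler identifies itself as CCBot and honors robots.txt. Here is a minimal sketch of how such a rule behaves; the rule body is illustrative, not AO3’s actual file:

```python
# Minimal sketch: how a robots.txt rule excludes Common Crawl's CCBot.
# The rule text below is illustrative, not AO3's actual robots.txt.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("CCBot", "https://archiveofourown.org/works/123"))       # False
print(parser.can_fetch("Mozilla/5.0", "https://archiveofourown.org/works/123")) # True
```

The obvious limitation is that robots.txt only stops crawlers that choose to honor it, which is part of why the provenance debate did not end in December 2022.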

The type of data matters because long-form narrative prose gives language models capabilities that short snippets and cleaned reference text do not. Books and fanworks are useful for the same reason researchers have repeatedly prized books in pretraining corpora: works of fiction contain long-range dependencies, recurring entities, dialogue, viewpoint shifts, coherent scene structure, stable voice, and social causality that unfolds over thousands of tokens rather than fifty. The Pile paper is almost embarrassingly plain about its Books3 component, noting that books were included because they are “invaluable” for long-range context modeling and coherent storytelling. EleutherAI’s line should probably be taped to the wall of every debate about whether narrative text is central or peripheral to LLM development: it is central.

Interestingly, fanfiction has a few properties that make it unusually legible to machines. Many works of fanfiction are verbose and intensely tagged. Some works encode relationships, tropes, warnings, fandom membership, and genre conventions as explicit metadata, visible to human and scraper bot alike. Others tend to be dialogue-rich, emotionally labeled, and structurally repetitive in useful ways. If you wanted a pretraining corpus that teaches a model how people narrate desire, conflict, pacing, identity, banter, literary tropes, or canon-constrained variation at scale, you could do much worse than fanfiction. AO3’s tagging ecosystem acts as a dense annotation layer on top of narrative text. Even when a model is trained only on raw text and not metadata, the surrounding ecosystem makes fanfiction unusually attractive for downstream dataset construction, synthetic labeling, retrieval, evaluation, and finetuning.
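To make the “annotation layer” point concrete, here is a hedged sketch of what a scraped work record looks like once tags become machine-readable labels. The field names are my own illustration, not AO3’s schema or any particular dataset’s:

```python
from dataclasses import dataclass

@dataclass
class FanworkRecord:
    """Hypothetical record showing how AO3-style metadata doubles as labels."""
    work_id: int
    fandoms: list[str]
    rating: str                  # e.g. "Teen And Up Audiences"
    archive_warnings: list[str]  # the mandatory warning set, or an opt-out
    relationships: list[str]     # "Character A/Character B" pairings
    freeform_tags: list[str]     # tropes like "Slow Burn" or "Canon Divergence"
    word_count: int
    text: str

def has_trope(record: FanworkRecord, trope: str) -> bool:
    # A creator-applied tag becomes a free classification label for the text,
    # which is exactly what makes fanfiction attractive for dataset building.
    return trope in record.freeform_tags
```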

EleutherAI’s Pile dataset and the accompanying paper were landmark moments for open language modeling, because they made the composition of a large pretraining corpus far more explicit than the GPT-3 era did. For those who don’t know, The Pile is a 22-source corpus for large-scale language modeling, and one of those sources is highly contentious: the Books3 dataset. The Pile’s datasheet says Books3 was derived from a copy of the contents of the private tracker Bibliotik that had been made publicly available.

Books3 is also where the legal and ethical trouble gets sharper. The Pile datasheet notes that Books3 is almost entirely composed of copyrighted works. Model cards for GPT-J and Pythia state directly that those models were trained on the Pile. So when people talk about “open” LLM development from 2021 onward, part of what they are talking about is an ecosystem whose technical progress depended in no small part on a massive book corpus with deeply contested provenance. Books3 became a focal point in the story of how many language models were trained, and it resulted in several lawsuits over copyright infringement.

Last year, an opinion was issued in Kadrey v. Meta. The federal court record described Meta’s Llama 1 and Llama 2 training mix as drawing roughly two-thirds of its data from Common Crawl, with the rest from sources including Wikipedia, GitHub, arXiv, Stack Exchange, and a combined books source that included Project Gutenberg and Books3. The opinion also quotes internal Meta communications saying books were especially valuable, with one employee calling them the “best” resource, and describes Meta’s turn to shadow libraries after licensing efforts failed. The opinion also states there was no real dispute that Meta had torrented LibGen and Anna’s Archive, and that the plaintiffs’ books were found in the downloaded datasets, including Books3 and Anna’s Archive. However one comes out on the legal questions, the factual picture is already striking. Frontier-capable models were being built on top of long-form narrative corpora gathered through a mix of lawful sources, contested sources, and plainly pirated ones.

Anna’s Archive matters here because it has become part of the infrastructure story, not just the copyright fight. In March 2026, major publishers sued Anna’s Archive, and the complaint alleged the site hosted more than 63 million books and 95 million papers. The Association of American Publishers summarized the filing by alleging that Anna’s Archive had solicited substantial cryptocurrency payments from LLM developers and data brokers in exchange for high-speed access to a massive corpus of copyrighted texts. It is critically important to remember that this is an initial complaint, and allegations are not findings of fact. Nevertheless, the plaintiffs’ argument fits uncomfortably well with what the Meta case already surfaced: shadow libraries operating not as passive pirate mirrors sitting off to the side of the AI boom, but as a core mechanism of the data provenance pipeline.

The US legal system has started, slowly, to reflect that distinction. In 2025, U.S. courts began to draw a line between AI training on lawfully acquired materials and training workflows that involved pirated or improperly sourced data. Ongoing litigation and a handful of opinions do not resolve the AI and copyright policy debates overnight, but they do mean the field can no longer hide behind the fiction that all pretraining controversies are the same controversy. Sourcing, acquisition, and governance all matter in how language models are developed and deployed.

There is a reason I bring up AO3, and it is not only because the fanfiction website is baked into the language models users interact with every day. Data provenance is a real and legitimate concern, especially for those who worry that AI will obsolesce creative industries or put people out of work, and many scholars are already working on it. What I find more interesting is how AO3 embodies a philosophy of governance that AI companies desperately need today. The AO3 Terms of Service FAQ states that all abuse reports are reviewed by humans, not algorithms or bots. It also states that decisions are not made by report volume, which is a fancy way of saying mob pressure does not become the truth just because enough people click the form. The TOS Spotlight on Ratings and Warnings adds that every ticket is reviewed by a human volunteer, that duplicate reports slow things down, and that AO3 aims for “maximum inclusiveness” of fanwork content while maintaining narrow enforceable rules. This is not laziness; it is a terms of service implemented with institutional clarity, designed to allow the maximum level of expression under which creative minds can flourish.
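The design principle is easy to state in code. Here is a hypothetical sketch of volume-insensitive triage (my illustration, not AO3’s actual tooling): duplicate reports collapse into one ticket, and nothing about queue position changes however many people click the form.

```python
from collections import OrderedDict

class AbuseQueue:
    """Hypothetical volume-insensitive triage queue (not AO3's real system)."""

    def __init__(self) -> None:
        self._tickets: OrderedDict[int, dict] = OrderedDict()

    def report(self, work_id: int, complaint: str) -> None:
        if work_id in self._tickets:
            # Duplicates are noted (they slow things down) but carry no weight.
            self._tickets[work_id]["duplicate_reports"] += 1
        else:
            self._tickets[work_id] = {"complaint": complaint, "duplicate_reports": 0}

    def next_for_human_review(self):
        # FIFO: arrival order, not report volume, decides what a human sees next.
        return next(iter(self._tickets.items()), None)
```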

AO3’s content warning system is especially instructive. AO3 has a small set of mandatory archive warnings, but it also allows “Creator Chose Not To Use Archive Warnings.” OTW has explicitly explained this as a scope decision: the archive keeps the mandatory warning set limited because not every conceivable warning can be abuse-enforceable. AI companies should pay attention here. A lot of present-day AI safety rhetoric is really just institutional overclaiming: promises of universal risk detection, universal context comprehension, universal coherence across every edge case, and of course “we will send your chats to law enforcement.” AO3’s governance model is narrower and wiser. AO3 says, in effect: here is what we can enforce, here is what we cannot reliably know, and here are the tools users have to navigate the rest.
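The entire mandatory surface fits in a handful of values, which is exactly what makes it enforceable. A sketch (the strings approximate AO3’s public warning set; the code structure is mine):

```python
from enum import Enum

class ArchiveWarning(Enum):
    # The small set of warnings AO3 actually mandates and enforces...
    GRAPHIC_VIOLENCE = "Graphic Depictions Of Violence"
    MAJOR_CHARACTER_DEATH = "Major Character Death"
    RAPE_NONCON = "Rape/Non-Con"
    UNDERAGE = "Underage"
    # ...plus two honest escape hatches instead of a promise of omniscience.
    NONE_APPLY = "No Archive Warnings Apply"
    CHOSE_NOT_TO_WARN = "Creator Chose Not To Use Archive Warnings"
```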

That last part matters because user agency is a core part of safety. AO3 is built around tags, filters, ratings, relationship labels, and warnings. Users are not treated as passive recipients of centrally curated correctness; they are given navigation tools. To head off a strawman about harm risk: AO3’s structure does not eliminate harm, but it does create a system in which institutional rules and individual choice work together. In AI, by contrast, we still lurch between two bad extremes: libertarian chaos like we see with Grok, and paternalistic flattening like we see with OpenAI and Anthropic. AO3 suggests a third option, which is structured agency under legible rules.
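Structured agency is also implementable in a few lines. A hedged sketch of the reader-side filtering idea, reusing the hypothetical FanworkRecord from earlier (AO3’s real filters are far richer):

```python
def visible_to(record: FanworkRecord,
               excluded_tags: set[str],
               excluded_warnings: set[str],
               skip_unwarned: bool = False) -> bool:
    """Hypothetical reader-side filter: the archive hosts the work;
    the user decides whether to see it."""
    if excluded_tags & set(record.freeform_tags):
        return False
    if excluded_warnings & set(record.archive_warnings):
        return False
    # A reader who wants certainty can also skip works whose creators
    # opted out of warnings entirely: agency, not omniscience.
    if skip_unwarned and "Creator Chose Not To Use Archive Warnings" in record.archive_warnings:
        return False
    return True
```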

OTW is also unusually explicit about the difference between “offensive” and “disallowed.” In its TOS Spotlight on offensive content, OTW said that more than 15% of the complaints its Policy & Abuse team receives each year concern material users find offensive but that does not violate the Terms of Service or US law. That single statistic is a small masterpiece of platform governance, because it says two things at once: being offended is a real emotional response to distressing content, and yet offense is not self-validating as a rule, and offensiveness is not a crime. AI companies need that distinction badly. If every user discomfort is treated as proof of model failure, policy becomes incoherent. If every discomfort is dismissed as oversensitivity, policy becomes callous. AO3’s answer is more disciplined than either caricature. People should not have to fear that their rambling and messy thoughts will be used against them in civil or criminal matters.

There is also a civil-liberties lesson here. OTW’s legal project is not an ornamental add-on; it is part of the institution. OTW publicly describes its mission in terms of copyright, fair use, and freedom of expression, and its FAQ explicitly frames fair use as a limit on copyright that protects free expression. In a 2025 post on fair use and AI, OTW again emphasized that fair use protects transformative purposes. You do not have to agree with every OTW argument to see the deeper point: safety and civil liberties are not enemies. In fact, if your safety institution has no principled theory of expression, due process, and limits on enforcement, it probably is not a safety institution at all, but a surveillance panopticon that is either indifferent or outright contemptuous toward your rights.

And then there is the irony that should make the entire field blush a little. By 2025, dedicated AO3-derived datasets were openly circulating on Hugging Face. The full-text dataset nyuuzyou/archiveofourown described itself as a multilingual collection of roughly 12.6 million fanworks with rich metadata. A metadata-only alternative, trentmkelly/archiveofourown-meta, pitched itself as useful for research and potentially for AI work on metadata patterns. And the fiction dataset SaladTechnologies/fiction-1b explicitly reported that 22.2% of its source material came from AO3. These examples mark the shift from implicit ingestion to explicit extraction: fanfiction stops being hidden inside a generic crawl and becomes a named, composable training asset.
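How composable? Loading one of these is a one-liner with the datasets library. A sketch, streaming so you do not download 12.6 million works at once; the split and column names are assumptions, so check the dataset card before relying on them:

```python
from datasets import load_dataset

# Stream the AO3-derived full-text dataset named above.
ds = load_dataset("nyuuzyou/archiveofourown", split="train", streaming=True)

for work in ds:
    print(sorted(work.keys()))  # inspect whatever metadata fields actually ship
    break
```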

That is why I think the right framing for policymakers and AI ethicists is not “AO3 solved AI safety,” but something both humbler and more damning: AI safety keeps rediscovering governance lessons that fandom institutions, composed disproportionately of neurodivergent and LGBT+ communities, have already learned, while the technical side of AI has quietly depended on the kinds of narrative corpora those institutions have stewarded since the Web 1.0 era.

None of this means piracy was inevitable, or that there are no alternatives. The Common Pile project is one promising example of a large pretraining corpus built from public-domain and openly licensed text. The existence of efforts like that is important because it undercuts the laziest defense of bad data practices, which is that there was no other way. There were and are other ways. They may be more expensive, less convenient, or less immediately powerful. But “hard” is not the same as “impossible,” and “helpful for benchmark progress” is not the same as “normatively clean.”

So what can AO3 teach AI safety? First, human review matters when stakes are contextual and users are adversarial; no content moderation regime should hand final judgment to an AI agent, especially when escalation can lead to law enforcement encounters. Second, narrow enforceable rules are better than fake omniscience; users are put off both by “do whatever you want” and by “this is against our content policy, but we can’t explain why.” Third, user agency is not the opposite of safety but part of it, and companies ought to take note. Fourth, civil liberties are not a luxury add-on to governance, but a precondition for legitimate governance. Finally, we should acknowledge that the technical history of language modeling is far less separable from fandom, narrative culture, and contested text commons than the industry’s self-mythology likes to admit.

If the AI world were a little less eager to sneer at fan communities, it might have learned some of these lessons sooner. And if it were a little more honest about what actually made modern LLMs good at prose, it would admit that fanfiction was never just a joke at the edge of the ecosystem, but part of why language models work so well.
