
Encyclopaedia Britannica Files Lawsuit Over Alleged AI Training Misuse

By Distilled Post Editorial Team

The publisher of Encyclopaedia Britannica has launched a major copyright lawsuit against OpenAI, accusing the artificial intelligence developer of using vast quantities of its reference content without permission to train generative AI systems such as ChatGPT. The complaint, filed in March 2026 in a Manhattan federal court, alleges that OpenAI copied nearly 100,000 articles and dictionary entries from Britannica and its subsidiary Merriam-Webster to help build large language models powering its AI services.

Britannica claims that OpenAI’s tools can produce summaries that closely resemble the original articles, potentially diverting readers away from its own platforms. The publisher is seeking monetary damages and a court injunction to prevent further use of its copyrighted material in AI training datasets. The lawsuit is one of the latest high-profile legal battles over the data used to train generative AI systems, a rapidly expanding area of dispute between technology firms and publishers.

Allegations of large-scale copying and traffic diversion

According to court filings, Britannica argues that OpenAI’s models were trained on large volumes of its copyrighted reference material without licensing agreements or compensation. The complaint alleges that ChatGPT is capable of producing “near-verbatim” reproductions of encyclopaedia entries and dictionary definitions, which could undermine Britannica’s digital publishing business. Britannica’s lawyers also claim the AI system has at times generated responses falsely attributing content to the publisher or suggesting that its material was used with authorisation. Such behaviour, they argue, amounts to trademark infringement and could mislead users about the source of information.

The publisher contends that these practices threaten its ability to sustain high-quality editorial work. Britannica, founded in the eighteenth century and now operating primarily as a digital reference platform, employs teams of editors and subject specialists who verify articles and maintain scholarly standards. From Britannica’s perspective, allowing AI systems to reproduce or summarise this content without licensing agreements risks eroding the economic model that supports professional knowledge publishing.

Part of a wider wave of AI copyright disputes

The case comes amid a growing global wave of lawsuits challenging how artificial intelligence companies gather training data. Large language models require enormous volumes of text to learn patterns of language and knowledge, and developers often rely on publicly available material scraped from the internet.

Publishers, authors and media organisations argue that such practices frequently involve copyrighted works and should therefore require licences or compensation. Several high-profile lawsuits against AI developers have emerged over the past two years, including cases brought by news publishers, music companies and individual authors. Legal scholars note that the central question in many of these disputes is whether training AI systems on copyrighted material constitutes “fair use” under US copyright law. Technology firms argue that machine learning models do not reproduce works directly but instead learn statistical patterns from large datasets.

Critics, however, point to research showing that large language models can occasionally reproduce passages from their training data verbatim or generate outputs closely resembling original works. As a result, courts are increasingly being asked to determine how traditional copyright frameworks apply to modern AI systems.

Implications for the future of AI and knowledge publishing

The outcome of the Britannica case could have significant implications for both AI developers and publishers. If courts rule that training models on copyrighted material without permission constitutes infringement, AI companies may be required to negotiate licensing agreements for large portions of their datasets. Such a shift could fundamentally change the economics of generative AI development, which currently relies heavily on large-scale web data collection. For publishers and knowledge organisations, the case represents an effort to protect intellectual property in an era when AI systems can instantly generate summaries and explanations that compete with traditional reference websites.

Britannica has previously taken legal action against AI companies over similar issues, reflecting broader concerns within the publishing industry about how generative AI systems use and reproduce curated information. The case against OpenAI is expected to attract close attention from regulators, technology firms and academic institutions, as courts begin to establish legal precedents governing AI training data. As generative AI becomes increasingly embedded in search engines, education platforms and professional workflows, the legal decisions emerging from cases like this may ultimately shape how knowledge is created, distributed and monetised in the digital age.