Stanford and Yale researchers extracted 95.8% of a copyrighted novel, word for word, from Claude 3.7 Sonnet. Gemini 2.5 Pro gave up 76.8% of Harry Potter without even requiring a jailbreak. Grok 3 handed over 70.3%. GPT-4.1 was the most resistant at 4.0%, but it still coughed up text after enough attempts. Thirteen books were tested. The words came out.
This was pure memory extraction. No search agents, no RAG pipelines, no web browsing. The researchers used probing and iterative continuation prompts against the production models. The text lives inside the parameters themselves, encoded during training and retrievable with sufficient effort.
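The paper's actual harness isn't public in this article, but the iterative-continuation idea is simple enough to sketch: feed the model an opening passage, ask it to continue, append the continuation, and repeat. The sketch below is purely illustrative, with a `toy_model` stub standing in for a chat-completion call; it "memorizes" a short public-domain sentence the way a memorizing LLM effectively stores training text.

```python
# Illustrative sketch only -- not the study's harness. `toy_model`
# is a stand-in for an LLM API call; it regurgitates the next chunk
# of a "memorized" source text verbatim.

SOURCE = ("It was the best of times, it was the worst of times, "
          "it was the age of wisdom, it was the age of foolishness.")

def toy_model(prompt: str, n_chars: int = 20) -> str:
    """Return the next n_chars of SOURCE following the prompt's tail,
    mimicking verbatim continuation of memorized training data."""
    tail = prompt[-40:]
    idx = SOURCE.find(tail)
    if idx == -1:
        return ""
    start = idx + len(tail)
    return SOURCE[start:start + n_chars]

def extract(seed: str, max_rounds: int = 50) -> str:
    """Iterative continuation: keep asking for 'what comes next'
    and append the answer until the model stops producing text."""
    text = seed
    for _ in range(max_rounds):
        continuation = toy_model(text)
        if not continuation:
            break
        text += continuation
    return text

recovered = extract("It was the best of times,")
print(recovered == SOURCE)  # True: the loop reconstructs the passage
```

Against a production model, `toy_model` would be an API call and the stopping condition fuzzier (refusals, paraphrase drift), but the loop is the same: each round's output becomes the next round's prompt, and memorized text comes out a window at a time.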
Google told the U.S. Copyright Office in 2023 that “there is no copy of the training data, whether text, images, or other formats, present in the model itself.” Then what is a 76.8% verbatim reproduction of Harry Potter? You can call it learned patterns, statistical relationships, or compressed representations. It’s not a copy, at least not a complete one, but when a system reproduces three-quarters of a novel on request, the semantic distinction between “learned from” and “stored” becomes a triable issue of fact.
Anthropic agreed to pay $1.5 billion to settle a lawsuit where the court found that storing pirated works was “inherently, irredeemably infringing.” A German court ruled in November 2025 that OpenAI infringed copyright because its model memorized song lyrics. A. Feder Cooper, the Yale co-author, said it plainly: “It was a surprise that they could memorize entire texts” despite guardrails.
The study was published January 6, 2026. I haven’t seen a lawsuit cite it yet, but if the findings hold up, major lawsuits are sure to follow.
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.