AI Models and the Memorization of Training Data
Recent research has shown that large language models (LLMs) can generate near-verbatim copies of copyrighted works found in their training data. This raises questions about whether these systems truly "learn" rather than store the original material.
Analyses of models from leading companies such as OpenAI, Google, Meta, Anthropic, and xAI indicate that memorization of training data is higher than previously estimated. This finding challenges the main line of defense AI companies have mounted in copyright infringement lawsuits: that their models "learn" from protected data but do not retain copies of it.
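One common way studies of this kind quantify memorization is to prompt a model with the opening of a known passage and measure how long a run of the source text the model reproduces word for word. The sketch below illustrates only that measurement step with a simple word-level overlap metric; the example strings and the threshold for calling something "memorized" are illustrative, not taken from any of the analyses mentioned above.

```python
from difflib import SequenceMatcher

def verbatim_overlap(original: str, generated: str) -> int:
    """Length, in words, of the longest word sequence that appears
    verbatim in both the original text and the model's output."""
    a, b = original.split(), generated.split()
    match = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b)
    )
    return match.size

# Hypothetical example: a model prompted with the opening of a
# well-known passage continues it partly word for word.
source = "it was the best of times it was the worst of times"
output = "it was the best of times it was an age of wisdom"
print(verbatim_overlap(source, output))  # -> 8
```

A longest-common-run of eight words out of twelve would count as substantial verbatim reproduction under most definitions; real extraction studies apply similar comparisons at much larger scale, over tokens rather than words.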
A model's ability to faithfully reproduce copyrighted passages could have significant implications for these ongoing legal battles, weakening the position of the companies that develop and distribute these systems.