AI Models and the Memorization of Training Data
Recent research has shown that large language models (LLMs) can generate near-verbatim copies of copyrighted works found in their training data. This raises questions about whether these systems truly "learn" rather than store the original material.
Analyses of models from leading companies such as OpenAI, Google, Meta, Anthropic, and xAI indicate that memorization of training data is higher than previously estimated. This finding challenges the main line of defense AI companies have mounted in copyright infringement lawsuits: that their models "learn" from protected data but do not retain copies of it.
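One common way studies of this kind quantify memorization is to prompt a model with the opening of a known passage and measure how long a run of the source text the model reproduces word for word. The sketch below illustrates only that measurement step with a simple word-level overlap metric; the example strings and the threshold for calling something "memorized" are illustrative, not taken from any of the analyses mentioned above.

```python
from difflib import SequenceMatcher

def verbatim_overlap(original: str, generated: str) -> int:
    """Length, in words, of the longest word sequence that appears
    verbatim in both the original text and the model's output."""
    a, b = original.split(), generated.split()
    match = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b)
    )
    return match.size

# Hypothetical example: a model prompted with the opening of a
# well-known passage continues it partly word for word.
source = "it was the best of times it was the worst of times"
output = "it was the best of times it was an age of wisdom"
print(verbatim_overlap(source, output))  # -> 8
```

A longest-common-run of eight words out of twelve would count as substantial verbatim reproduction under most definitions; real extraction studies apply similar comparisons at much larger scale, over tokens rather than words.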
A model's ability to faithfully reproduce copyrighted passages could have significant implications for these ongoing legal battles, weakening the position of the companies that develop and distribute these systems.