Microsoft has removed a blog post after it was criticized for suggesting the use of copyrighted material, specifically the Harry Potter books, as data for applications built on large language models (LLMs).

Details of the incident

The post, written by Pooja Kamath, a senior product manager at Microsoft, was published in November 2024. It promoted a new feature that, according to the post, made it easier to add generative AI capabilities to applications using Azure SQL DB, LangChain, and LLMs. To illustrate the feature, the article proposed using the Harry Potter books as a sample dataset.

Reactions and removal

The guide drew strong negative reactions, particularly on Hacker News, where commenters accused it of encouraging piracy and the production of low-quality AI-generated content. Following this criticism, Microsoft took the post down.