Microsoft built supercomputer to help OpenAI infringe copyrights, NYT alleges in amended complaint

On Thursday, The New York Times filed a motion to amend its copyright complaint against OpenAI and Microsoft, introducing an allegation that sheds new light on the role of infrastructure in the AI era.

According to the heavily redacted document, Microsoft went beyond providing cloud services to OpenAI and built a bespoke supercomputing system, ranked among the most powerful in the world, with the deliberate intent to encourage the unlawful use of the newspaper’s works. The amendment follows a Supreme Court ruling in the Cox Communications case, which raised the bar for contributory infringement claims: plaintiffs must now prove that a party acted with intent to induce others to break the law.

A contested supercomputer

The new complaint centers on technical infrastructure. The NYT claims Microsoft created a custom supercomputer to train OpenAI’s large language models, an investment that Redmond has repeatedly touted as central to its AI strategy. Insiders familiar with cloud-scale systems note that such setups, typically GPU clusters on Azure, offer enough compute power to process petabytes of data at high speed. If the allegation holds, providing such a machine would mark a qualitative shift from mere hosting to an intentional act of enablement.

The impact of the Cox precedent

The Supreme Court’s decision in the Cox Communications case redefined the obligations of service providers. In the telecom sector, merely tolerating user piracy no longer constitutes contributory infringement; proof of active encouragement is required. The NYT seized the opportunity to align its legal strategy with the new standard, stating that it had gathered new evidence during discovery. Spokesperson Graham James said: “Today, we asked the court for permission to file an amended complaint that further strengthens our case, clarifying our claim of contributory infringement against Microsoft based on new law and new evidence uncovered during discovery.”

Beyond the cloud: what it means for on-premise deployments

For technology decision-makers evaluating the adoption of LLMs in controlled environments, the case has deep implications. Self-hosted models trained on proprietary data can reduce legal exposure, but only if the entire training pipeline respects copyright law. AI-RADAR regularly tracks best practices for data sovereignty and fully on-premise stacks, where governance of training sources becomes a non-negotiable requirement. The hardware, often cutting-edge GPUs with hundreds of GB of VRAM, demands significant investment but limits third-party dependency. Choosing between cloud and bare metal is not just an economic call: it also raises the question of legal accountability for those providing the computational resources.
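For teams wondering what governance of training sources might look like in practice, the following is a minimal sketch, not a compliance tool. It assumes a hypothetical manifest in which every training source carries a rights label assigned during review; the names SourceRecord, CLEARED_LICENSES and split_by_clearance are illustrative and do not come from any real library. Which labels count as cleared is a legal and policy decision that code cannot settle.

```python
from dataclasses import dataclass

# Hypothetical allow-list: which labels count as cleared is a legal and
# policy decision, assumed here purely for illustration.
CLEARED_LICENSES = {"owned", "licensed", "public-domain"}


@dataclass(frozen=True)
class SourceRecord:
    """One entry in a hypothetical training-data manifest."""
    source_id: str
    origin: str   # where the data came from, e.g. an internal system name
    license: str  # rights label assigned during review


def split_by_clearance(records):
    """Return (cleared, held_back) lists based on each record's license label."""
    cleared, held_back = [], []
    for record in records:
        # Normalize the label so "Licensed" and " licensed " are treated alike.
        label = record.license.strip().lower()
        if label in CLEARED_LICENSES:
            cleared.append(record)
        else:
            held_back.append(record)
    return cleared, held_back


if __name__ == "__main__":
    manifest = [
        SourceRecord("doc-001", "internal-wiki", "owned"),
        SourceRecord("doc-002", "news-scrape", "unknown"),
        SourceRecord("doc-003", "vendor-feed", "Licensed"),
    ]
    cleared, held_back = split_by_clearance(manifest)
    print("cleared:", [r.source_id for r in cleared])
    print("held back:", [r.source_id for r in held_back])
```

In this sketch, anything without an explicitly cleared label is held back by default, so an unreviewed source never reaches the training set silently. The held-back list also gives reviewers a record of what was excluded and why.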

The future of copyright in AI training

The NYT-Microsoft dispute marks a turning point. As courts begin to scrutinize the contribution of infrastructure to infringement, companies that develop or host models will need to adopt stricter due diligence processes. For supercomputing providers – both public and private – transparency about data usage could become a competitive factor. In the meantime, those building their own clusters for inference or fine-tuning must weigh cloud agility against the risks of a rapidly evolving legal landscape.