A torrent of fraudulent requests – 28.8 million exchanges over 45 days – orchestrated, according to Anthropic, by operators linked to Alibaba to clone Claude’s capabilities. It’s the largest model extraction attack the company has ever measured, and it took place after U.S. export restrictions had already limited Chinese access to advanced models. A confidential letter sent to Senators Scott and Warren reveals the scale of the operation: nearly 25,000 artificially created accounts, a systematic violation of terms of service, and a deliberate focus on key features such as agentic reasoning, software engineering, and long-horizon tasks.

The mechanism: not a hack, but a systematic siege

The campaign, active between April 22 and June 5, did not exploit technical vulnerabilities in the model or infrastructure. Instead, it gamed Claude’s public API by using a swarm of fake accounts to bypass rate limits and access policies. Each seemingly legitimate exchange was a piece of a broader attempt to distill the model’s capabilities: extracting response patterns, chain-of-thought techniques, and problem-decomposition strategies. It’s the modern equivalent of reverse engineering applied to LLMs, and it works when the target model is sufficiently exposed.
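Seen from the defender's side, the figures above work out to about 26 requests per account per day (28.8 million exchanges / nearly 25,000 accounts / 45 days), a volume that a per-account rate limit never notices. The sketch below is an added example, not taken from Anthropic's letter or from any real detection system: it groups accounts that share a linking attribute and applies a ceiling to the group. The thresholds, the field names `payment_fp` and `signup_subnet`, and the sample data are all assumptions constructed for illustration.

```python
from collections import defaultdict

PER_ACCOUNT_LIMIT = 500  # assumed daily quota per account
CLUSTER_LIMIT = 800      # assumed daily ceiling for one linked group
LINK_KEYS = ("payment_fp", "signup_subnet")  # assumed linking signals

def per_account_flags(usage):
    """What a plain per-account rate limiter sees."""
    return sorted(a for a, r in usage.items() if r["requests"] > PER_ACCOUNT_LIMIT)

def cluster_accounts(usage):
    """Group accounts that share any linking attribute (union-find)."""
    parent = {a: a for a in usage}
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    first_seen = {}
    for acct, rec in usage.items():
        for key in LINK_KEYS:
            value = rec.get(key)
            if value is None:
                continue
            owner = first_seen.setdefault((key, value), acct)
            parent[find(acct)] = find(owner)
    groups = defaultdict(list)
    for acct in usage:
        groups[find(acct)].append(acct)
    return [sorted(g) for g in groups.values()]

def cluster_flags(usage):
    """Linked groups whose combined volume exceeds the cluster ceiling."""
    out = []
    for members in cluster_accounts(usage):
        total = sum(usage[a]["requests"] for a in members)
        if len(members) > 1 and total > CLUSTER_LIMIT:
            out.append((members, total))
    return out

def sample_usage():
    """Constructed data: 2 ordinary accounts plus 40 low-volume linked ones."""
    usage = {"legit-a": {"requests": 450, "payment_fp": "fp-a", "signup_subnet": "net-a"},
             "legit-b": {"requests": 300, "payment_fp": "fp-b", "signup_subnet": "net-b"}}
    for i in range(40):  # 26/day each, about the page's 28.8M / 25,000 / 45 average
        usage[f"acct-{i:02d}"] = {"requests": 26, "payment_fp": f"fp-x{i // 10}",
                                  "signup_subnet": f"net-x{i % 4}"}
    return usage

if __name__ == "__main__":
    data = sample_usage()
    print("per-account:", per_account_flags(data))
    print("cluster:", [(len(m), t) for m, t in cluster_flags(data)])
```

The linked group is caught only because its members share attributes transitively; accounts with no shared signal at all would need behavioral features, such as prompt similarity, that this sketch does not model.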

Anthropic’s letter notes that the operation aimed at “some of Claude’s most valuable capabilities,” the very ones that make the model competitive for complex enterprise workloads. This was not a casual collection of outputs, but a deliberate effort to map distinctive competencies.

Why the Alibaba case marks a turning point

The episode lands in a moment of intense geopolitical tension around large language models. Following the release of Mythos – Anthropic’s flagship model – export controls sought to restrict the flow of advanced capabilities to Chinese actors. Yet the attack shows that a cloud-first model, no matter how monitored, remains an exposed target: the attack surface of APIs is inherently wide, and defense hinges on rate limiting and behavioral monitoring, which a well-funded adversary can persistently circumvent.

Alibaba, through its Qwen lab, is investing heavily to reach parity with Western models. Cloning Claude, or even extracting significant portions of its behavior, could dramatically shorten development cycles. In this scenario, the line between fair competition and intellectual property theft blurs, and Anthropic’s accusations highlight a systemic problem: frontier models, when served as a service, are intrinsically “open” to anyone with the resources to query them at industrial scale.

The on-premise nexus: control and data sovereignty

For those tracking enterprise LLM deployment dynamics, the attack reinforces an ongoing reassessment: the choice between API consumption and in-house hosting is not just economic, but architectural and security-driven. On-premise deployments drastically reduce the attack surface for this kind of extraction, because the model remains behind a controlled perimeter, accessible only to authorized systems and users, without public endpoints. They don’t eliminate every risk – an insider or a misconfigured endpoint remains a possible vector – but they shift the defensive center of gravity from global-scale anomaly detection to classic network security and privilege management.

The Claude-Alibaba episode serves as a wake-up call for teams evaluating TCO and risk of hosted models. The cost of a model extraction attack isn’t immediately monetizable, but it erodes the competitive advantage built through massive training and fine-tuning investments. In regulated industries or where the model’s intellectual property is the core differentiator, self-hosting becomes a concrete safeguard.

A rapidly evolving landscape

The industry is already responding. Cloud providers are refining anomaly detection systems and introducing stronger authentication, but the bad news is that attackers learn fast. Anthropic’s letter to senators is also a call for political attention: the export of cognitive capabilities flows not only through chips, but through APIs. As regulations adapt, the architectural choice between cloud and on-premise becomes increasingly strategic for those building on the AI frontier.