The race for artificial intelligence supremacy has entered a contentious new chapter. While the world watches the release of increasingly powerful large language models (LLMs), a shadow war is being fought over the very data used to train them. In a series of startling reports, leading American AI firms—including Anthropic, OpenAI, and Google—have accused several prominent Chinese AI startups of bypassing years of research and billions of dollars in investment through a technique known as a "distillation attack."
At the center of the latest controversy is Anthropic, the creator of the Claude series of models. The company recently disclosed that it detected a massive, coordinated effort to harvest its intellectual property. According to Anthropic, firms including DeepSeek, Moonshot AI, and MiniMax allegedly used over 24,000 fake accounts to generate more than 16 million conversations with Claude. The goal? To use Claude’s sophisticated reasoning and logic to train their own competing models at a fraction of the cost.
To understand why these allegations are so significant, one must understand the concept of model distillation. In a legitimate research context, distillation is a common technique where a smaller, more efficient "student" model is trained to mimic the behavior of a larger, more complex "teacher" model. This allows developers to create fast, lightweight AI that can run on smartphones or local hardware while retaining much of the intelligence of a massive data-center-grade model.
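In its legitimate form, distillation trains the student to match the teacher's "soft" output distribution, most commonly by minimizing a temperature-scaled KL divergence (the soft-target formulation popularized by Hinton et al.). Below is a minimal plain-Python sketch of that loss; the logits are toy values standing in for real model outputs:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution and the
    student's; training the student means minimizing this quantity."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.0, 0.2]
# A student that matches the teacher incurs (near-)zero loss;
# a student with the ranking reversed incurs a large one.
print(distillation_loss(teacher, [4.0, 1.0, 0.2]))
print(distillation_loss(teacher, [0.2, 1.0, 4.0]))
```

The temperature softens both distributions so the student also learns the teacher's relative preferences among wrong answers, not just its top pick.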
However, a distillation attack occurs when a competitor uses the API (Application Programming Interface) of a rival's model to systematically extract its knowledge without permission. Think of it like a student who, instead of studying the original textbooks and doing the lab work, simply records every word a world-class professor says and uses those recordings to build a rival course. The student saves years of effort and millions in tuition, while the professor’s original work is devalued.
The sheer scale of the activity reported by Anthropic suggests a highly industrialized operation. By creating 24,000 separate accounts, the attackers were likely attempting to circumvent "rate limits"—the safety brakes that AI companies put in place to prevent any single user from hogging resources or scraping data.
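Per-account rate limits are commonly implemented as a token bucket: each account accrues request "tokens" at a fixed rate and spends one per call. The sketch below is a generic illustration (not any provider's actual implementation) of why spreading traffic across thousands of accounts defeats a per-account limit:

```python
import time

class TokenBucket:
    """Per-account rate limiter: each account may issue `rate` requests
    per second, with bursts up to `capacity`."""
    def __init__(self, rate=1.0, capacity=5):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One account hammering the API is throttled after its small burst...
single = TokenBucket()
print(sum(single.allow() for _ in range(100)))  # roughly the burst capacity

# ...but a fleet of accounts multiplies the aggregate allowance.
fleet = [TokenBucket() for _ in range(1000)]
print(sum(b.allow() for b in fleet))  # 1000: every first request passes
```

Scaled to 24,000 accounts, the same per-account policy permits an aggregate request rate 24,000 times higher than intended.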
By spreading 16 million queries across these accounts, the Chinese firms allegedly gathered a massive dataset of high-quality "synthetic data." This data is particularly valuable because it contains the "chain-of-thought" reasoning that models like Claude 3.5 and Claude 4 are famous for. For a company like DeepSeek or Moonshot AI, this harvested data acts as a shortcut, allowing them to bridge the gap between their current capabilities and the state-of-the-art without the astronomical costs of original discovery.
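To make the mechanics concrete: harvested conversations are typically reshaped into the chat-style JSONL records used for supervised fine-tuning, with the teacher's reasoning landing in the "assistant" turns. The snippet below is a hypothetical illustration of that packaging step; the transcript and field names are invented for the example, though the `messages` layout mirrors a widely used fine-tuning format:

```python
import json

# Hypothetical harvested transcript: prompt plus the teacher model's
# chain-of-thought style response.
transcripts = [
    {"prompt": "Why does ice float?",
     "response": "Let's reason step by step: water expands as it freezes..."},
]

def to_finetune_records(transcripts):
    """Yield one JSONL line per conversation, in chat fine-tuning format."""
    for t in transcripts:
        yield json.dumps({"messages": [
            {"role": "user", "content": t["prompt"]},
            {"role": "assistant", "content": t["response"]},
        ]})

for line in to_finetune_records(transcripts):
    print(line)
```

Multiplied by 16 million conversations, this yields a ready-made supervised training set whose "labels" embody years of the teacher model's development.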
Anthropic is not alone in its grievances. Earlier this month, OpenAI and Google issued similar warnings, noting that their proprietary models were being queried in patterns that suggested automated data harvesting by entities linked to the Chinese tech sector.
This trend highlights a growing desperation in the global AI race. As the U.S. government tightens export controls on high-end NVIDIA chips—the hardware essential for training AI—Chinese firms are facing a "compute crunch." If they cannot access the hardware to train models from scratch using raw data, their most viable path forward is to "distill" the intelligence already perfected by American companies who have the chips to spare.
The implications of these attacks extend far beyond corporate balance sheets. We are witnessing the solidification of an "AI Cold War," where intellectual property is the primary battlefield.
| Feature | Original Training | Distillation Attack |
|---|---|---|
| Cost | Billions (Compute + Talent) | Millions (API Fees + Scraping) |
| Timeframe | Years of R&D | Months of Data Harvesting |
| Hardware Needs | Tens of thousands of H100/B200 GPUs | Standard Cloud Infrastructure |
| Data Source | Massive web crawls + Human feedback | Outputs of a rival's model |
For U.S. policymakers, this is a national security concern. If Chinese firms can successfully "short-circuit" the development process, the lead currently held by the U.S. in AI safety and capability could evaporate. This has led to calls for stricter "Know Your Customer" (KYC) requirements for AI API providers, effectively treating access to a powerful LLM with the same level of scrutiny as a bank account.
AI labs are no longer focused solely on making their models smarter; they are also making them harder to steal. Defensive measures already visible in this episode include stricter per-account rate limits, automated detection of coordinated query patterns across accounts, and identity verification ("Know Your Customer" checks) before granting API access.
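One defensive idea, spotting when nominally separate accounts are driven by a single scraping pipeline, can be sketched as follows. This is a simplified illustration, not any lab's actual detection system; the log format, fingerprints, and threshold are invented for the example:

```python
from collections import defaultdict

def flag_coordinated_accounts(logs, min_shared=3):
    """Flag accounts whose request fingerprints (e.g. hashed prompt
    templates) repeatedly co-occur across accounts -- a crude signal
    that many 'separate' accounts share one automation pipeline."""
    accounts_by_fp = defaultdict(set)
    for account, fingerprint in logs:
        accounts_by_fp[fingerprint].add(account)

    suspicious = defaultdict(int)
    for accounts in accounts_by_fp.values():
        if len(accounts) > 1:  # same template seen from multiple accounts
            for a in accounts:
                suspicious[a] += 1
    return {a for a, hits in suspicious.items() if hits >= min_shared}

logs = [("acct1", "fp-a"), ("acct2", "fp-a"),
        ("acct1", "fp-b"), ("acct2", "fp-b"),
        ("acct1", "fp-c"), ("acct2", "fp-c"),
        ("acct3", "fp-z")]
print(flag_coordinated_accounts(logs))  # {'acct1', 'acct2'}
```

Real systems would combine many weak signals (timing, IP ranges, payment details, prompt similarity), but the core idea is the same: the attack's scale is also its signature.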
As the AI landscape becomes more litigious and defensive, developers and businesses should prepare for a more restrictive environment: tighter API terms of service, lower default rate limits, and identity checks before access to the most capable models.
The allegations against DeepSeek, Moonshot AI, and MiniMax represent a fundamental shift in the AI industry. The era of "open research" is rapidly closing as companies realize that their outputs are their most valuable assets. While the U.S. continues to lead in raw innovation, the ability of global competitors to mirror that innovation through distillation remains a potent threat. The AI Cold War is no longer a theoretical future—it is the reality of the present.


