AI Training Data Scrutiny Intensifies: Anthropic's Reported Use of Competitor Information Ignites Fair Use Debate

Artificial intelligence development is facing increased scrutiny over the provenance of training data, particularly the use of competitor information and the application of the fair use doctrine. A recent social media post by ROSS highlighted this tension, stating, "What data are Anthropic's models trained on? It includes Thomson Reuters competitor data. This means Thomson Reuters is competing with competitors using models trained on TR's competitors' data. Which is fine, because fair use should apply to training AI models." The post underscores a growing debate over intellectual property rights and competitive practices in the rapidly evolving AI industry.

Anthropic, a prominent AI developer behind the Claude models, acknowledges that its AI systems are trained on a diverse and proprietary mix of data. This includes publicly available information from the internet, non-public data acquired from third parties, and data generated internally or provided by contractors. The company has faced legal challenges regarding its training data, notably settling a class-action copyright lawsuit with US authors who alleged their copyrighted works were used without permission or compensation.

Thomson Reuters, a major information and data provider, is also a significant player in the AI space, investing over $200 million annually in AI development for products like CoCounsel. The company emphasizes that its AI solutions are grounded in its exclusive content, and it contractually prohibits third-party partners from using customer data to train their models. Thomson Reuters has actively engaged in licensing discussions with generative AI providers for its content, signaling a clear intent to monetize its vast data archives.

The legal interpretation of "fair use" in AI training remains a complex and evolving area. A June 2025 ruling in Bartz et al. v. Anthropic PBC found that Anthropic's use of lawfully acquired books for training constituted fair use, emphasizing the transformative nature of LLM training. However, a February 2025 decision in Thomson Reuters v. Ross Intelligence delivered a different outcome, ruling against Ross Intelligence for using Thomson Reuters' data to train its AI, deeming it a competitive use that could harm Thomson Reuters' potential market for AI training data.

These contrasting legal outcomes highlight the nuanced considerations courts face when evaluating fair use in AI contexts, particularly when competitor data or market substitution is involved. The ongoing debate will likely shape future data licensing agreements and the ethical guidelines for AI model development, influencing how companies like Anthropic and Thomson Reuters navigate the competitive landscape.