Reddit has taken legal action against the artificial intelligence company Anthropic. The lawsuit, filed on Wednesday in San Francisco, claims that Anthropic illegally scraped millions of Reddit user comments without permission. The data was allegedly used to train Anthropic’s chatbot, Claude.
Reddit says Anthropic used automated bots to access its content despite being asked to stop. The company also claims Anthropic trained its AI using personal data from Reddit users without their consent. Anthropic has denied these accusations and said it will defend itself.
Both Reddit and Anthropic are headquartered in San Francisco, where the lawsuit was filed.
Reddit’s chief legal officer said that AI companies should not be allowed to collect and use people’s content without clear limits. Reddit already has licensing agreements with Google, OpenAI, and other AI firms. These agreements let those companies legally use Reddit content to train AI, under terms that protect user privacy and users’ right to have their content deleted.
The licensing deals have also helped Reddit raise money ahead of its public listing on Wall Street last year.
Anthropic was founded in 2021 by former OpenAI leaders. Its chatbot Claude competes with OpenAI’s ChatGPT. Amazon is Anthropic’s main commercial partner and uses Claude to improve its Alexa voice assistant.
Like other AI companies, Anthropic relies on publicly available data from sites such as Wikipedia and Reddit. These sites provide large amounts of written content that help train AI to understand human language patterns.
A 2021 research paper by Anthropic’s CEO identified specific Reddit forums that contain valuable training data, including forums about gardening and history as well as advice forums.
Anthropic has argued that its method of training Claude is legal under current copyright law because it involves making copies for statistical analysis. The company is also facing a separate lawsuit from music publishers over alleged copyright violations involving song lyrics.
Reddit’s lawsuit is different because it does not claim copyright infringement. Instead, it alleges that Anthropic breached Reddit’s terms of service and engaged in unfair competition.
This lawsuit brings attention to the wider issue of how AI companies use online data. Websites like Reddit want more control over how user content is accessed and used.
Legal experts say the case could help set important rules on how to protect user data while still allowing AI development.