Reddit Sues Data Scrapers and AI Companies

The Accusation: Stealing Valuable User-Generated Data

By Christina Catenacci, human writer

Oct 24, 2025

Key Points

On October 22, 2025, Reddit filed a Complaint in the United States District Court, Southern District of New York against Serpapi LLC, Oxylabs UAB, Awmproxy, and Perplexity AI, Inc
Reddit has alleged that the Defendants have violated copyright law and used its user-generated content without permission and without entering into an agreement with Reddit that protects users
The Defendants have denied the allegations and plan to defend themselves in court

On October 22, 2025, Reddit filed a Complaint in the United States District Court, Southern District of New York against Serpapi LLC, Oxylabs UAB, Awmproxy, and Perplexity AI, Inc (Defendants). In short, Reddit accused the Defendants of stealing valuable copyrighted user content without permission and without entering into an agreement with Reddit that protects users.

The case gets at the tension between content owners like Reddit and AI companies that use user-generated data for commercial gain. What’s more, this lawsuit deals with not just AI companies, but also data scrapers that allegedly pull Reddit data from Google’s Search Engine Results Pages as a way around technological protections.

What Happened?

According to Reddit, the lawsuit was necessary to stop “the industrial-scale, unlawful circumvention of data protections by a group of bad actors who will stop at nothing to get their hands on valuable copyrighted content on Reddit”.

Three of the Defendants, Oxylabs UAB, AWMProxy, and SerpApi (a Lithuanian data scraper, a former Russian botnet, and a Texas company that publicly advertises its shady circumvention tactics), are data-scraping service providers who specialize in creating and selling tools that are designed to circumvent digital defenses and scrape others’ content. The tools aim to bypass two levels of security: first, evading Reddit’s own anti-scraping measures, and second, circumventing Google’s controls and scraping Reddit content directly from Google’s Search Engine Results Pages. Reddit equated this behaviour to what bank robbers do: knowing that they cannot get into the bank vault, they break into the armored truck carrying the cash instead.

The fourth Defendant, Perplexity AI Inc., was equated to a “North Korean hacker” and is a willing customer of at least one of its co-defendants. Reddit submits that Perplexity AI will apparently do anything to get the Reddit data to fuel its “answer engine”.

Reddit, founded 20 years ago, is one of the largest repositories of human conversation in existence. In particular, over 100 million unique users engage in discussions each day across its hundreds of thousands of interest-based communities (or “subreddits”), producing a continuous stream of real-time, creative copyrighted works.

According to Reddit, it is prohibited to engage in unauthorized commercialization of Reddit content unless there is an express agreement with guardrails in place to ensure that user rights are protected. In a nutshell, if AI companies want to legally access Reddit data, they need to comply with Reddit’s policies just like Google and OpenAI have.

What is Reddit Claiming?

Reddit has asserted that the first three Defendants have:

scraped the data from Google’s Search Engine Results Pages instead of Reddit’s site (like the bank robbers attacking the truck carrying the cash) while masking their identities, hiding their locations, and disguising their web scrapers as regular people to circumvent or bypass the security restrictions meant to stop them

Reddit has asserted that Perplexity AI has:

ignored Reddit’s cease-and-desist letter after Reddit caught Perplexity AI red-handed by using the digital equivalent of marked bills (to use the bank robbery analogy) to track Reddit data and confirm that Perplexity AI was using Reddit data acquired through the scraping of Google Search Engine Results Pages

In its Complaint, Reddit argued that Congress has already enacted the Digital Millennium Copyright Act to prevent what the Defendants are doing—bypassing technological measures to access copyrighted works. Moreover, Reddit has pointed out that the Defendants know that they do not have permission to do what they are doing, and has claimed the following:

All Defendants have violated the Digital Millennium Copyright Act by unlawfully circumventing technological measures
The Defendants, SerpApi and Oxylabs, have violated the Digital Millennium Copyright Act by trafficking in a technology, product, service, or device for use in circumventing a technological measure that controls access
The Defendants, SerpApi and Oxylabs, have violated the Digital Millennium Copyright Act by trafficking in a technology, product, service, or device for use in circumventing a technological measure that protects the rights of a copyright owner
All Defendants have gained access to and scraped Reddit data on a large-scale, unauthorized, and automated basis, including misappropriation of real-time Reddit content and services and the timely content authored by Reddit users, from which Defendants have been unjustly enriched at Reddit’s expense
The Defendants, SerpApi and Perplexity AI, have engaged in civil conspiracy by entering into one or more contracts or business agreements for the purpose of circumventing the technological control measures described above in order to gain access to Reddit data on a large-scale, unauthorized, and automated basis, including Reddit content and services and the content authored by Reddit users
Reddit has suffered harms since it depends on the contributions of Redditors, and its business and reputation have been damaged by the Defendants’ misappropriation of Reddit data

To that end, Reddit has requested that the court grant injunctive relief, damages, costs, and any other legal or equitable relief as the court deems just and proper.

What Were the Defendants’ Responses?

Generally speaking, the Defendants all deny the allegations and plan on defending themselves in court.

But it was Perplexity AI that made a statement right on Reddit. Essentially, Perplexity AI noted that “this is a sad example of what happens when public data becomes a big part of a public company’s business model”. More specifically, the AI company stated that it is likely being sued as a show of force in Reddit’s training data negotiations with Google and OpenAI.

Perplexity AI went on to say that it has not ignored Reddit—whenever anyone asks the company about content licensing, it explains that Perplexity AI, as an application-layer company, does not train AI models on content. In fact, it never has, and thus it is impossible for the company to sign a license agreement to do so.

What Perplexity AI does, in fact, is summarize Reddit discussions and cite Reddit threads in answers, just like people share links to posts on Reddit all the time. Perplexity AI added that it invented citations in AI for two reasons: so that people can verify the accuracy of the AI-generated answers, and so they can follow the citation to learn more and expand their journey of curiosity. In its view, the way Reddit is acting is the opposite of an open internet.

Lastly, Perplexity AI stated:

“In any case, we won’t be extorted, and we won’t help Reddit extort Google, even if they’re our (huge) competitor. Perplexity will play fair, but we won’t cave. And we won’t let bigger companies use us in shell games”

What Can We Take From This Development?

I think that Reddit’s Chief Legal Officer said it best:

“AI companies are locked in an arms race for quality human content - and that pressure has fueled an industrial-scale 'data laundering' economy”

So this is the second Reddit lawsuit that has come up. As I wrote about the first Reddit case against Anthropic, we will need to wait and see what the court decides. These sorts of copyright cases are popping up at a rapid rate in the context of AI, and courts are going to have to set the fair balance between innovation and the rights of companies like Reddit (and its users).

Plainly put, the court is going to have to decide what is fair. Is Reddit “extorting” Perplexity AI as has been alleged, or are the Defendants trying to unlawfully access and use Reddit content without permission? And where does this leave the users? Will there be a chilling effect as a result of these kinds of lawsuits—will users shy away from sharing their thoughts and creative works online because they are unsure of how they will be used against them in the future?