AI scraping has become its own media business
There are several dimensions to the ongoing legal war between the media industry and AI companies over copyright, and one of the major ones is the question of outputs. Which is to say: Scraping content without permission may be detestable, but if the party doing the scraping isn't doing anything with it that competes with the content creator, it's difficult to prove harm. And many legal proceedings, especially civil claims, depend on showing that the actions were harmful.

One of the earlier rulings in this area exemplifies the point. A group of authors, including comedian Sarah Silverman, sued OpenAI back in 2023 for appropriating their books without compensation. A judge later dismissed several of the authors' claims because the lawsuit didn't identify specific outputs that were direct copies. It turns out that merely pointing out that a large language model (LLM) was trained on your material isn't enough; you have to show it's creating outputs that take business away from you.

The output problem

Copyright lawsuits like the Silverman case often depend on showing specific instances of scraping and reproduction. The problem is, much of this activity happens in the realm of bots: scraping done quickly, silently, and at scale. And while the outputs of big, public-facing AI services like ChatGPT, Gemini, and Perplexity are there for everyone to see, there's a whole shadow industry of mass AI scraping that isn't.