AI models are choking on junk data

Fortune · May 3, 2026, 1:30 PM

How we get from ChatGPT to humanoid robots depends on one of the most consequential but least discussed bottlenecks in artificial intelligence: the quality of the data we feed these systems to learn from. Thus far, the AI industrial complex has operated on the idea that feeding models more data produces smarter models. This worked brilliantly when researchers could simply vacuum up the internet to train large language models.

But we’re on the cusp of the next frontier of AI, physical AI and world models: systems that will learn and ultimately operate in the physical world. Think about the cognition it takes to navigate roads and traffic, fold laundry, or assist in complicated surgeries. These tasks all require something that can’t simply be downloaded: rich, multifaceted data from which world models can learn.

There’s now a potential crisis in motion that could have major implications for the AI movement. If we can’t stem the excess of junk data, meaning data that does nothing to move a model forward in development, physical AI and world models may never achieve their full potential.

A big part of the problem is the hunger for data to feed new and better models. AI companies are ravenous for that data, which has spawned a wave of multibillion-dollar AI data startups, such as Scale AI, Surge AI, and Mercor, that supply it. But catering to those insatiable appetites has produced a bounty of junk data that doesn’t advance AI models at all.

Junk data is easier to produce, but the data needed for physical AI and world models requires much more time and effort. Because the physical world is so complex, training these models to understand it requires significantly more data, and that data is also very hard to get. Machine learning engineers resort to simulating this data, and that requires hours upon hours of virtual reenactments of real-world scenarios to create the data that will…

Article preview — originally published by Fortune. Full story at the source.