Reinforcement Learning in a Cached Internet Will Give Us a Superhuman Forecaster

LessWrong · Jun 28, 2026, 7:37 PM

This is a crosspost of a post from my blog, Metal Ivy. The original is here: Reinforcement Learning on Forecasting Will Give Us a Superhuman Forecaster.

Why RL on forecasting?

When DeepSeek-R1 came out in January 2025, I found it incredible that RL on LLMs simply worked, but I felt that applying it to coding and math wasn't the right path.

Before RL we had pretraining, a scalable and general training methodology that worked extremely well at getting the model to the human level through learning by imitation over human data. Then RL came in and gave us a way to go even further, to the expert level and beyond, by sampling many trajectories from the LLM and using a reward function to select the best ones to reinforce. But it stops being general when the environment consists only of short-term, self-contained, verifiable tasks such as coding or math.

A strongly superhuman coder might change everything, if recursive self-improvement happens the way the labs hope (and doesn't kill us). But by itself it might not change that much at all, beyond giving us more of the software abundance we in many ways already have. A strongly superhuman forecaster, by contrast, would instantly give people and organizations the ability to make superhuman decisions by forecasting their outcomes, and would be a massive boost to the overall competence of our civilization.

You may ask why it should work, even in theory: math is deterministic and forecasting is not, so a forecasting reward may produce bad weight updates. The analogy to keep in mind is next-token prediction. The model predicts a distribution, and sometimes it is punished for a great forecast because the next token was a typo. So you start with a high learning rate and lower it gradually during training, and averaging the updates at the end is enough to keep learning stable even when accuracy is high and the randomness of the signal starts to dominate the errors. Which is to say, this is a solved problem.

So I decided my thesis
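A toy illustration of the noise argument, not the author's setup: a one-parameter "forecaster" outputs a probability p for a repeating yes/no event that resolves YES with a hypothetical true probability of 0.7. It is rewarded with the negative Brier score, a proper scoring rule chosen here as an assumption because the post does not name its reward. The names `brier_reward` and `train_forecaster` are likewise hypothetical. Individual resolutions often punish the correct forecast, yet with a learning rate that decays as lr0 / t the updates average out.

```python
import random


def brier_reward(p: float, outcome: int) -> float:
    """Negative Brier score for forecast p on a 0/1 outcome (assumed reward).

    Higher is better; its expectation is maximised at p = true probability,
    which is what makes it a proper scoring rule.
    """
    return -(p - outcome) ** 2


def train_forecaster(true_prob: float, steps: int = 20_000, lr0: float = 0.5,
                     decay: bool = True, seed: int = 0) -> float:
    """Gradient ascent on the Brier reward for a hypothetical one-parameter forecaster.

    Each step resolves one noisy event, so even a perfect forecast is often
    punished. With decay=True the learning rate falls as lr0 / t.
    """
    rng = random.Random(seed)
    p = 0.5  # initial forecast
    for t in range(1, steps + 1):
        outcome = 1 if rng.random() < true_prob else 0  # event resolves YES (1) or NO (0)
        grad = -2.0 * (p - outcome)                     # d(reward)/dp
        lr = lr0 / t if decay else lr0                  # decaying vs constant learning rate
        p += lr * grad
        p = min(max(p, 0.0), 1.0)                       # keep p a valid probability
    return p


if __name__ == "__main__":
    print(train_forecaster(0.7))               # decaying LR: should land near 0.7
    print(train_forecaster(0.7, decay=False))  # constant LR: copies the last outcome
```

With lr0 = 0.5 and the 1/t schedule, the update works out to p ← p − (p − y)/t, which is exactly the running mean of the outcomes seen so far: a concrete version of "averaging the updates". With a constant learning rate of 0.5, the same rule sets p to the most recent outcome on every step, so the randomness of the signal dominates. This is a one-parameter toy, not a model of LLM training.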

Article preview — originally published by LessWrong. Full story at the source.
Aggregated and edited by the Scoop newsroom. We surface news from LessWrong alongside other reporting so you can compare coverage in one place.