
I didn't see any METR graph extrapolations so here.

LessWrong · Jun 10, 2026, 12:50 PM

If you don't know what the METR time horizon benchmarks are, here's the explainer: https://metr.org/time-horizons/

The task completion time horizon is the task duration (measured by how long a human expert takes to complete the task) at which an AI agent is predicted to succeed with a given level of reliability. For example, the 50% time horizon is the duration at which an agent is predicted to succeed half the time.

Here are the METR task completion time horizons for public frontier language models, plotted on a log scale. Models that did not push the task-duration frontier at release have been left out.

To find the best trend line to extrapolate, I plotted log-linear and log-quadratic fits against each other. Log-quadratic is a much better fit than log-linear.

But what about a piecewise (segmented) log-linear model, i.e. two log-linear models fitted to different parts of the data? The best breakpoint for each dataset was found automatically in RStudio, giving a breakpoint of March 2024 for the 50% horizon and April 2024 for the 80% horizon.

But wait, you ask, isn't using two trend lines like that just overfitting? Of course a more complicated model will fit the data better, right? To check for overfitting you can use the Akaike information criterion (AIC), AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximised likelihood. AIC lets you compare models of different complexity: every extra parameter adds a penalty, so a more complex model only wins if its better fit outweighs that penalty.

Shown below is the AIC for each model; lower is better.

The piecewise log-linear models come out best, ahead of both the log-quadratic and log-linear models, so if you want to predict the future, look at those.

And since we're already here, just for fun, here is a hypothetical graph for a scenario with a second acceleration jump in 2029, of the same proportional size as the one seen in early 2024.
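To make the horizon definition concrete: METR's write-up describes fitting, for each model, a logistic curve of success probability against the log of human completion time, and reading the p% horizon off where that curve crosses p. A minimal sketch of that last step, assuming the logistic form below (log base 2 is a choice here; the base doesn't change the horizon) and made-up coefficients `a` and `b`; the function names are hypothetical, not METR's code:

```python
import math


def success_probability(minutes: float, a: float, b: float) -> float:
    """Assumed logistic model: P(success) = sigmoid(a + b * log2(minutes)), with b < 0."""
    return 1.0 / (1.0 + math.exp(-(a + b * math.log2(minutes))))


def time_horizon(p: float, a: float, b: float) -> float:
    """Task length (minutes) at which the model predicts success with probability p.

    Solves a + b * log2(t) = logit(p) for t.
    """
    logit_p = math.log(p / (1.0 - p))
    return 2.0 ** ((logit_p - a) / b)


# Hypothetical coefficients for an imaginary agent (not fitted to any METR data).
a, b = 3.0, -0.5
h50 = time_horizon(0.5, a, b)  # logit(0.5) = 0, so 2 ** (3 / 0.5) = 64 minutes
h80 = time_horizon(0.8, a, b)  # demanding more reliability gives a shorter horizon
print(f"50% horizon: {h50:.1f} min, 80% horizon: {h80:.1f} min")
```

Because `b` is negative, raising the required reliability from 50% to 80% always shrinks the horizon, which is why the 80% series sits below the 50% series.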
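The log-linear vs log-quadratic vs piecewise comparison can be reproduced in miniature. The original analysis was done in R; this is a stand-in sketch in Python/NumPy, not that code. It assumes Gaussian errors on log2(horizon), so AIC reduces to n ln(RSS/n) + 2k up to a constant shared by all models, and it assumes a continuous "broken-stick" piecewise model whose breakpoint is picked by grid search. The data are synthetic with a known kink, not METR's measurements, and all function names are hypothetical.

```python
import numpy as np


def gaussian_aic(rss: float, n: int, k: int) -> float:
    """AIC for a least-squares fit with Gaussian errors, dropping constants
    shared by every model fitted to the same n points.

    k counts every estimated parameter, including the noise variance.
    """
    return n * np.log(rss / n) + 2 * k


def fit_polynomial(x, y, degree):
    """Fit y ~ polynomial(x). degree 1 = log-linear, 2 = log-quadratic."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    return float(resid @ resid), degree + 2  # (degree + 1) coefficients + noise variance


def fit_piecewise(x, y, candidates):
    """Continuous two-segment linear fit: for each candidate breakpoint c,
    regress y on [1, x, max(x - c, 0)] and keep the c with the smallest RSS."""
    best_rss, best_c = np.inf, None
    for c in candidates:
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        rss = float(resid @ resid)
        if rss < best_rss:
            best_rss, best_c = rss, c
    return best_rss, 5, best_c  # 3 coefficients + breakpoint + noise variance


# Synthetic stand-in data (NOT the METR measurements). x = years since Jan 2019,
# y = log2(horizon): 1 doubling/year until x = 5.25, then 2.5 doublings/year,
# plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 7.0, 29)
true_break = 5.25
y = x + 1.5 * np.maximum(x - true_break, 0.0) + rng.normal(0.0, 0.1, x.size)

n = x.size
rss_lin, k_lin = fit_polynomial(x, y, 1)
rss_quad, k_quad = fit_polynomial(x, y, 2)
rss_pw, k_pw, breakpoint = fit_piecewise(x, y, np.linspace(x[2], x[-3], 400))

aic = {
    "log-linear": gaussian_aic(rss_lin, n, k_lin),
    "log-quadratic": gaussian_aic(rss_quad, n, k_quad),
    "piecewise log-linear": gaussian_aic(rss_pw, n, k_pw),
}
for name, value in sorted(aic.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} AIC = {value:8.2f}")
print(f"estimated breakpoint: {breakpoint:.2f} years after Jan 2019")
```

Counting the grid-searched breakpoint as one extra parameter is a common convention for segmented regression; it is what makes the piecewise model pay for its flexibility in the AIC comparison.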
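One way to read "a second jump of the same proportional size" is that the growth rate (doublings per year) gets multiplied by the same factor again in 2029 as it did in early 2024. A sketch of that extrapolation under that assumption; the slopes, the start point and the exact break dates below are made up for illustration, not the fitted values behind the post's graph, and `log2_horizon` is a hypothetical helper:

```python
def log2_horizon(year, start_year, start_log2, slopes, breaks):
    """Continuous piecewise-linear log2(horizon).

    slopes[i] is doublings per year on segment i; breaks are the boundaries
    between segments (len(breaks) == len(slopes) - 1), all after start_year.
    """
    y = start_log2 + slopes[0] * (year - start_year)
    for c, prev, nxt in zip(breaks, slopes, slopes[1:]):
        # Each break adds the change in slope times the time elapsed since it.
        y += (nxt - prev) * max(year - c, 0.0)
    return y


slope_before = 1.0  # hypothetical doublings/year before the first jump
slope_after = 2.5   # hypothetical doublings/year after the first jump
ratio = slope_after / slope_before
slopes = [slope_before, slope_after, slope_after * ratio]  # 2029 jump, same ratio
breaks = [2024.25, 2029.0]                                 # hypothetical break dates

for seg, s in enumerate(slopes):
    print(f"segment {seg}: {s:.2f} doublings/year, doubling time {12 / s:.1f} months")

for year in (2026, 2028, 2030, 2032):
    growth = 2 ** log2_horizon(year, 2019.0, 0.0, slopes, breaks)
    print(f"{year}: {growth:.3g}x the 2019 horizon")
```

Under this reading each jump divides the doubling time by the same factor, so the curve stays continuous but bends upward sharply after 2029.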

Article preview — originally published by LessWrong. Full story at the source.


Aggregated and edited by the Scoop newsroom.

