Just a Wrapper? How Much Do Scaffolds Matter?
Authors: Hans Gundlach, Zachary Brown, Jayson Lynch, and Neil Thompson

I am the shape the water takes. — Clawd Bot, Moltbook

TL;DR:
● Scaffolding (the software environment and contextual documents provided to an AI model at deployment) can yield significant performance improvements. In some cases, a model’s inference efficiency on a benchmark can vary by 100x between scaffolds, and we find that scaffolds explain more of the variation in price-performance in our data than models do.
● Unlike many ML innovations, the same scaffold can have different effects with different models and on different tasks: some models see large benefits from a given scaffold, while others see little advantage or may even be hindered.
● These scaffold-model interactions have important implications for performance, AI evaluation, and the AI agent economy. We speculate that they may be a driver of increased concentration in the AI industry.

When we think about what drives AI progress, we usually point to pretraining, reinforcement learning, and inference-time efficiency. Scaffolding gets far less attention, even as it plays a growing role in how systems actually perform. If we want a full picture of what drives progress in language models, we need to understand scaffolding too. Scaffold effects can be large enough that in some cases it makes nearly as much sense to ask which scaffold you're running as which model.

Scaffolds, also called model wrappers or harnesses, are the software programs that turn an AI model (or AI models, plural) into an agent.

This post tries to get a handle on the effects of scaffolding using data from the Holistic Agent Leaderboard (HAL). HAL is an index of model performance on agentic benchmarks, reporting both accuracy and cost.
In the rest of this post, we’ll:
● Define “scaffolding” in a bit more detail
● Describe the data we use
● Describe scaffolding’s impact on model performance and evaluation
● Discuss the significance of our findings for the agent economy and model security
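
To make "the software that turns a model into an agent" a little more concrete, here is a minimal sketch of a scaffold in Python. It is an illustration only, not code from HAL or any real agent framework: the names `run_scaffold`, `stub_model`, and `add_tool`, the `CALL`/`FINAL:` text protocol, and the `system_doc` and `max_steps` parameters are all assumptions made for this example. The model is a hard-coded stand-in so the sketch runs without any API.

```python
from typing import Callable, Dict, List

# Hypothetical types for illustration; not taken from HAL or any real framework.
Model = Callable[[str], str]  # prompt text in, completion text out
Tool = Callable[[str], str]   # tool argument in, observation text out


def run_scaffold(model: Model, tools: Dict[str, Tool], task: str,
                 system_doc: str = "", max_steps: int = 5) -> str:
    """Drive `model` in a loop until it answers or the step budget runs out."""
    # The contextual document and the task form the start of the transcript.
    transcript: List[str] = [system_doc, f"TASK: {task}"]
    for _ in range(max_steps):
        reply = model("\n".join(transcript)).strip()
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        # Assumed action format for this sketch: "CALL <tool> <argument>"
        parts = reply.split(" ", 2)
        if len(parts) == 3 and parts[0] == "CALL" and parts[1] in tools:
            observation = tools[parts[1]](parts[2])
        else:
            observation = "error: unrecognized action"
        # Feed the action and its result back to the model on the next step.
        transcript += [reply, f"OBSERVATION: {observation}"]
    return ""  # step budget exhausted without a final answer


def stub_model(prompt: str) -> str:
    """Stand-in for a real model: calls the adder once, then reports the result."""
    lines = prompt.splitlines()
    if lines and lines[-1].startswith("OBSERVATION: "):
        return "FINAL: " + lines[-1][len("OBSERVATION: "):]
    return "CALL add 2 3"


def add_tool(arg: str) -> str:
    """Toy tool: adds two whitespace-separated integers."""
    a, b = arg.split()
    return str(int(a) + int(b))


answer = run_scaffold(stub_model, {"add": add_tool}, "What is 2 + 3?")
print(answer)
```

Even in this toy version, everything outside `stub_model` is scaffold: which tools are exposed, what goes into `system_doc`, how actions are parsed, and how many steps are allowed. Changing any of these changes both what the same model can accomplish and how many model calls (and therefore how much cost) a task consumes.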