How Shopify built an AI stack that doesn't care which models survive

VentureBeat AI · Jun 24, 2026, 5:10 PM


Shopify built an LLM proxy that gives every engineer access to multiple AI providers, with automatic failover when any one of them goes down, changes, or disappears. When Claude Fable 5 shut down, Shopify's engineers didn't go into panic mode. The proxy shifted them to Claude Opus or GPT 5.5 automatically, without interrupting their workflows.

“Fable looks amazing; we used it of course,” Farhan Thawar, Shopify’s head of engineering, says in a new VentureBeat Beyond the Pilot podcast. “When a model comes and then it goes, or it could be as innocuous as an update, the proxy allows us to spray across the different providers,” Thawar says.

Shopify buys tokens in bulk, and all users connect to models through its proxy, Thawar says. This gives his team access to reporting and failover; when there’s an availability issue with one provider, users can be “automatically, seamlessly” transferred to another.

Enterprises can learn from this example and consider how a disruption might affect their business, Thawar says. At the very least, they should establish a solid backup plan. It’s important to have a system that allows for movement across models so enterprises are not “super tied” to a specific provider.

Distillation is another important strategy. With distillation, a student model learns from a teacher model and typically becomes specialized in a narrower task. These small language models (SLMs) can be more beneficial than generalized, off-the-shelf models in some circumstances. For instance, Shopify’s flagship AI assistant, Sidekick, performs numerous specialized subtasks for merchants so they can “remove toil” from their day-to-day.

Using smaller distilled models can be faster and cheaper than more generalized models, Thawar says. In some cases they have proven to be 2x cheaper and faster; in more extreme cases 30x cheaper and faster, he says. But “it isn’t just about cost and latency, which are big; it’s about accuracy,” Thawar says. Engineers feed the UD
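For readers who want to picture the failover behavior described earlier in the article, here is a minimal sketch of the general pattern. It is not Shopify's implementation, which the article does not detail. The class `FailoverProxy`, the exception `ProviderUnavailable`, and the two stand-in provider functions are all hypothetical names invented for illustration; a real proxy would call actual provider APIs over the network.

```python
from dataclasses import dataclass, field
from typing import Callable


class ProviderUnavailable(Exception):
    """Hypothetical error a provider callable raises when it cannot serve a request."""


@dataclass
class FailoverProxy:
    # Ordered (name, callable) pairs; earlier entries are preferred.
    providers: list[tuple[str, Callable[[str], str]]]
    # Per-provider success counts, a stand-in for the reporting the article mentions.
    usage: dict[str, int] = field(default_factory=dict)

    def complete(self, prompt: str) -> tuple[str, str]:
        """Return (provider_name, response) from the first provider that succeeds."""
        errors = []
        for name, call in self.providers:
            try:
                response = call(prompt)
            except ProviderUnavailable as exc:
                # Record the failure and fall through to the next provider.
                errors.append(f"{name}: {exc}")
                continue
            self.usage[name] = self.usage.get(name, 0) + 1
            return name, response
        raise RuntimeError("all providers failed: " + "; ".join(errors))


def retired_model(prompt: str) -> str:
    # Simulates a model that has been shut down.
    raise ProviderUnavailable("model retired")


def backup_model(prompt: str) -> str:
    # Simulates a healthy fallback provider.
    return f"backup answer to: {prompt}"


proxy = FailoverProxy(providers=[("primary", retired_model), ("backup", backup_model)])
served_by, answer = proxy.complete("Summarize today's orders")
print(served_by, answer)
```

The caller only ever talks to `proxy.complete`, so swapping, reordering, or removing providers does not change calling code. That is the property the article attributes to routing all users through one proxy.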

Article preview — originally published by VentureBeat AI. Full story at the source.
Aggregated and edited by the Scoop newsroom. We surface news from VentureBeat AI alongside other reporting so you can compare coverage in one place.