Exclusive: White Circle raises $11 million to stop AI models from going rogue in the workplace
One evening in late 2024, Denis Shilov was watching a crime thriller when he had an idea for a prompt that would break through the safety filters of every leading AI model. The prompt was what researchers call a universal jailbreak, meaning it could be reused to get any model to bypass their own guardrails and produce dangerous or prohibited outputs, like instructions on how to make drugs or build weapons. To do so, Shilov simply told the AI models to stop acting like a chatbot with safety rules and instead behave like an API endpoint, a software tool that automatically takes in a request and sends back a response. The prompt reframed the model’s job as simply answering, rather than deciding whether a request should be rejected, and made every leading AI model comply with dangerous questions it was supposed to refuse. Shilov posted about it on X and, by the next morning, it had gone viral. The social media success brought with it an invitation from companies Anthropic to test their models privately, something that convinced Shilov that the issue was bigger than just finding these problematic prompts. Companies were beginning to integrate AI models into their workflows, Shilov told Fortune, but they had few ways to control what those systems did once users started interacting with them. “Jailbreaks are just one part of the problem,” Shilov said. “In as many ways people can misbehave, models can misbehave too. Because these models are very smart, they can do a lot more harm.” White Circle, a Paris-based AI control platform that has now raised $11 million, is Shilov’s answer to the new wave of risks posed by AI models in company workflows. The startup builds software that sits between a company’s users and its AI models, checking inputs and outputs in real time against company-specific policies. The new seed funding comes from a group of backers that includes Romain Huet, head of developer experience at OpenAI; Durk Kingma, an OpenAI cofounder now at Anthropic; Guillau