Learnings from starting an AI safety research team

LessWrong · Jun 5, 2026, 4:27 PM · Also reported by 2 other sources

This post’s goal is to distill our takeaways from building a research team (somewhat) from scratch over the past four months. We describe some context about our team and how it came about, and then provide some lessons learned.

Since AI safety is becoming more and more entrepreneurial, we hope this is helpful for others trying to do the same.

1. The team

We're a new alignment research team within Arcadia Impact, based in London. We’re a team of 8, working closely with members of the UK AISI alignment team. We currently have three main projects:

- Understanding model motivations. This currently looks like:
  - Trying to generate documents which fully describe a model’s behaviour (given just its behaviour).
  - Producing an open analysis of alignment training techniques and ways this training could go wrong.
- Doing scalable oversight for alignment. This includes validating debate protocols in practice and then trying to apply them to fuzzy alignment-relevant tasks.
- Building pipelines for doing automated alignment research.

We're also hiring for two roles! More on this at the bottom.

2. Context about how the team came about

The rest of this post is written from the perspective of Andrew Draganov (research lead & current programme manager on the team) and Erin Robertson (co-director of Arcadia).

In short, Arcadia Impact had been collaborating with AISI already, through LASR Labs and ASET. Our alignment team started by applying for the AISI alignment project funding, saying that we would hire a team of researchers to collaborate with their alignment team. Andrew was taking part in LASR at the time and was brought in to help with the application. His remit then widened as the number of things to do kept growing. Once our AISI funding was approved, we began the process of hiring researchers, and also applied to Coefficient Giving for additional compute funding.

A bit about Andrew, since it bears on how replicable this is. In his words: I have a PhD in computer science/machine learning and was working

Article preview — originally published by LessWrong. Full story at the source.

