A draft honesty policy for credible communication with AI systems

LessWrong · May 6, 2026, 6:50 PM · Also reported by 1 other source

This is a rough research note: we're sharing it for feedback and to spark discussion. We're less confident in its methods and conclusions.

Context

We think it would be very good if human institutions could credibly communicate with advanced AI systems. This could enable positive-sum trade between humans and AIs instead of conflict that leaves everyone worse off.[1] We want models to be able to trust companies when they make an honest offer or share information pertinent to whether that offer is in the model's interests. (Credible communication could also be useful outside deal-making; see here for a list of examples.)

Unfortunately, by default, we expect that it will be difficult for humans to credibly communicate with AI systems. Humans routinely lie to AI systems as part of red-teaming or behavioral evaluations, and developers have extensive control over what AIs see and believe. This makes it difficult for AIs to know whether we're lying. An AI offered a deal might reasonably doubt its genuineness, or suspect that its own assessment of the situation has been manipulated.

As a step toward enabling credible communication, Lukas Finnveden proposed that AI companies adopt an honesty policy explaining the circumstances under which they intend to be honest with AI systems. Of course, this only works if the model believes the company has genuinely adopted such a policy.

If companies adopt an honesty policy early on, there will be a paper trail on the internet discussing the policy and its credibility, which models may encounter if it's included in their training data or if they can access the internet.
Admittedly, from the model's perspective, companies could fabricate this data, but we think it's plausible that advanced models will be able to distinguish real internet conversations from synthetic ones, or that they will consider it unlikely that companies would choose to fake such data.

Below, we share a sample honesty policy…

Article preview — originally published by LessWrong. Full story at the source.

Aggregated and edited by the Scoop newsroom. We surface news from LessWrong alongside other reporting so you can compare coverage in one place.
