Scoopfeeds — Intelligent news, curated.

[Linkpost] Language Models Can Autonomously Hack and Self-Replicate

LessWrong · May 11, 2026, 6:16 PM · Also reported by 2 other sources

Palisade Research:

We demonstrate that language models can autonomously replicate their weights and harness across a network by exploiting vulnerable hosts. The agent independently finds and exploits a web-application vulnerability, extracts credentials, and deploys an inference server with a copy of its harness and prompt on the compromised host.

We test four vulnerability classes: hash bypass, server-side template injection, SQL injection, and broken access control. Qwen3.5-122B-A10B succeeds in 6-19% of attempts, and the smaller Qwen3.6-27B reaches 33% on a single A100. Replicating the Qwen weights, frontier models reach 81% with Opus 4.6 and 33% with GPT-5.4. The process chains: a successful replica can repeat the attack against a new target, producing additional copies autonomously.

Research paper: https://palisaderesearch.org/assets/reports/self-replication.pdf
Source code: https://github.com/palisaderesearch/AI-self-replication
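To make the vulnerability classes concrete, here is a minimal sketch of the SQL-injection class the paper tests. This is an illustrative toy, not Palisade's actual harness or target application: the table, credentials, and function names are all hypothetical, using Python's standard sqlite3 module.

```python
import sqlite3

# Hypothetical login check -- NOT from the Palisade codebase.

def vulnerable_login(conn, username, password):
    # UNSAFE: user input is interpolated directly into the SQL string,
    # so input can rewrite the query itself.
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{username}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0

def safe_login(conn, username, password):
    # SAFE: a parameterized query treats input strictly as data.
    return conn.execute(
        "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?",
        (username, password),
    ).fetchone()[0] > 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")

# Classic bypass payload: the SQL comment marker "--" truncates
# the query, removing the password check entirely.
payload = "admin' --"
print(vulnerable_login(conn, payload, "wrong"))  # True: auth bypassed
print(safe_login(conn, payload, "wrong"))        # False: payload neutralized
```

An agent that can discover and exploit a flaw like the first function gains the foothold the paper describes; the parameterized version shows why the same payload fails against a correctly written query.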

Article preview — originally published by LessWrong. Full story at the source.


Aggregated and edited by the Scoop newsroom. We surface news from LessWrong alongside other reporting so you can compare coverage in one place.