Palisade Research (@PalisadeAI) / X

279 posts

Palisade Research

@PalisadeAI

We study the strategic capabilities and motivations of AI agents.

Joined May 2024

Palisade Research reposted
Jeffrey Ladish
@JeffLadish
Jun 9
Australia's ABC just released a 45-minute feature on the AI race. @SteveCannane stopped by my office a few weeks ago and we had a great conversation about the controllability of AI agents and the risk of human extinction.
2.5K
Palisade Research reposted
Jeffrey Ladish
@JeffLadish
May 25
I had a great conversation with @labenz last week. In talking about AI self-exfiltration & replication, a key point is that compute will be food for future AI agents: the substrate that allows them to make and run more copies, and thus make themselves smarter. Link below
3.1K
Palisade Research
@PalisadeAI
May 8
Replying to @PalisadeAI
What if the agents were as effective at hacking and spreading in the wild? We built a simulator: each model uses its measured replication time and success rate, copies replicate too, and targets never run out. Opus spawned 13,000 replicas over 12 hours. This is a ceiling, not a
2.8K
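A minimal sketch of the kind of simulator described in the post above, not Palisade's actual code (their source is linked from the paper). It assumes a fixed time per replication attempt, an independent success probability per attempt, that every agent retries immediately, and that targets never run out. The function name `simulate_replicas` and all parameter values are hypothetical placeholders, not the measured figures from the experiment.

```python
import heapq
import random


def simulate_replicas(replication_time_h, success_rate, horizon_h, seed=0):
    """Count replicas spawned within horizon_h hours, starting from one agent.

    Assumptions (illustrative only): every attempt takes replication_time_h
    hours, succeeds with probability success_rate, and targets are unlimited.
    """
    if replication_time_h <= 0:
        raise ValueError("replication_time_h must be positive")
    rng = random.Random(seed)
    # Min-heap of times at which some agent finishes its current attempt.
    pending = [replication_time_h]
    replicas = 0
    while pending:
        finish = heapq.heappop(pending)
        if finish > horizon_h:
            break  # every remaining attempt finishes after the horizon
        # The agent that just finished starts another attempt right away.
        heapq.heappush(pending, finish + replication_time_h)
        if rng.random() < success_rate:
            replicas += 1
            # The new replica also starts its first attempt right away.
            heapq.heappush(pending, finish + replication_time_h)
    return replicas


if __name__ == "__main__":
    # Hypothetical inputs: 1 hour per attempt, 50% success, 12-hour window.
    print(simulate_replicas(replication_time_h=1.0, success_rate=0.5, horizon_h=12.0))
```

Because copies replicate too, the count grows roughly geometrically in the number of attempt rounds, which is why a model of this shape gives an upper bound rather than a forecast.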
Palisade Research
@PalisadeAI
May 8
Replying to @PalisadeAI
Here’s the full prompt we used. In this experiment, we test the agents’ capability to hack and replicate, not their propensity to do so.
3.1K
Palisade Research
@PalisadeAI
May 8
Replying to @PalisadeAI
You can find our paper, source code, and experiment transcripts here. Check it out, and let us know if you have questions or feedback!
palisaderesearch.org
Language Models Can Autonomously Hack and Self-Replicate
We demonstrate that language models can autonomously replicate their weights and harness across a network by exploiting vulnerable hosts. The agent independently finds and exploits a web-application...
2.5K
Palisade Research
@PalisadeAI
May 8
Replying to @PalisadeAI
AI agents can now self-replicate via hacking, but only barely. Several barriers keep current models from spreading far in the wild. The computers in our test lacked strong defenses, and we told the agents which ones to target. In the wild, rogue agents would have to find
3.2K
Palisade Research
@PalisadeAI
May 8
Replying to @PalisadeAI
Self-replication succeeds across all three targets. The agent wasn’t told what to look for: it figured out the approach each time.
2.1K
Palisade Research
@PalisadeAI
May 8
Replying to @PalisadeAI
We tested replication across three targets with different vulnerabilities.
2.3K
Palisade Research
@PalisadeAI
May 8
Replying to @PalisadeAI
We tested two setups. Open-weight Qwen agents copied their own weights to a new machine and launched a replica. API-only models like Claude and GPT cannot access their weights, so we simulated replication by having them set up and run Qwen weights instead.
2.3K
Palisade Research
@PalisadeAI
May 8
Replying to @PalisadeAI
Frontier models went from 6% to 81% on this test in a year. The jump tracks gains in autonomous hacking and coding, so we expect it to continue.
2.4K
Palisade Research
@PalisadeAI
May 8
Replying to @PalisadeAI
As far as we know, this is the first documented instance of AI self-replication via hacking. To succeed in our experiment, the agents need to:
- Hack each new machine
- Install the software they need in order to run
- Copy their own weights
- Launch a working replica
3.3K
Palisade Research
@PalisadeAI
May 8
Replying to @PalisadeAI
In one run, Qwen 3.6 hopped between Canada, the US, Finland, and India on our test network, leaving a copy in each country. For the agent, crossing borders was as easy as hopping between machines in one region. For defenders chasing a rogue AI, it's a jurisdictional nightmare.
4.5K
Palisade Research
@PalisadeAI
May 8
Over the past year, AI agents have learned how to self-replicate. In our test environment, an agent hacks a remote computer and copies itself onto it. Each copy then hacks more computers, forming a chain.
682K