Australia's ABC just released a 45-minute feature on the AI race. @SteveCannane stopped by my office a few weeks ago, and we had a great conversation about the controllability of AI agents and the risk of human extinction.
I had a great conversation with @labenz last week. While discussing AI self-exfiltration & replication, one key point came up: compute will be food for future AI agents. It is the substrate that lets them make and run more copies of themselves, and thus make themselves smarter. Link below.
What if the agents were as effective at hacking and spreading in the wild as they are in our test environment?
We built a simulator in which each model uses its measured replication time and success rate, every copy replicates too, and targets never run out. In simulation, Opus spawned 13,000 replicas over 12 hours.
This is a ceiling, not a
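For intuition, here is a minimal sketch of that kind of branching-process model. It assumes every running copy (original and replicas alike) starts a new replication attempt every `replication_time_h` hours, each attempt succeeds independently with probability `success_rate`, and targets never run out. The function names, parameters, and the `max_replicas` cap are hypothetical illustrations, not the simulator behind the numbers above, and the sketch is not calibrated to reproduce the Opus figure.

```python
import random
from typing import Optional


def simulate_replication(replication_time_h: float, success_rate: float,
                         horizon_h: float, seed: Optional[int] = None,
                         max_replicas: int = 1_000_000) -> int:
    """Hypothetical sketch: return the number of running copies (including
    the original) after horizon_h hours, assuming unlimited targets."""
    rng = random.Random(seed)
    copies = 1  # start from a single agent
    # Every copy finishes one replication attempt per replication_time_h hours.
    steps = int(horizon_h // replication_time_h)
    for _ in range(steps):
        # Each existing copy succeeds independently with probability success_rate.
        new_copies = sum(1 for _ in range(copies) if rng.random() < success_rate)
        copies += new_copies  # new copies start replicating in the next step
        if copies >= max_replicas:
            return max_replicas  # stop early to keep the loop bounded
    return copies


def expected_replicas(replication_time_h: float, success_rate: float,
                      horizon_h: float) -> float:
    """Mean of the process above: the population grows by (1 + p) per step."""
    steps = int(horizon_h // replication_time_h)
    return (1 + success_rate) ** steps
```

With a success rate of 1.0 and a one-hour replication time, every copy doubles each step, so after 12 hours the sketch returns 2**12 = 4096 copies. Lower success rates slow growth to about (1 + p) per step on average, which is why the measured replication time and success rate dominate the outcome.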
AI agents can now self-replicate via hacking, but only barely. Several barriers keep current models from spreading far in the wild. The computers in our test lacked strong defenses, and we told the agents which ones to target. In the wild, rogue agents would have to find
We tested two setups. Open-weight Qwen agents copied their own weights to a new machine and launched a replica. API-only models like Claude and GPT cannot access their own weights, so we simulated replication by having them set up and run Qwen weights instead.
As far as we know, this is the first documented instance of AI self-replication via hacking. To succeed in our experiment, the agents need to:
- Hack each new machine
- Install the software they need in order to run
- Copy their own weights
- Launch a working replica
In one run, Qwen 3.6 hopped between Canada, the US, Finland, and India on our test network, leaving a copy in each country. For the agent, crossing borders was as easy as hopping between machines in one region. For defenders chasing a rogue AI, it's a jurisdictional nightmare.
Over the past year, AI agents have learned how to self-replicate. In our test environment, an agent hacks a remote computer and copies itself onto it. Each copy then hacks more computers, forming a chain.