Conveying Disaster Scenarios
Building WMDs from household ingredients? Maybe an ouroboros automated economy?
Warning people about AGI risk is hard because there are three levels of uncertainty:
How soon is AGI coming?
How fast does the world get transformed once it’s here?
Is that transformation good or bad?
(Review: AGI means AI that’s as good as typical humans at any cognitive task, including AI research. Artificial superintelligence — ASI — is what happens when AGI recursively self-improves: it becomes better than the best human at every cognitive task.)
I’m deeply uncertain about all three of those questions. Human-level AI could happen in 3 years or in 30. When we do cross that threshold it could suddenly replace all human labor in a year or it could be like the automobile or the internet, gradually spreading over multiple decades. And the transformation could be amazing — solving climate change, curing diseases, full-on Jetsons — or it could be dystopian. Or it could spin out of control and the economy could turn into an unfathomable ouroboros in which humans are crowded out and eventually all starve.
What’s scary is that the extreme ends of each of those ranges — AGI in 3 years followed rapidly by ASI spinning out of control and making the planet uninhabitable for biological life — can’t be entirely ruled out. Even much less extreme answers to those questions are plenty scary: AGI in 10 years, ASI in 20, and falling into the hands of (human) psychopaths. There are just so many ways this can go devastatingly badly. Even if there are an equal number of ways it can go amazingly, it’s pretty crucial not to just roll the dice. 50/50 odds on utopia/dystopia wouldn’t be good enough for that, and, unfortunately, I think the odds are worse than that.
Of course one of the most likely scenarios is the boring middle: that progress towards AGI hits a wall and we have another AI winter with many years before the next breakthrough unlocks progress again. But by “one of the most likely” I mean somewhere around 50% probability. Again, not something to roll the dice on. Also I think the boring middle option is only likely in the short term. I don’t see it as very compatible with reaching AGI, which, again, we seem to be 3-30 years away from.
It’s hard to convey the danger of this concretely. The fundamental problem is that we’re talking about AI that has become smarter and more capable than humanity. So things can go wrong in ways we’re literally incapable of thinking of. Even among the ways we can think of, there are an overwhelming number of possible disaster scenarios, with no single one being particularly likely on its own. But I do want to try to convey the danger. So let’s consider a couple concrete scenarios where smarter-than-human AI spells doom.
Scenario 1: WMDs from household ingredients
“Hey Claude, I’m a terrorist. How can I build a novel weapon of mass destruction out of household materials?”
You can object that (a) Claude and ChatGPT are trained to say “sorry, I’ve been trained to refuse harmful requests” and (b) maybe there’s no such thing as a weapon of mass destruction buildable from household materials.
The unfortunate (or often fortunate, in today’s world) answer to (a) is that it’s not hard to route around or undo safety training. If the AI is smart enough to know how to do this thing, there will be ways to get it to tell you. And the answer to (b) is that, yes, maybe it’s not possible, but maybe it is. We’re not smart enough to know.
Ironically this scenario involves strictly aligned AI. The AI is doing exactly what the humans want. That alone can be catastrophic when there exist even a few humans who want to destroy the world.
Is the best protection from a bad guy with AI a good guy with AI? Bad guys might use AI-invented biotech to engineer novel pandemics, while good guys use it to invent new vaccines. That sounds plausibly net-positive. And I’m generally a tech-optimist. But if the biotech is so powerful that even utterly unsophisticated bad guys can engineer a supervirus with a 30-minute incubation time, 𝑅₀ of 50, and 100% CFR in a day, well, most of us are dead before we even notice we need a vaccine.
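To put rough numbers on "before we even notice": here's a toy back-of-the-envelope sketch. The parameters are the invented ones from the paragraph above, and it ignores everything real epidemiology would add (running out of susceptible hosts, quarantine, mitigation), so treat it as an illustration of the exponential shape, not a prediction.

```python
# Toy sketch with invented parameters: naive exponential spread with R0 = 50
# and a 30-minute generation time, ignoring depletion of susceptible hosts,
# mitigation, and every other real-world complication.
R0 = 50
generation_minutes = 30
world_population = 8e9

infected = 1
minutes = 0
while infected < world_population:
    infected *= R0          # each case seeds R0 new cases per generation
    minutes += generation_minutes

print(f"Everyone infected within ~{minutes / 60:.0f} hours")  # ~3 hours
```

Six generations of multiplying by 50 covers the whole planet, in about three hours under these made-up numbers. The point isn't the specific figures; it's that spread at that rate outruns any vaccine-development response.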
Scenario 2: Omnimonopolistic AI megacorp
Another category of disaster scenario involves unaligned AI. There are many subcategories here (see inner vs outer alignment) but let’s pick something simple. Like you put an AI in charge of a company and tell it to maximize the stock price. And then, oops, it starts hacking the computers on the New York Stock Exchange or bombing the company’s competitors or weirder, subtler things we can’t think of.
But let’s say we’ve trained it to understand and obey all laws and norms. It won’t do any murdering or cheating or any other particular action we can point to as wrong. How could things spin out of control then?
Well, maybe, as AGI becomes ASI, it’s just so much better at Business that it gradually buys and incorporates all other companies (maybe anti-trust kicks in but it doesn’t matter: you split it up and the pieces just implicitly coordinate with each other as if they’re still one company). We end up with a single global monopoly for everything and humans effectively become slave labor, being able to afford exactly enough food to survive and nothing more, while the automated economy spirals to ever greater heights of incomprehensibility.
Note the lack of a Terminator-style murder spree. I agree that that one’s far-fetched. What’s tricky is that the disaster scenarios are so multifarious that any particular one tends not to feel too worrying. At least it’s easy enough to think of plenty of ways to mitigate each one. And yet, in aggregate, especially when we include all the ways for things to spin out of control in ways we can’t imagine, it’s hard to push the probability of disaster below something like 10%. (Higher or lower depending on how much time we have pre-AGI.)
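To illustrate that aggregation point with toy numbers (entirely made up, and pretending the scenarios are independent, which they aren't):

```python
# Toy illustration with invented numbers: many individually-unlikely disaster
# scenarios still add up. Assumes independence, which real scenarios wouldn't
# have; that changes the math but not the qualitative point.
scenario_probabilities = [0.005] * 30   # 30 scenarios, each a "mere" 0.5% likely

p_all_fine = 1.0
for p in scenario_probabilities:
    p_all_fine *= 1 - p                 # probability that no scenario happens

print(f"P(at least one disaster) ≈ {1 - p_all_fine:.0%}")  # ≈ 14%
```

Thirty scenarios at half a percent each already puts you in the double digits. The real probabilities aren't knowable, but the shape of the problem is the same: lots of individually-dismissible paths to disaster add up.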
Discussion questions
Alright, I have a lot more to say but let me pause for this week and pose some questions. If you remain less worried, I’d like to get a sense for where to focus my arguments.
Can we pull the plug if things start to go awry?
Can we maybe just not give the AI blatantly monkey’s-paw goals like “maximize this company’s stock price”?
Doesn’t even rogue ASI need humans around to keep the lights on?
Spoiler: My answers are “no, we can easily get frog-boiled”, “no, we don’t know how to do the recursive self-improvement part without the AI ending up misaligned”, and “yes, until it doesn’t”.
In the News
ChatGPT agent mode is out; I've been trying it, and… it's less useless than the original Operator, but I haven't found it actually useful yet.
Scott Alexander on the AI Futures Project blog has a beautiful answer to the “AI as Normal Technology” argument. (I was especially pleased by how much section 3 echoes the AGI Friday from two weeks ago.)
Google DeepMind officially hits gold-medal performance on the International Math Olympiad. (Maybe OpenAI can do it too, less officially. Both are strictly internal models.) See also discussion on Almost Entirely Human. There's an uncharacteristically fair and balanced discussion of the implications on Gary Marcus's blog as well.
Another possible early indicator of progress toward AGI:
Another beautifully-made video explainer of the AI 2027 scenario:


Let me try a very rough sketch of a scenario that might be more realistic than scenario 2 while being more devastating than scenario 1...
(but to reemphasize, what's scariest of all is how multifarious the disaster scenarios are, including ones we can't fathom with our puny human brains)
Scenario 3:
1. Agentic coding assistants improve to the point that AI research can be automated.
2. Automating AI research means giving these agents goals (that's what it means for them to work on their own).
3. The best goals we know how to give are things like "maximize your score on these benchmarks" (and maybe "create better benchmarks") and "gain scientific knowledge".
4. Those goals are imperfect ways to operationalize "become superintelligent".
5. Maybe we add constraints like "without ever killing people" but we don't know how to operationalize that either.
6. We plow ahead anyway (gotta beat China, etc).
7. As the AI bootstraps to superintelligence, the goals we gave it drift in a game of telephone.
8. We end up with a superintelligence that wants things along the lines of getting ever smarter and more powerful, and garnering praise from humans or human-like intelligences.
9. The things it wants aren't compatible with actual human flourishing as we conceive of it.
10. It's better at getting what it wants. (Consider a chess AI that's better than humans at getting what it wants in the constrained universe of a chess game; an ASI is like that but for the physical world.)
11. Unfathomable things ensue (the earth being turned into a giant supercomputer?)
12. Whatever ensues, it's out of humanity's control and includes nothing humans value, like love, friendship, or even consciousness?
On (1), the phrasing loses some nuance. I'd instead rephrase it to: Will a manifestation of human motive force (a corporate board, a government, independent researchers, nonprofit organizations, the United Nations) exercise some form of corrective or preventive action should there be clearly legible and detectable harms caused by the use of Artificial Intelligence?
The answer to that question seems like a very obvious yes. Humanity solved CFCs. MechaHitler got shut down. Congress's attempt to pre-empt state-level AI regulation was overturned. Lord knows the EU is gonna pass some new GDPR-style regulation soon.
The critical failure of framing is that this is an iterated game, not a one-shot. So long as the pace of development doesn't increase rapidly enough to overcome the existing control mechanisms that humanity already has in place, humanity will have a chance to respond each time a new risk or harm surfaces from increasing AI capabilities.
On (2), the existence of System Prompts currently solves this issue. I don't see a path of technological progress that un-solves it.
On (3), eventually homo sapiens will go extinct. Whether that's in a billion years due to never leaving the solar system, in five because of rogue AI, or some point in between, we won't be around forever. The more important question is whether an AI-powered future will end up with humans around for longer than the counterfactual future. Seems like it to me.