The main thing that I think is still a disagreement is this: no matter how skilled artificial intelligences get, there will still be some things they do that look incredibly dumb to us and that are in fact bad for them. I don't think Elon Musk is dumb, but there are certain kinds of dumb things that he keeps doing, like falling for vague and thoughtless conspiracy theories just because they appeal to his idiosyncrasies in particular ways. Different AI systems are likely to have different sorts of weaknesses, and AI systems with enough control over what they take on will likely learn ways to avoid the kinds of things they do badly at (just as people learn to turn on the lights when entering dark rooms, and to simply avoid certain kinds of problems they aren't good at - I try not to get involved in popularity contests). But a system that is trying to do something really big is going to get itself tangled up in some things that it will just be very bad at.
I don't think this eliminates AI existential risk - it just means that it's likely to look weirder than most depictions suggest, in which the AI just smoothly solves every problem that comes its way.
This is sounding fair and I'm not sure it's a core disagreement. Maybe even an ASI (artificial superintelligence) is very stupid about certain things, from the perspective of a super-superintelligence.
But maybe it's part of the definition of AGI that it can perceive these shortcomings and fix them or work around them. And it's part of the definition of ASI that that process has reached the point that, from a human perspective, there are no longer any such shortcomings.
And I definitely agree about the weirdness. A big part of the risk is unknown unknowns.
I think the definition of AGI is that it works around all shortcomings and is “generally intelligent”. I suspect that is impossible. (But I’m definitely not *certain* that it’s impossible! It’s worth planning for.)
Wait, we may have some definitional confusion now. Humans are AGI, right? We have a million shortcomings but work around them in the sense that we manage to bend the world to our will.
Are you saying you suspect it's impossible to replicate that level of capability in silicon?
No! I don't think humans are actually *general* intelligence!
Whole swathes of cognitive psychology over the past 50 years are about innate biases humans have that make us better at solving some sorts of problems in natural settings, but worse at solving other sorts of problems in artificial settings (or even just historic vs modern settings).
Ok, good, that's what I meant by definitional confusion. I.e., I think this is just semantics. I've been using "AGI" to mean specifically the thing that humans are.
I'm happy to concede that a Platonic generalization of Intelligence -- homo economicus? -- may be impossible.
But in terms of AI risk, I don't think that matters. AI is on a trajectory towards being better / more strategic at achieving outcomes in the physical world than humans. It's very hard to tell if we're 3 years away from that or 30, but probably somewhere in there. And with the low end disturbingly plausible even if not likely.
The weird inconsistencies in AI capabilities may make the threshold hard to pin down and, as you say, the way the dangers play out very weird.
Thinking more about your original points, I can see arguments for how this decreases AI risk somewhat. Like, to make up a fanciful example, a rogue AI is able to execute a plan to kill a bunch of people but lacks the foresight to understand that doing so will cause us to pull the plug on it immediately. In general, this kind of spikiness of capabilities increases the chances of warning shots. (On the other hand, so many crazy things have happened with AI in the last few years that arguably ought to have counted as warning shots, yet we persist in not seeing them that way, so it's fair to wonder -- as Scott Alexander does in https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai -- if we ever will.)
But all of that is inherently unpredictable. Like you also said from the start, this doesn't eliminate existential risk.
Would you agree that AI risk currently is at the top of the list of existential risks humanity urgently needs to mitigate?
Yes, I do agree that AI risk is the top existential risk right now!
I think that a lot of the discussion of "superintelligence" really is premised on the idea of "general intelligence" in the sense that I'm thinking of it, rather than "mere" human-level intelligence.
Thinking about it a bit more, I suspect it may be more relevant for what non-extinction scenarios look like than for extinction. The world in which there is general intelligence, and therefore superintelligence soon after, is one in which humans no longer have a meaningful role to play in the economy (at least, on the production side). But if different forms of intelligence are more different from one another, rather than one being strictly superior, then there's more of a role for many types of intelligence in the economy.
Regarding the METR study, I agree with all of your points. I work as a former-software-developer-turned-manager and I feel like my personal projects are 10x faster because of AI, probably because (1) I don't have as much recent hands-on coding experience as my reports and am therefore rusty when not amplified by AI, and (2) managerial skills are super-helpful when dealing with AI systems with spiky capabilities. I have also been actively trying to build my experience with these tools for years, whereas I see other, more skeptical developers get only small boosts after trying tools further from the frontier for shorter trial periods. Familiarity and knowledge of the capability spikes improve productivity. And to your last point, I actively engage with unproductive workflows that cost me development throughput in exchange for more quickly learning the specific boundaries of AI systems' capabilities. In the long run, I think it's better to gain experience by fighting your way through a hard problem with an AI system. You will fail many times and feel like you've wasted time (otherwise you're not at the boundaries), but when you finally succeed at getting it to do something it's never done before, it really feels like breaking new ground. That kind of stubbornness is very important, in my opinion.
Another great post. Thanks for sharing.
Well, this genuinely makes me feel better. Thank you!
Wonderful to hear!
> AI is already drastically superhuman in some ways (chess, arithmetic, poetry composition speed?) while being drastically subhuman in others, like being a useful employee (even a remote-only one).
> If that spikiness persists, can we ever even meaningfully say we’ve achieved AGI? Sure, just define AGI as the point where there are no longer any such blatant deficiencies. I’m not disputing that intelligence is multifaceted.
Interesting take. To me that makes sense as a definition of "AHI", i.e. "Artificial Human-Level (or greater) Intelligence." The term AGI implies that there is some sort of a "General Intelligence" that exists independently of human capabilities. As if there's a universal problem-solving ability that exists in nature waiting to be discovered. Which implies that once the "secret sauce" of intelligence is developed, it naturally bootstraps, fooms, etc. I think that is what people might be balking at.
(I'm not claiming that such AGI/AHI could not pose existential risks)
Excellent comment. I don't know how crazy the idea of a universal problem-solving ability, existing in nature and waiting to be discovered, really is. What do you think of this framing:
A chess AI exists in a constrained, abstract universe. It can get superhumanly good at achieving what it wants (trapping the opponent's king) using a specified list of actions (legal chess moves) in the context of that universe.
A sufficiently capable and general AI can be given an objective like "maximize the stock price of this company" with a space of actions consisting of "bitstrings sent over a TCP/IP connection". It's fundamentally another game, like chess, just drastically more complicated.
Instead of subgoals like "keep my chess pieces in defensive positions around my king", in the game where the whole physical world is the gameboard the AI has subgoals like "accumulate resources" and "keep from getting turned off".
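To make that framing concrete, here's a toy sketch in Python (the class names and placeholder rewards are entirely made up by me; this isn't how any real system is built). The point is just that the chess universe and the "whole world" universe fit the same abstract interface, differing only in how enormous the action and state spaces are:

```python
from abc import ABC, abstractmethod

class Environment(ABC):
    """Anything an agent can act in: a chessboard, or in principle the physical world."""

    @abstractmethod
    def legal_actions(self):
        """The moves available right now."""

    @abstractmethod
    def step(self, action):
        """Apply an action; return the reward signal the agent is optimizing."""

class ChessEnv(Environment):
    # Constrained, abstract universe: actions are legal chess moves,
    # reward is progress toward trapping the opponent's king.
    def legal_actions(self):
        return ["e2e4", "g1f3"]      # placeholder move list

    def step(self, action):
        return 0.0                   # placeholder evaluation

class StockPriceEnv(Environment):
    # Same interface, vastly bigger game: actions are bitstrings sent over
    # a network connection, reward is the company's stock price.
    def legal_actions(self):
        return [b"\x00", b"\x01"]    # stand-in for "any bytes at all"

    def step(self, action):
        return 0.0                   # placeholder: observed price change

def play(policy, env, horizon):
    """The same optimization loop, regardless of which universe `env` is."""
    total_reward = 0.0
    for _ in range(horizon):
        action = policy(env.legal_actions())
        total_reward += env.step(action)
    return total_reward
```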
There's a list of assumptions to make at this point, about the AI's goal (or how it internalizes the goal we tried to give it) being misaligned with humanity's goals, whether the AI can fake alignment while consolidating power, etc etc. Also that the AI is smarter / more capable / more strategic at achieving goals than humans.
There are plenty of ways for those assumptions to fail, but suppose they don't. In that scenario, if you press "start game" for a superhuman AI with the world as its chessboard, that's game over for humanity.
PS: Do tell me ways you think those assumptions fail! I'm describing the most pessimistic scenario. I don't think it's what's most likely. But I haven't found a way to rule it out, so I take it dead seriously.
Thanks for the response. I just don't think that this type of out-of-distribution behavior without transfer learning is plausible for connectionist architectures. The problem is that they would need to be trained to both (a) know a lot of relevant data about humanity, earth, etc -- chess is not cutting it -- and (b) have/develop a goal (even if instrumental) that would somehow lead to wanting to take over the world. It doesn't matter how many artificial neurons or connections the model has if there is no relevant data and there has been no such training. It's like why a bunch of GPUs in a data center by themselves are not powerful (or a concern) -- they have to learn first.
Now, could a powerful enough AI model nigh-immediately read the whole internet and learn from it, like in The Fifth Element? And also somehow develop goals or sub-goals that were dangerous? I guess theoretically, if it was smart enough and fast enough at learning. But then the question is, where did this AI model come from? Why was it trained this way? And also, aren't there other AI models out there, and human-AI combinations, that would be competing with it? I'm just not understanding the threat model.
I'm also pasting a comment I wrote in response to Rob Bensinger a couple of years ago, which seems relevant to me, in case you find that it helps to explain my thought process.
https://www.lesswrong.com/posts/eaDCgdkbsfGqpWazi/the-basic-reasons-i-expect-agi-ruin?commentId=piefmeunsGmxQkusS
"I don't think this is likely to be true. Perhaps it is true of some cognitive architectures, but not for the connectionist architectures that are the only known examples of human-like AI intelligence and that are clearly the top AIs available today. In these cases, I expect human-level AI capabilities to grow to the point that they will vastly outperform humans much more slowly than immediately or "very quickly". This is basically the AI foom argument.
And I think all of your other points are dependent on this one. Because if this is not true, then humanity will have time to iteratively deal with the problems that emerge, as we have in the past with all other technologies.
My reasoning for not expecting ultra-rapid takeoff speeds is that I don't view connectionist intelligence as having a sort of "secret sauce", that once it is found, can unlock all sorts of other things. I think it is the sort of thing that will increase in a plodding way over time, depending on scaling and other similar inputs that cannot be increased immediately.
In the absence of some sort of "secret sauce", which seems necessary for sharp left turns and other such scenarios, I view AI capabilities growth as likely to follow the same trends as other historical growth trends. In the case of a hypothetical AI at a human intelligence level, it would face constraints on its resources allowing it to improve, such as bandwidth, capital, skills, private knowledge, energy, space, robotic manipulation capabilities, material inputs, cooling requirements, legal and regulatory barriers, social acceptance, cybersecurity concerns, competition with humans and other AIs, and of course value maintenance concerns (i.e. it would have its own alignment problem to solve).
I guess if you are also taking those constraints into consideration, then it is really just a probabilistic feeling about how much those constraints will slow down AI growth. To me, those constraints each seem massive, and getting around all of them within hours or days would be nearly impossible, no matter how intelligent the AI was.
As a result, rather than indefinite and immediate exponential growth, I expect real-world AI growth to follow a series of sigmoidal curves, each eventually plateauing before different types of growth curves take over to increase capabilities based on different input resources (with all of this overlapping)."
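To illustrate what I mean by a series of sigmoidal curves, here's a toy model with parameters I'm making up purely for illustration: overall capability is a sum of logistic curves, each gated by a different input resource and each eventually plateauing.

```python
import math

def logistic(t, midpoint, rate, ceiling):
    """One S-curve: slow start, rapid middle, plateau at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))

def capability(t):
    # Hypothetical overlapping growth regimes, e.g. compute scaling,
    # then data/skills, then physical-world inputs like robotics.
    # (midpoint in years, growth rate, ceiling) -- all numbers invented.
    curves = [
        (5.0, 1.0, 10.0),
        (12.0, 0.6, 25.0),
        (25.0, 0.4, 60.0),
    ]
    return sum(logistic(t, m, r, c) for m, r, c in curves)

for year in range(0, 35, 5):
    print(year, round(capability(year), 1))
# Growth looks fast locally, but each component saturates, so the total never
# becomes the indefinite, immediate exponential assumed in hard-takeoff stories.
```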
Thinking in terms of threat models is smart. I do think there are plausible threat models. I'm agonizing a bit on how to articulate this and intend to turn my answer into a future AGI Friday. One thing to reemphasize in the meantime is that a threat model doesn't have to be particularly likely for it to be scary. I'm wary of arguments of the form "I can't imagine what harm XYZ could possibly do", even when it is indeed very hard to imagine such harms. In everyday life, it can be reasonable but if the stakes are high enough...
Counterpoints and reactions:
1. Is the human brain not a connectionist architecture? It's very different from artificial neural networks; I'm just not certain if the differences are fundamental or not. (Again, probably they are, but I'm not certain.)
2. I don't think all (maybe not even most) disaster scenarios depend on AI going foom. We can be just as doomed by getting frog-boiled. (And this sounds sci-fi-ish so I'll put it in parentheses but a misaligned AI could reach a point of intelligence and situational awareness that it appreciates the need to hide its abilities and bide its time for as long as it takes to ensure that it, not humans, control the long-term future.) The broader point is how multifarious the disaster scenarios are when we're talking about AGI.
3. I see what you mean about no secret sauce and plodding along, gaining capabilities the hard way. But maybe recursive self-improvement accelerates that and compresses 100 years of tech progress into 3. And when the whole process is automated like that, that's when it can spin out of our control.
4. My second-favorite blogger, Dynomight, has a nice post on the "limits of smart" -- https://dynomight.net/smart/ -- that I think articulates some of your points nicely. But I don't think even he would say that these arguments lower p(doom) enough to breathe easy.
All that said, your predictions sound sensible. What's the nearest-term prediction implied by your overall take on this? If you're right, I will be very grateful to dial down my fears. And if you're wrong, or even if you downgrade your sanguinity, it'll be crucial to get a broader consensus that the danger is real.
(That's my biggest goal with AGI Friday at this point: to encourage enough epistemic humility that all of us can update our predictions as new evidence comes in. You're already shifting my probabilities, at least a tiny bit! And if nothing in this comment sways you at all, I'll update again.)
Thanks for the great comment. I'm really enjoying this conversation.
1) Yes, I think the human brain is mostly a connectionist architecture. But animals, including humans, have been shaped by billions of years of evolution to have a survival instinct, to prioritize things that lead to the survival of our own DNA, etc. That's totally different from the training environment of AI models. And I don't see why anyone would train a frontier AI model to have those dangerous kinds of drives.
2) Yes, I agree there are certainly many ways humanity could go extinct without AI foom. But it seems to me that the threat models which rely on us taking a lot of decisive action ahead of time, and trying to guess at threats before they materialize even in partial form ("no warning shots"), make much less sense in a non-foom world. Related to this, I think many AI safety/alignment efforts are really good -- in fact I think they are likely to be highly economically useful in the future, as they have been in the past (something like "today's safety is a big part of tomorrow's capabilities") -- but there are some ideas (like the West/the US attempting a pause on AI capabilities progress) that I think would cause harm if implemented. There's the saying: premature optimization is the root of all evil.
3) Somewhat agreed. I still don't think it's super likely that things would "spin out of control" even if 100 years of AI capabilities progress occurs in 3. Humans, and human-AI combinations, are pretty smart. I know some very smart people. I just have the feeling that we could figure things out -- or at least some humans would. I'm not claiming that some humans wouldn't use this opportunity to seize control, but that's different from AI extinction risk.
4) Agreed.
Here's a near-term prediction: At some point in the future, perhaps the early 2030s, we will get AI models that can create videos on the scale of a few minutes that are highly detailed, with coherence from moment to moment. And humanity will not be wiped out within 1 year of that. I don't think this is a strawman; cf. Eliezer's prediction here: "I could be wrong, but my guess is that an AI generated video of this kind and quality, indistinguishable from reality by close human examination without tools, will not appear more than 1 year before the end of the world.
ADDED: As produced by more like a one-sentence query than by a long detailed prompt with lots of iterative prompt engineering." (https://www.facebook.com/yudkowsky/posts/pfbid036HiCuDHG61wZYr5TGDGnH7tMpJCmmEbgq95UqF5qfcQuMRkYERBMH8GpwcB1a4PNl)
Here's another near-term prediction: Before Jan 1, 2032, there will not exist, anywhere in the world, a robotic system that can perform a perfusion of a postmortem human brain with the same competence as a trained human (such as myself). I predict this with 90% probability. (And I think that until this kind of general robotic capability can be developed, there are still a lot of barriers to AIs being generally capable enough to kill everyone.) My main claim here is that we have time to monitor the situation.
I'm not suggesting that you downgrade your fears. I'm not sure what your fears should be. I'm also not claiming to be particularly sanguine for the longer term. I don't know how things will play out.
Let me give you some more context about me: I work on brain preservation, which is about trying to preserve the structure of people's brains when they legally die, in order to try to revive them in the future. I've been working in this field full time for just over two years now, at Oregon Brain Preservation. I feel like we've made some progress over the past few years. We also have a free option for people who live (and legally die) in Oregon, which as far as I know makes us the first in the world to offer a free brain preservation option to the general public.
I honestly thought that people in the rationalist community would be more interested in this than they are, given that Eliezer and Robin were two of the main reasons I got interested in this field in the first place. And yet, the rationalist/rationalist-adjacent community hasn't really seemed to care. And one of the main reasons people in this community explicitly state for not caring is that they are so sure that humanity is going to become extinct anyway.
It's not like brain preservation is going to increase x risk. If anything I think it might decrease it, if it makes people race less to try to get to AI in unsafe ways to try to save their own lives.
I guess my point is: I wish there were slightly more focus on helping people who wish it to avoid dying permanently, and perhaps otherwise on taking advantage of the upsides of the advanced AI capabilities that will likely develop over the next 50-100 years, rather than nearly exclusively focusing on the downsides. Especially those downside risks expected to develop in the next few years, which I don't consider particularly likely.
Anyway, that's putting my cards on the table/explaining my biases. =)
PS: Do you have a source on that Yudkowsky prediction? Google and Claude and o3 and I are coming up blank.
PPS: Oh, sorry, you have the link right there. Facebook! [shakes fist]
Here's a version of the video on YouTube; it's pretty fun:
https://www.youtube.com/watch?v=Ss-P4qLLUyk
And Yudkowsky's prediction again is that the ability to generate a video like that (indistinguishable from reality even on close human inspection) will require something within 1 year of AGI.
> And Yudkowsky's prediction again is that the ability to generate a video like that (indistinguishable from reality even on close human inspection) will require something within 1 year of AGI.
Well, here is Eliezer's prediction: "I could be wrong, but my guess is that an AI generated video of this kind and quality, indistinguishable from reality by close human examination without tools, will not appear more than 1 year before the end of the world.
ADDED: As produced by more like a one-sentence query than by a long detailed prompt with lots of iterative prompt engineering."
It's not about AGI per se. It's that humanity is extinct within 1 year of this type of video generation being possible. My prediction is the opposite, i.e. that it will be possible to create this type of video at some point, and that 1 year later, humanity will not be extinct. Granted, this prediction runs into problems with anthropics.
> Thanks for the great comment. I'm really enjoying this conversation.
Same! And holy cow, that's amazing that you're part of Oregon Brain Preservation! You are doing the lord's work. There's a huge irony in that there's only one person I can think of who seems to think p(doom) is so high and so nigh as to make brain preservation work a poor return on investment. Namely, Eliezer Yudkowsky.
Anyway, back to our highly fruitful debate:
> [Human evolution is] totally different than the training environment of AI models
I agree, for now. But consider how AlphaGo was trained via self-play (synthetic data) to outmaneuver humans at Go. As AI improves, how sure are we that it can't do the equivalent for the vastly bigger gameboard that is the physical world?
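(For concreteness, here's the shape of the self-play idea as I understand it -- a generic sketch, not AlphaGo's actual training code, and all the function names are mine. The key point is that the training data is generated by the system playing against itself, so it isn't capped by the supply of human examples.)

```python
def self_play_game(policy, reset_env, step_env):
    """Play one game of the current policy against itself; return labeled examples."""
    state, history, done, outcome = reset_env(), [], False, None
    while not done:
        action = policy(state)
        history.append((state, action))
        state, done, outcome = step_env(state, action)
    # Each position gets labeled with the final result -- synthetic training data.
    return [(s, a, outcome) for s, a in history]

def train_by_self_play(policy, update_policy, reset_env, step_env, num_games):
    for _ in range(num_games):
        examples = self_play_game(policy, reset_env, step_env)
        policy = update_policy(policy, examples)   # learn from the system's own games
    return policy
```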
> [There will be] warning shots / we could figure things out
Here's a non-foomy / frog-boily disaster scenario to try out on you: More and more capable AI gradually runs more and more of the world economy. We retain the ability to shut it all down, but doing so becomes steadily harder, with greater and greater pressure to rationalize not doing so. At some ill-defined point, it becomes literally impossible. We pass a point of no return where doom is locked in. It may take many more years or decades for the planet to become uninhabitable or for humans to otherwise be crowded out. Maybe progress on physical-world-transforming tech isn't markedly faster than now and this slow-motion doom actually takes centuries. Seems unlikely, but maybe.
Point being, it's not just about ways AI could literally kill all humans, or how it could do that quickly or sneakily enough that we couldn't fight back. It's about losing control of the future when there's another entity on the planet with subtly human-incompatible goals that's more capable and more strategic in achieving those goals.
On to your predictions!
1. Ultra-realistic video generation (with an excellent physics model) is coming in the early 2030s and does not portend doom.
I don't really disagree on this, though my confidence is low.
2. We will not have robots that can perform a perfusion of a postmortem human brain by the end of 2031 (90% confidence).
I'm lower confidence than 90% but am inclined to take your word for this one. Can you generalize that to a class of tasks, or at least to something a bit less recondite? I think currently even berry-picking is just barely doable for robots?
But even in the world where many physical tasks continue to require physical humans, I foresee a lot of risk. Just as a thought experiment, imagine a malevolent superintelligence that's strictly virtual -- no robot armies, just mastery of the internet. It can amass unlimited wealth, pay humans to do any labor it needs, and steadily advance the field of robotics until it can safely kill us off.
Again, that's just one thought experiment. I don't think Terminator scenarios are likely. But I do think if you add up all the ways (including some we can't currently even fathom) that humans could lose control of AI, it puts the risk of disaster plausibly well above 10%.
Assuming AGI happens soon, that is. If it's decades away (also totally possible!), then the field of AI safety/alignment may have time to catch up; we'll see.
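To show what I mean by "adding up the ways," here's a back-of-the-envelope calculation with numbers I'm inventing purely for illustration (and pretending the scenarios are independent, which they really aren't):

```python
# Hypothetical per-scenario probabilities of losing control -- made-up numbers.
scenarios = {
    "rogue virtual superintelligence": 0.05,
    "gradual economic lock-in":        0.04,
    "deliberate misuse by humans":     0.03,
    "unknown unknowns":                0.05,
}

p_all_avoided = 1.0
for p in scenarios.values():
    p_all_avoided *= 1.0 - p   # chance this particular scenario doesn't happen

print(f"P(at least one) = {1.0 - p_all_avoided:.2f}")  # about 0.16 with these numbers
```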
In any case, let's keep trying to come up with nearer-term disagreements that get at the core question of how worried to be, and whether to advocate for things like pauses or limits on training compute.
Probably a key thing we do agree on (correct me if I'm wrong) is that we want the idea of extinction risk from AI to be within the Overton window. If we do see a warning shot or other evidence of disaster on the horizon, we need to be ready with the political will to pull the plug in some way.