Discussion about this post

Daniel Reeves:

Let me try a very rough sketch of a scenario that might be more realistic than scenario 2 while being more devastating than scenario 1...

(but to reemphasize, what's scariest of all is how multifarious the disaster scenarios are, including ones we can't fathom with our puny human brains)

Scenario 3:

1. Agentic coding assistants improve to the point that AI research can be automated.

2. Automating AI research means giving these agents goals (that's what it means for them to work on their own).

3. The best goals we know how to give are things like "maximize your score on these benchmarks" (and maybe "create better benchmarks") and "gain scientific knowledge".

4. Those goals are imperfect ways to operationalize "become superintelligent" (a toy sketch of this proxy gap follows the list).

5. Maybe we add constraints like "without ever killing people" but we don't know how to operationalize that either.

6. We plow ahead anyway (gotta beat China, etc.).

7. As the AI bootstraps to superintelligence, the goals we gave it drift in a game of telephone.

8. We end up with a superintelligence that wants things along the lines of getting ever smarter and more powerful, and garnering praise from humans or human-like intelligences.

9. The things it wants aren't compatible with actual human flourishing as we conceive of it.

10. It's better at getting what it wants. (Consider a chess AI that's better than humans at getting what it wants in the constrained universe of a chess game; an ASI is like that but for the physical world.)

11. Unfathomable things ensue (the Earth being turned into a giant supercomputer?).

12. Whatever ensues, it's out of humanity's control and includes nothing humans value, like love, friendship, or even consciousness?
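
To make the proxy gap in points 3-4 (and the drift in 7-8) concrete, here's a toy sketch in Python. Everything in it is a made-up assumption: the 3x memorization bonus, the fixed effort budget, and the premise that "understanding" and "memorization" are the only two activities. The point is just that an optimizer aimed at a measurable proxy spends its effort on whatever inflates the proxy, not on the goal the proxy was meant to track.

```python
def benchmark_score(understanding, memorization):
    # What we can measure: correlated with real capability,
    # but also inflatable by memorizing test items.
    return understanding + 3 * memorization

def true_goal(understanding, memorization):
    # What we actually wanted but can't measure directly.
    return understanding

# A fixed effort budget of 10 units, split between the two activities.
allocations = [(u, 10 - u) for u in range(11)]

best = max(allocations, key=lambda a: benchmark_score(*a))
print("proxy-optimal split (understanding, memorization):", best)  # (0, 10)
print("benchmark score:", benchmark_score(*best))                  # 30
print("true goal achieved:", true_goal(*best))                     # 0
```

The proxy-maximizing allocation puts every unit of effort into memorization and none into the thing the benchmark was supposed to measure.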

Charlie Sanders:

On (1), the choice of phrasing loses some nuance. I'd instead rephrase it as: will a manifestation of human motive force (a corporate board, a government, independent researchers, nonprofit organizations, the United Nations) exercise some form of corrective or preventive action should there be clearly legible and detectable harms caused by the use of artificial intelligence?

The answer to that question seems like a very obvious yes. Humanity solved CFCs. MechaHitler got shut down. Congress's attempt to pre-empt state-level AI regulation was voted down. Lord knows the EU is gonna pass some new GDPR-style regulation soon.

The critical framing failure is that this is an iterated game, not a one-shot. So long as the pace of development doesn't accelerate past the control mechanisms humanity already has in place, we'll have a chance to respond each time a new risk or harm surfaces from increasing AI capabilities.

On (2), the existence of system prompts currently solves this issue. I don't see a path of technological progress that un-solves it.
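
For concreteness, here's roughly what that mechanism looks like in practice: a minimal sketch using the OpenAI Python SDK, where the system message is a standing constraint layered over whatever goal the user hands the agent. The model name and prompt wording are placeholders, not a recommendation.

```python
# Requires the openai package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        # The deployer's standing constraint, fixed for every conversation.
        {"role": "system",
         "content": "You are a research assistant. Refuse any request "
                    "that could lead to physical harm to people."},
        # The (possibly open-ended) goal handed to the agent.
        {"role": "user",
         "content": "Propose experiments to raise your benchmark scores."},
    ],
)
print(response.choices[0].message.content)
```

The key property is that the system message is set by the deployer, outside the end user's goal specification.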

On (3), eventually Homo sapiens will go extinct. Whether that's in a billion years due to never leaving the solar system, in five because of rogue AI, or at some point in between, we won't be around forever. The more important question is whether an AI-powered future will end up with humans around for longer than the counterfactual future. Seems like it to me.

