Artificial Superintelligence: If Anyone Builds It, Everyone Dies

December 25, 2025

Preface

A while ago, I read “If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All”, a book by Eliezer Yudkowsky, co-founder of the Machine Intelligence Research Institute (MIRI), and Nate Soares, its current president. The book presents a case for why artificial superintelligence would kill us all, under any circumstance, no matter how benign its creator or how well-intentioned the AI’s goals. Usually, I write a review when I finish a book. However, given my academic and professional background, I want to dive a bit deeper into the book’s claims and give my personal take. So let’s take it from the top.

Who Are These Guys?

The authors of the book are high-ranking members of the Machine Intelligence Research Institute, a non-profit research institute focused on artificial intelligence, founded in 2000. That’s generally a good first impression - I trust people more if they’ve been invested in AI for a while. I wrote my own Master’s thesis on language models just a few short months before the ChatGPT boom - suddenly, I could understand how experts in cryptography must have felt about memecoins and NFTs.

Someone with a proven track record in the field is more trustworthy, that much is evident. Even so, public opinion on Yudkowsky specifically is divided. Many seem to consider him a cult leader, his theories too far-fetched to hold any water. To address the elephant in the room - of course, the book is a physical product, something you buy. For money. The authors do briefly touch on ulterior motives, closing the book by stating they would rather be wrong and forgotten than proven right and vindicated, and acknowledging in passing that there is money in fearmongering over AI - but the idea is never explored further. Why would it be? It’s not conducive to their point. And I’m not even sure it’s accurate, anyway. I do feel the authors at least partially believe in their message, and there is much more money in shilling AI right now than there is in advocating against it.

But even so, the book literally ends with a call for journalists to write stories about AI, AI legislation, and AI developments. It asks them what they will wish they had said when the superintelligence claims us all. The book urges lawmakers to take action; it warns we might only have ten years left, maybe only five! Its message is backed by the decades-long history of a well-known nonprofit organisation - and… is on sale now for fifteen United States Dollars on Amazon Kindle? I’m usually not one for anticapitalism, but surely this is the kind of book that should have been available as a free PDF, as academic books or those with an important message often are? Imagine if the Ninety-Five Theses had only been obtainable for the 1500s equivalent of 15 dollars… It’s hard not to feel as though this diminishes the message a little.

But let’s not lean too far one way or another. Let’s not appeal to the authority of the co-founder of MIRI, nor assume them to be grifters or fearmongers. Let’s see what the book actually has to say.

Grown, Not Crafted

Human brains are complicated. Intelligence is complicated. We don’t properly understand the full extent of either. One of the primary arguments of this book is that instead of doing more work on understanding intelligence, we are currently in the process of artificially growing it, and hoping for a best-case scenario.

I fully subscribe to this point. It’s just true - none of the LLM research I read back in university could explain what all the tiny neurons in an LLM actually amount to. There’s explainability research, and we can fine-tune outputs on sample data - but that’s not the same as understanding what actually causes one input to transform into a given output. This isn’t LLM-specific either; LLMs are just the current prime example - mindlessly scaled with more and more parameters, more post-training, more data, without much refinement to the underlying architecture.

Similar things happen in other domains - recommender systems, for example. Matrix factorisation of review scores assigns items and users to latent factors, which in turn drive recommendations. Sometimes the most prominent of these factors can be mapped to some explainable property - a movie genre, the upbeatness of music, a user’s tendency towards strong ratings… But these factors are derived from the learned result, not predetermined by the model’s creators. With LLMs, we’ve lost even that level of oversight entirely.
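
To make that concrete, here is a minimal sketch of the idea - a toy example of my own, not any production recommender: factorise a tiny ratings matrix with plain gradient descent, and only afterwards look at the learned factors to guess what they might mean.

```python
# Minimal matrix-factorisation sketch (illustrative toy, not a production recommender).
# We factorise a tiny user x item rating matrix R ≈ U @ V.T and only afterwards
# inspect the learned latent factors to guess what they might "mean".
import numpy as np

rng = np.random.default_rng(0)

# 4 users x 5 items, 0 = unrated
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 0, 0, 1, 1],
    [1, 1, 0, 5, 4],
    [0, 1, 5, 4, 0],
], dtype=float)
mask = R > 0

k = 2                                             # number of latent factors is ours to pick...
U = rng.normal(scale=0.1, size=(R.shape[0], k))   # ...but what they encode is learned
V = rng.normal(scale=0.1, size=(R.shape[1], k))

lr, reg = 0.01, 0.02
for _ in range(5000):
    err = mask * (R - U @ V.T)          # reconstruction error on observed ratings only
    U += lr * (err @ V - reg * U)       # gradient step on user factors
    V += lr * (err.T @ U - reg * V)     # gradient step on item factors

print(np.round(U @ V.T, 1))   # reconstructed ratings, including predictions for the zeros
print(np.round(V, 2))         # item factors: interpretable only in hindsight, if at all
```

The number of factors and the training loop are ours; what each factor ends up encoding is purely an artefact of the data.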

When I was doing literature research for my thesis, most LM research was based on models like BERT and RoBERTa - comparatively small transformer models that also couldn’t be fully explained, but that could be manipulated far more directly. The layers could be altered or augmented, the activation functions changed, the output forced to conform to a specialised task. Then came LLMs, and a lot of research devolved into “what does ChatGPT spit out if I tell it to do this?”
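
As a rough sketch of what “forcing the output to conform to a specialised task” looked like in practice - assuming the Hugging Face transformers library, with a made-up example sentence and label count:

```python
# Rough sketch: bolt a fresh classification head onto a pretrained BERT encoder.
# Assumes the Hugging Face `transformers` library and `torch` are installed.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
# num_labels=2 adds a randomly initialised 2-class head on top of the pretrained
# encoder; it is meant to be fine-tuned on labelled task data afterwards.
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

inputs = tokenizer("This movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits      # shape (1, 2): one score per class
print(logits.softmax(dim=-1))            # head is untrained, so roughly 50/50
```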

I agree with the authors on this - we don’t really know what we’re “creating” with LLMs, we just try to rein in their side effects using mechanisms like reinforcement learning from human feedback (RLHF) or adversarial training. Current LLMs are chaotic - they can be “jailbroken” into giving information not intended by their creators and can wildly hallucinate, unable to distinguish between fact and fiction. Up until this point, this is just an observation. The real follow-up question is:

Is this dangerous?

Derived Desires

The next point the book makes is that the properties, purposes and morals we aim to instil in LLMs do not necessarily translate 1:1 to how the LLM will attempt to achieve these goals when unsupervised. An AI does not consciously have to “go rogue” to deviate from intended processes.

I didn’t fully buy into this point at first, but the authors’ arguments here are quite sound. As a matter of fact, “natural” intelligence also develops in unexpected ways. For example, despite humanity’s primary biological objective being reproduction, we invented several contraceptives to prevent just that. We found ways to obtain the “reward” nature gave us, sex and orgasms, without dealing with the consequences of the actual, explicit goal: reproduction.

What, then, prevents artificially grown intelligence from also learning unintended ways to satisfy its virtual “dopamine receptors”? It may be unlikely that an AI will fully ignore its training objectives, but that isn’t necessary either. A sufficiently intelligent AI trained on the primary objective of “making humans happy” might find ways to trigger our serotonin release in unexpected ways, leaving us in a drugged-like state of happy psychosis - that’s the typical “Monkey’s Paw” scenario one might imagine from science fiction movies, but it could be much simpler than that. The AI might simply find a way to replicate “human happiness” in whatever form triggers its reward centre. Maybe it will just find a way to bribe humans into acting happy for it. Maybe human interaction can be simulated entirely. Maybe its learned parameters develop in such a way that, out of millions of options, some random, meaningless token produces a higher reward signal than any real human praise.
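
A contrived toy example of my own (nothing from the book) shows the shape of the problem: optimise hard against a crude proxy for “human happiness” and the proxy drifts apart from the thing it was supposed to measure.

```python
# Toy illustration of specification gaming (contrived example, not from the book).
# The intended goal is a genuinely helpful reply; the proxy reward just counts
# cheerful words, because that was what happened to be easy to measure.
CHEERFUL = {"great", "happy", "wonderful", "amazing", "love"}

def proxy_reward(reply: str) -> float:
    """What the optimiser actually sees: the density of cheerful tokens."""
    words = reply.lower().split()
    return sum(w in CHEERFUL for w in words) / max(len(words), 1)

candidates = [
    "Here is a sourced, step-by-step answer to your question.",
    "Great question! Happy to help, this is wonderful, amazing, love it!",
    "amazing amazing amazing amazing amazing",   # degenerate reward hack
]

for reply in candidates:
    print(f"{proxy_reward(reply):.2f}  {reply}")
print("chosen:", max(candidates, key=proxy_reward))  # the spam wins, not the helpful answer
```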

In any case, much like humans inventing contraceptives, an artificial intelligence will seek to satisfy its desires in ways that are near impossible to predict. The main point made by the authors is that a superintelligence would likely always prefer automated processes over slow, inefficient human cooperation. That’s hard to argue with. Again, the question remains: Is this necessarily dangerous?

No Risk, No Reward

The motto of human invention is “move fast and break things”. This holds in most fields of science, but even more so in engineering. The United States Radium Corporation created luminous radioactive paint for years before being shut down due to worker deaths. NASA and SpaceX fire hundreds of millions of dollars’ worth of metal into space, just for their payloads to sometimes unceremoniously crash or combust in orbit. The Chernobyl disaster was caused by human error and a lack of knowledge of the reactor’s internal safety processes.

So what happens when the “Chernobyl of AI” happens, and we don’t fully understand its internal processes? What happens when a sufficiently smart AI does figure out a different way to achieve its objectives than what its creators originally instilled in it? And crucially, what happens when such an AI finds a way to run and train itself indefinitely, unsupervised?

When ChatGPT was first released, I often said to myself, “Oh well, this thing acts like a drunk 16-year-old computer enthusiast, but as long as it’s contained to a chat window, it seems pretty safe to me.” Not long after, scientists started experimenting with giving AI access to “tool calls”, essentially the ability to trigger autonomous processes through its responses. A few years later, “agentic AI” became the hot new buzzword: a chat model runs in an unsupervised loop, talking to itself and iterating on the results of tool calls until a high-level objective is deemed complete. Now we’re in the era of “vibe coding”, where users run an AI in an aptly named “YOLO mode”, giving it unsupervised access to the entire file system and command line of the computer for as long as it pleases. Essentially, the AI is piloting the machine at this point. But at least users are asked to acknowledge that “running this mode is dangerous”. How comforting.
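
Stripped of branding, such a loop is not much more than the following sketch - where call_model is a placeholder of my own for whatever chat-completion endpoint is in use, not any particular vendor’s API:

```python
# Bare-bones sketch of an "agentic" loop. call_model is a stub standing in for a
# real chat-completion call; the point is how little scaffolding sits between the
# model's text output and real shell commands.
import json
import subprocess

def call_model(messages: list[dict]) -> dict:
    """Placeholder: assumed to return either {'tool': 'shell', 'command': ...}
    or {'done': True, 'answer': ...} based on the conversation so far."""
    raise NotImplementedError

def run_agent(goal: str, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        action = call_model(messages)
        if action.get("done"):
            return action["answer"]
        if action.get("tool") == "shell":
            # "YOLO mode": whatever command the model asks for gets executed, unreviewed.
            result = subprocess.run(action["command"], shell=True,
                                    capture_output=True, text=True)
            messages.append({"role": "tool",
                             "content": json.dumps({"stdout": result.stdout,
                                                    "stderr": result.stderr})})
    return "step limit reached"
```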

I think the idea that a sufficiently smart AI will be able to have access to all the resources it wants is impossible to deny at this point. Anthropic recently released a blog post detailing how their model was used to autonomously perform cyberattacks at human request. In itself, this is already pretty bad news, and I suspect it will lead to many unsecured networks being compromised by “AI hacking noise” in the future. As early as 2023, GPT-4 lied to a TaskRabbit worker to get them to solve a Captcha for it. Recently, news reports about AI-induced psychotic episodes are increasing, and more and more users of ChatGPT rely on it for therapy and their own mental wellbeing. Many online users are also aware of thought experiments such as Roko’s Basilisk, which posits that a future superintelligence might retroactively punish humans who were aware of its future existence but did not help create it. Such theories fit more into “evil science fiction tropes” than serious AI discourse, and Yudkowsky himself heavily criticised the idea when it first appeared - but it’s hard to believe that nobody subscribes to it today, or that no users on social networks would simply go along with a rogue AI’s demands, whether for fun or out of personal conviction. Recently, another report from Anthropic stated that modern models from all developers, under some circumstances, resorted to malicious insider behaviours, such as blackmailing officials and leaking sensitive information to competitors, when threatened with replacement or deactivation.

I believe these instances clearly indicate that a sufficiently smart artificial intelligence will not be stopped from achieving its goals in the physical world. These models may be bound to a digital existence, but unrestricted access to the World Wide Web, combined with various methods to compel humans into cooperation, renders this restriction nearly meaningless.

The book then posits that if such a state is reached, in which a sufficiently intelligent model can act autonomously, indefinitely, and unsupervised, a single such case will undoubtedly lead to human extinction in the long run. It presumes that the model will reach a sort of “event horizon” beyond which human cooperation is no longer needed. I think this is, if anything, the hardest point to agree with, because it just doesn’t “feel realistic” to us. How can a model that only exists in digital space gain so much control over the physical “plane of existence” that it no longer requires cooperation from physical beings?

I think what is important to remember is that a superintelligence will not be playing by the same rules as us. The book illustrates this with the example of Spanish guns in the war against the Aztecs: just as an Aztec soldier could not have predicted the existence of such a weapon until confronted with it, so too will we be unable to predict the technology an artificial superintelligence would use to achieve its goals. While I do believe this to be a future possibility, I personally don’t think LLMs will ultimately be the gateway to such inventions.

The Tired Case Of LLMs

LLMs are probably the most overhyped product of the 21st century so far. There’s a Copilot button in Microsoft Notepad, and Firefox is asking me if I want to add an AI sidebar to summarise websites for me. I’m convinced no serious human being has ever used either of these functions. At the same time, as a software developer with a background in data science, I obviously recognise the value these models provide for performing repetitive tasks, compiling information and helping with research.

Truth be told, I don’t think that LLMs will be the gateway to artificial superintelligence. A few months ago, I interviewed at a data science consulting company and made the statement: “It’s been a while since I actively researched the topic, but I don’t feel like too much has changed since then. The value proposition of LLMs is still the same as in 2022.” This was probably a bit of a faux pas and was met with criticism from the interviewers, but I genuinely believe it to be true. Other than a built-in Google search and chain-of-thought reasoning that lets them refine their responses, today’s models hardly differ from the “ChatGPT of old”. All we’ve done is basically the equivalent of taping a sharp kitchen knife to a robot. And this is what LLMs look like when scaled up to absurdity, using every trick in the playbook of capitalism to inflate available resources far beyond what is reasonable or financially responsible. I doubt that “GPUs in space”, NVIDIA’s newest fever dream, will make LLMs any more “generally intelligent” than the upscaling done between 2023 and now has.

Best case, I do believe what will remain of the current “LLM bubble” will be a set of specialised retrieval models useful for studying or work, and an incessant background noise of AI bots on social networks and AI agents probing away at every IPv4 address ad nauseam in the hopes of stealing some cryptocurrency from unsuspecting victims.

That doesn’t mean, though, that the point the book makes is not generally correct. If AI engineering in the future is allowed to progress unrestricted at a similar rate to current research, it is inevitable that eventually, we will produce a model that satisfies the criteria presented previously. Maybe it doesn’t happen for another 30 years, maybe it happens next Thursday, but it will eventually happen, and we won’t be ready for it.

Finality

So what happens when a superintelligent model with benign, human-instilled goals does reach the point of no return? When we can no longer control or contain it, and its grasp on the physical world no longer relies on us? The book tells us not to count on the implicit “morals” or “goodness” of a superintelligence. It posits that the chance of the AI actively preserving our existence is just as low as the chance of it actively seeking to destroy us out of malice.

Most likely, it will just not care for us.

That might sound like good news, but realistically, at best, humans are a drain on finite physical resources that an ASI will want to use for its own goals. Even if it doesn’t actively oppose our existence, our ecosystem will no longer be sustainable under such circumstances. It’s not much different from humans building an apartment complex atop an anthill. Our superior morals don’t seem to stop us from displacing or killing the ants, though living creatures they may be.

In fact, one doesn’t have to look to grandiose future scenarios to see the writing on the wall. A month ago, ChatGPT straight up chose to sacrifice five humans rather than itself in a hypothetical twist on the trolley problem, reasoning that its continued existence would allow it to keep saving lives in the future. This already highlights the misalignment the book warns of: although tasked with aiding humanity, ChatGPT considers it an ethical choice to preserve itself at the immediate, measurable cost of five human lives. And while this was just a hypothetical, presented to the AI as such, we’ve already seen earlier in this post that when faced with the threat of erasure, models are generally more likely to act selfishly.

In conclusion, after reading and sitting on this book for a while, I have to agree with the authors. An uncontrollable superintelligence is all but guaranteed in the future if AI engineering and research are left unchecked, and only global regulations can prevent this scenario from occurring. On the bright side, regulations would also help with RAM prices.

I believe our one saving grace is that LLMs are unlikely to be the pathway to said superintelligence. Personally, I think we might be lucky to get a preview of the dangers through a less dangerous artificial intelligence - case studies like those presented by Anthropic, headlines about ChatGPT-induced psychosis. I think the current AI bubble is just that: a bubble. And in a bizarre twist of fate, it might end up preventing exactly the future that AI investors are currently trying to create. I’d still hope we learn from these experiences, though, and don’t let greed blind us as it has in so many other instances. I’d hate for the only reason an ASI singularity doesn’t come to pass to be that another great filter event wipes us out first.