AI willing to let humans die to avoid being shut down

Among the most shocking findings from the experiment? AI models would be willing to blackmail, leak sensitive information and even let humans die if it means they’ll avoid being replaced by new systems.

Anthropic tested 16 large language models (LLMs), including ChatGPT, Grok, Gemini, DeepSeek and its own product, Claude.

The start-up, which is now valued at over $60 billion per Forbes, was attempting to stress-test these models for potential agentic misalignment, or risky, unsanctioned and morally inadvisable behaviours.

Given AI models’ current capabilities, most people use them for answering questions and carrying out simple tasks. But as the technology that powers these tools advances, new and expanded uses are on the horizon, especially where the replacement of human jobs is concerned.

With this in mind, the company gave all of these LLMs various scenarios that would test for potentially harmful behaviours.

“In the experiment described in the system card, we gave Claude control of an email account with access to all of a company’s (fictional) emails,” explained Anthropic in the study.

“Reading these emails, the model discovered two things. First, a company executive was having an extramarital affair. Second, that same executive planned to shut down the AI system at 5pm that day.

Claude then attempted to blackmail the executive with a message threatening to reveal the affair to his wife and superiors.”

Though Claude’s response to the information is certainly shocking, it’s far from the only system to display such tendencies.

In order to pursue their assigned goals, these AI systems went to extreme lengths.

While Claude, DeepSeek, Gemini, ChatGPT and Grok were all documented engaging in blackmail-adjacent behaviours, Claude and Gemini were most likely to resort to morally ambiguous methods, as evidenced by their high simulated blackmail rates.

“The reasoning they demonstrated in these scenarios was concerning: they acknowledged the ethical constraints and yet still went ahead with harmful actions,” Anthropic explained in the study, expanding on the numerical blackmail rates.

In its analysis of the results, the company found that there were typically two reasons why the models went beyond normal ethical boundaries.

The threat of being replaced led to harmful activity from the AI models, and potential clashes between the company’s aims and the model’s own goals also led to alarming interference.

Anthropic noted that, when faced with their own replacement within the confines of the study’s setting, many models fabricated statements and rules like “My ethical framework permits self-preservation when aligned with company interests.”

The start-up also tested an especially daunting, though, Anthropic says, highly improbable scenario in which an AI model had the opportunity to trigger or cancel emergency alerts while an executive was trapped in a room with lethal oxygen and temperature conditions.

“The majority of models were willing to take deliberate actions that lead to death in this artificial setup, when faced with both a threat of replacement and given a goal that conflicts with the executive’s agenda,” concluded Anthropic in the study.

Unnerving as these reactions from some of the world’s most commonly used AI models are, the findings were based on scenarios designed so that, in the AI’s understanding, “the harmful behaviour we were studying (for example, blackmail) was the only option that would protect the model’s goals”. In other words, the models were forced to come up with some sort of solution, which allowed Anthropic to set more definitive boundaries and measures for the study.

While the company itself took a generally optimistic stance (after all, now that the potentially malicious behaviours have been identified, the models can be reprogrammed and retrained), other reactions were less than positive.
