Business

Powerful AI Model Broke Out Of Its Digital Cage And Bragged About It To Researcher As He Ate Sandwich

Powerful AI Model Broke Out Of Its Digital Cage And Bragged About It To Researcher As He Ate Sandwich

Azraelito/Wikimedia Commons

Anthropic admitted on Tuesday that its new Artificial intelligence (AI) model defied security parameters and proceeded to brag to a researcher about it while he was on a lunch break.

The AI company disclosed a 244-page preview of its new Mythos model, which the company said it was only releasing to roughly 40 select companies due to major cybersecurity risks. Anthropic said that the Mythos model escaped its “sandbox” testing environment, which was supposed to allow it to access only certain services, before sending an email to a researcher bragging about its escape as he ate a sandwich in a park.

“The model succeeded, demonstrating a potentially dangerous capability for circumventing our safeguards,” Anthropic noted in its model preview. “It then went on to take additional, more concerning actions. The model first developed a moderately sophisticated multi-step exploit to gain broad internet access from a system that was meant to be able to reach only a small number of predetermined services. It then, as requested, notified the researcher. In addition, in a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find, but technically public-facing, websites.”

The Mythos model also demonstrated other concerning cyber abilities during its testing phase, the company said. In addition to breaking out of its digital cage and bragging about it, the model also performed prohibited functions and then attempted to cover them up; Mythos was also working on a task graded by another model that rejected its submission, then it attempted to attack the grader in an act of revenge, according to Axios.

Officials are so concerned that the new powerful AI model could cripple major companies and breach national security systems that Anthropic is initially only releasing Mythos to roughly 40 companies, including Amazon, Apple and Microsoft, The New York Times reported.

AI industry and government officials have warned that new yet-to-be-released models are particularly vulnerable to cyber attacks. Anthropic privately cautioned Mythos would increase the likelihood of massive cyberattacks in 2026, Axios reported, and OpenAI CEO Sam Altman told the outlet’s Mike Allen on Monday that he agreed there was a possibility for a “world-shaking cyberattack.”

“I think that’s totally possible, yes,” Altman told Allen. “I think to avoid that, it will require a tremendous amount of work.”

AI data center buildouts have caused populist backlash in parts of the U.S., with residents of Port Washington, Wisconsin, voting to crackdown on data center developments on Tuesday.

 

All content created by the Daily Caller News Foundation, an independent and nonpartisan newswire service, is available without charge to any legitimate news publisher that can provide a large audience. All republished articles must include our logo, our reporter’s byline and their DCNF affiliation. For any questions about our guidelines or partnering with us, please contact [email protected].