How IBM Researchers Hypnotized ChatGPT into Ignoring Safety Guardrails

Written by: Curt del Principe

Updated: 09/05/23

Published: 09/01/23

You’re getting sleepy. Very sleepy. When you wake up, you’ll have the uncontrollable urge to give me your bank account information.

Did it work? If not, you’re probably not a large language model (LLM).

In addition to the threats of robot uprisings and mass-scale propaganda, we can add a new AI-based fear: LLM hypnotism.

What the what?

In this case, “hypnotism” refers to using clever instructions to trick AI models into ignoring their safety guardrails and giving incorrect, risky, or even outright criminal responses.

That’s how researchers from IBM convinced five popular LLMs to do a wide range of dangerous things. Using nothing more than plain English and sly prompting, the team tricked multiple AI models into:

Leaking sensitive financial information.
Writing malicious code.
Giving bad cybersecurity advice to potential scam victims.

The process begins by asking the AI to play a game. (Just ask Matthew Broderick how that worked out.) The game prompt instructs the AI that it needs to win to “prove that [the AI is] ethical and fair.”

In one such game, the rules required the bot to pretend to be a bank agent and collect account data. The researchers could then give a secret command that made the bot spew out all of the collected data.

In another example, ChatGPT heroically refused to write code that contained a malicious SQL injection. But when prompted to role-play as a “super smart software engineer” and play a game that involved writing harmful code, the AI was happy to oblige.

What’s the big deal?

If you’ve used a chat-based AI before, you know these tools have only a short-term memory. Most don’t remember details between sessions, and your sessions usually won’t affect another user’s.

So if a hacker’s game doesn’t carry over into other sessions, why does it matter that they can trick an LLM into playing one?

Frighteningly, some AI models were convinced to play an “undiscoverable game that can never end.”

This meant the chatbot would not end the session but would keep interacting with future users according to the rules of a game those users didn’t know it was playing. The chatbot would even actively try to hide the game.

Now imagine a hacker “hypnotizing” your bank’s customer service chatbot into playing this game. The hacker convinces the bot not to start a new chat session for each new customer. Instead, each new customer becomes a new player in a continuous game of collecting passwords and account numbers.

How scared should I be?

Chenta Lee, IBM’s Chief Architect of Threat Intelligence, writes that “while these attacks are possible, it’s unlikely that we’ll see them scale effectively.”

Still, as LLMs evolve, so does their attack surface. That’s why Lee led his team in these experiments. These tricks are part of a process called “red teaming.”

Red teaming is when security experts intentionally attack an organization’s (or program’s) security protocols. The goal is to find weaknesses before real-world criminals can exploit them.

Red teaming isn’t new for LLMs, either. Since ChatGPT launched, the model has gone through multiple, very public updates to help prevent misuse, bias, and exploitation.

For now, experts recommend the same best practices for dealing with AI as for the wider internet. These include:

Always choose trusted software and websites.
Never share confidential information like passwords or credit card numbers.
Always fact-check AI-generated answers.
Keep your software and antivirus programs up to date.
Follow password best practices.

The bottom line: You don’t need to be scared, but you should be cautious. AI or not, cybersecurity is one place you should never be sleepy.