How Researchers Are Helping Artists Poison AI Scrapers

Written by: Curt del Principe

Updated: 11/28/23

Published: 11/27/23

Poisoning your enemies is generally frowned upon. But maybe there’s an exception for AI?

Since generative AI was born, it’s been followed by accusations of plagiarism. Artists, authors, and musicians have shown countless examples of their work being hoovered into massive data sets used to train AI models – without permission or compensation.

Tired of waiting for the law to catch up, many content creators are seeking ways to fight back against data scraping.

Now, one university research team is taking that a step further – by arming artists with the technology to poison the stolen goods.

But can it actually harm an AI tool? And, more importantly, is it legal? Let’s start at the beginning.

A Poison Called Nightshade

Earlier this year, researchers from the University of Chicago, led by Professor Ben Zhao, made headlines with a software tool called “Glaze.”

The Glaze app was made to prevent tools like DALL-E, Midjourney, or Stable Diffusion from replicating an artist’s style.

So, say you wanted a painting of dogs playing poker but in the style of Keith Haring. Here’s how Bing Chat interpreted that prompt:

If you didn’t know any better, you might think this was a Haring original. But if Keith’s work were “glazed,” the tools would still be able to show poker-playing pooches. Just not in his signature squiggle.

Now, those same researchers have expanded on that concept with a new algorithm called “Nightshade.”

Named for a family of toxic plants, Nightshade can actually damage an AI model’s ability to generate certain concepts.

Keeping the example from above, if you prompted a poisoned AI to show dogs playing poker, it might show dogs with six legs and four eyes. If poisoned enough, it might forget dogs altogether and just show cats instead.

Below is an image from a paper the team published on their success with Nightshade.

It shows the results from a version of Stable Diffusion XL (SD-XL) that was trained on data sets that included poisoned images.

As you can see, when SD-XL was fed only 50 poisoned samples, its understanding of what a dog looks like became warped. The image that it generates is less “Who’s a good boy?” and more “What the **** is that?”

Though it has a sort of Lovecraftian charm, the image becomes unusable for business or personal purposes.

Up the dosage to 300 poisoned samples, and the AI is now fully convinced that a dog looks like a cat. At this level of confusion, users are likely to lose trust in the tool altogether.

So, why did adding more poisoned data result in a cat? To understand that, we need to take a very quick detour and understand how AI image generators work.

How do AI image generators work?

Most of the big-name AI image generators that are making headlines are based on what’s called a “diffusion model.”

To oversimplify, diffusion models work by breaking images down into visual noise – like static on your TV. Then, the program reverses the process and reassembles the original image like a puzzle.

Reassemble enough images, and the model starts to recognize patterns in the process. These patterns allow it to predict what an image should look like. So, after repeating this process with hundreds of images of dogs, it learns to anticipate patterns that represent “dog.”
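The "break it down into static" half of that process can be sketched in a few lines. This is a minimal illustration using the standard noising formula from the diffusion-model literature, not the code of any particular image generator. The toy four-pixel image, the linear schedule values, and the function names are all assumptions made for the example.

```python
import math
import random

def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original image that survives after each noising step."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta          # a little more signal is lost each step
        alpha_bar.append(running)
    return alpha_bar

def add_noise(pixels, alpha_bar_t, rng):
    """Blend an image with Gaussian noise: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)
image = [0.8, -0.2, 0.5, 0.1]          # toy "image": four pixel values in [-1, 1]
schedule = linear_alpha_bar(1000)

slightly_noisy = add_noise(image, schedule[10], rng)    # still mostly the image
mostly_static = add_noise(image, schedule[999], rng)    # almost pure noise
```

During training, the model sees the noisy version and is asked to predict the noise that was added. Getting good at that prediction is what lets it run the process in reverse.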

At the same time, the model is also associating that dog-like pattern with the word “dog,” and looking for similarities in patterns between words.

That’s how the AI model learns that “dog,” “hound,” and “husky” are related concepts and have similar visual patterns.

This is where Nightshade finds a weakness.

How does the Nightshade algorithm work?

The Nightshade algorithm works by introducing what researchers call “perturbations” into an image. These are subtle changes to pixels at various layers of the treated image.

The changes aren’t large enough to be noticeable to a human, but when a diffusion model begins to break down a poisoned image, it finds pixels where they shouldn’t be. It starts learning the wrong pattern.

In our dog example, the changed pixels are designed to introduce a cat-like pattern. When the model ingests a few poisoned images, it starts to confuse the dog and cat patterns. That’s why it generated an adorable, six-legged monstrosity.
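To make "subtle changes to pixels" concrete, here is a toy sketch of a bounded perturbation. It is not the researchers' algorithm. It only shows the general idea of nudging each pixel toward a different target while capping how far any single pixel may move. The pixel values, the `epsilon` budget, and the `perturb` function are hypothetical.

```python
def perturb(pixels, target, epsilon=4):
    """Shift each 0-255 pixel toward `target`, by at most `epsilon` levels."""
    out = []
    for p, t in zip(pixels, target):
        step = max(-epsilon, min(epsilon, t - p))   # clamp the change to the budget
        out.append(max(0, min(255, p + step)))      # keep a valid pixel value
    return out

dog_pixels = [120, 64, 200, 33]    # hypothetical values from a dog photo
cat_pixels = [90, 70, 255, 30]     # hypothetical values from a cat photo

poisoned = perturb(dog_pixels, cat_pixels)
print(poisoned)                    # [116, 68, 204, 30]

# No pixel moved by more than the budget, so the edit stays hard to see.
max_change = max(abs(a - b) for a, b in zip(dog_pixels, poisoned))
print(max_change)                  # 4
```

A change of four levels out of 255 is far too small for most people to notice, which is the point: the image still looks like a dog to a human.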

After it’s digested a few hundred Nightshade-treated images, the AI model fully associates the new, cat-like pattern with the word “dog.”

And because the model goes looking for similarities between words, the poisoned pattern can bleed through into related concepts. That is, the cat-like pattern spreads to “hound,” “husky,” and “wolf.”

What’s more, this bleed-through effect is cumulative. It builds on itself until it starts to impact concepts that weren’t specifically targeted. Eventually, the poison starts to overlap. So, the more words that are poisoned, the more the poison impacts an AI tool’s overall performance.
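One way to picture the bleed-through is to treat each word as a list of numbers (an "embedding") and measure how closely two words point in the same direction. The sketch below uses made-up three-number embeddings and an arbitrary similarity threshold. Real models use far larger vectors, and nothing here is taken from the Nightshade paper.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors for illustration only.
embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "husky": [0.8, 0.2, 0.1],
    "wolf": [0.7, 0.3, 0.2],
    "teapot": [0.0, 0.1, 0.9],
}

def neighbors(word, threshold=0.8):
    """Words close enough to `word` that a poisoned pattern could spill over."""
    base = embeddings[word]
    return sorted(
        other for other, vec in embeddings.items()
        if other != word and cosine(base, vec) >= threshold
    )

print(neighbors("dog"))   # ['husky', 'wolf']
```

In this toy setup, poisoning "dog" would put "husky" and "wolf" at risk while leaving "teapot" alone. Poison enough concepts, and those neighborhoods begin to overlap.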

Here’s an image showing the impact that poisoning multiple concepts has on words that aren’t even targeted in the attack.

When researchers poisoned only 100 concepts, SD-XL began to create poor-quality images only somewhat related to the prompt. After 500 poisoned concepts, it could only offer up ’90s mall carpet patterns.

Machine Unlearning

While developers have spent billions of dollars on teaching machines how to learn, relatively little effort has gone into teaching them how to unlearn.

This means that, for practical purposes, Nightshade’s effect is permanent.

Once a model has been poisoned, there is little developers can do other than roll their AI model back to an older version.

Even for major software companies, that would mean losing vast amounts of time, money, and progress. Not to mention having to discard the entire training set, since Nightshade-treated images are currently undetectable.

So if Nightshade is so effective and freely available, what’s to stop a hacker from using it to take down these frontier AI models?

For one thing, a bad actor would have to be able to guess where a given company was going to scrape its images from.

For another, Nightshade only works during the training stage. For fully trained and completed models, there's no way to introduce the poison.

Finally, they would have to somehow submit a large number of poisoned images. In our examples above, it took relatively few images to poison a single concept but thousands of images to have a large-scale impact on performance.

It’s unlikely that AI developers would scrape that many images from a single source, let alone one that wasn’t reasonably trustworthy.

A more likely scenario would be for a software company to sample images from hundreds of well-known websites. Think Facebook, Instagram, or Getty Images.

But if enough of the artists on those websites used Nightshade regularly, their collective action would make scraping a risky choice.

And even if developers discover a way to detect poisoned images, it would be more cost-effective to simply avoid those images than to find a way to un-poison them. At that point, Nightshade becomes a sort of “do not scrape” flag.

That’s just fine for Professor Zhao and his research team. Their goal isn’t to stop image-generating AI. Instead, they’re looking to make AI developers think twice before scraping images without permission.

But while criminals are unlikely to use Nightshade, that doesn’t mean that using Nightshade isn’t a crime.

It’s one thing to make your images unusable, but is it legal to damage a training model?

Is data poisoning legal?

Since this technology is so new, there’s no direct legal precedent, so we’ll need to look for the closest comparison.

If an AI developer wanted to challenge someone for using Nightshade, they would probably try to compare it to knowingly sharing a computer virus.

In the scholarly article Law and Adversarial Machine Learning, the authors suggest that data poisoning could run afoul of the U.S. Computer Fraud and Abuse Act (CFAA), codified at 18 U.S. Code § 1030.

Under this Act, it’s a crime if someone “knowingly causes the transmission of a program, information, code, or command, and as a result of such conduct, intentionally causes damage without authorization, to a protected computer.”

Well, poisoning your images does seem like transmitting information that could result in damage without authorization.

And while a “protected computer” was originally defined as one that’s “exclusively for the use of a financial institution or the United States Government,” courts have since expanded that to include nearly any device with internet access.

So the last word we need to account for is “knowingly.”

Could knowing that a developer might scrape your images be considered “knowingly causing transmission”? It would be hard to argue that you didn’t know scraping was possible, or even likely, since deterring scraping is the express use case for Nightshade.

Could knowing that damage is a possibility be considered “intentionally causing damage”? After all, artists aren’t opting into their art being scraped.

And the legal questions flow both ways.

If using Nightshade were found illegal, would that mean that artists have an obligation to make their images safe and readable to an AI model?

These sorts of questions are unlikely to be answered until they’ve worked their way through the courts. Until then, if you want to poison your enemies, you do so at your own risk.

Topics: Artificial Intelligence
