Poisoning your enemies is generally frowned upon. But maybe there’s an exception for AI?
Since generative AI was born, it’s been followed by accusations of plagiarism. Artists, authors, and musicians have shown countless examples of their work being hoovered into massive data sets used to train AI models – without permission or compensation.
Tired of waiting for the law to catch up, many content creators are seeking ways to fight back against data scraping.
Now, one university research team is taking that a step further – by arming artists with the technology to poison the stolen goods.
But can it actually harm an AI tool? And, more importantly, is it legal? Let’s start at the beginning.
A Poison Called Nightshade
Earlier this year, researchers from the University of Chicago, led by Professor Ben Zhao, made headlines with a software tool called “Glaze.”
The Glaze app was made to prevent tools like DALL-E, Midjourney, or Stable Diffusion from replicating an artist’s style.
So, say you wanted a painting of dogs playing poker but in the style of Keith Haring. Here’s how Bing Chat interpreted that prompt:
If you didn’t know any better, you might think this was a Haring original. But if Keith’s work were “glazed,” the tools would still be able to show poker-playing pooches. Just not in his signature squiggle.
Now, those same researchers have expanded on that concept with a new algorithm called “Nightshade.”
Named for a family of toxic plants, Nightshade can actually damage an AI model’s ability to generate certain concepts.
Sticking with the example above, if you prompted a poisoned AI to show dogs playing poker, it might show dogs with six legs and four eyes. If poisoned enough, it might forget dogs altogether and just show cats instead.
Below is an image from a paper the team published on their success with Nightshade.
It shows the results from a version of Stable Diffusion XL (SD-XL) that was trained on data sets that included poisoned images.
As you can see, when SD-XL was fed only 50 poisoned samples, its understanding of what a dog looks like became warped. The image it generated is less “Who’s a good boy?” and more “What the **** is that?”
Though it has a sort of Lovecraftian charm, the image becomes unusable for business or personal purposes.
Up the dosage to 300 poisoned samples, and the AI is now fully convinced that a dog looks like a cat. At this level of confusion, users are likely to lose trust in the tool altogether.
So, why did adding more poisoned data result in a cat? To understand that, we need to take a very quick detour and understand how AI image generators work.
How do AI image generators work?
Most of the big-name AI image generators that are making headlines are based on what’s called a “diffusion model.”
To oversimplify, diffusion models work by breaking images down into visual noise – like static on your TV. Then, the program reverses the process and reassembles the original image like a puzzle.
Reassemble enough images, and the model starts to recognize patterns in the process. These patterns allow it to predict what an image should look like. So, after repeating this process with hundreds of images of dogs, it learns to anticipate patterns that represent “dog.”
At the same time, the model is also associating that dog-like pattern with the word “dog,” and looking for similarities in patterns between words.
That’s how the AI model learns that “dog,” “hound,” and “husky” are related concepts and have similar visual patterns.
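If you’re curious what that looks like in practice, here’s a deliberately tiny sketch in Python. It is not how Stable Diffusion is actually built (the “images” here are just five-number arrays, and the “model” is a simple linear fit rather than a neural network), but it walks through the same two moves: drown the training data in noise, then learn to reverse the process well enough to rebuild the “dog” pattern from static.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for training data: tiny 1-D "images" that all follow a "dog" pattern.
# (Real models train on millions of real images; this is purely illustrative.)
dog_pattern = np.array([1.0, 0.8, 0.2, 0.9, 0.1])
images = dog_pattern + 0.05 * rng.normal(size=(500, 5))

noise_level = 0.5  # how much of each image gets replaced by noise in the forward step

def add_noise(x):
    """Forward step: blend the image with random static (the 'breaking down' phase)."""
    noise = rng.normal(size=x.shape)
    return np.sqrt(1 - noise_level) * x + np.sqrt(noise_level) * noise, noise

# "Training": fit a simple linear denoiser that predicts the noise we added,
# which is the same basic objective real diffusion models are trained on.
noisy_images, true_noise = add_noise(images)
X = np.hstack([noisy_images, np.ones((len(noisy_images), 1))])  # add a bias term
W, *_ = np.linalg.lstsq(X, true_noise, rcond=None)

def denoise(noisy):
    """Reverse step: subtract the predicted noise to rebuild the image."""
    predicted_noise = np.append(noisy, 1.0) @ W
    return (noisy - np.sqrt(noise_level) * predicted_noise) / np.sqrt(1 - noise_level)

# Give the model a brand-new, heavily noised "dog" and let it reassemble the pattern.
fresh_dog = dog_pattern + 0.05 * rng.normal(size=5)
noisy_dog, _ = add_noise(fresh_dog)
print("original:", np.round(fresh_dog, 2))
print("noisy   :", np.round(noisy_dog, 2))
print("rebuilt :", np.round(denoise(noisy_dog), 2))
```

Run it and the “rebuilt” line comes out close to the original pattern: the model has effectively memorized what “dog” looks like, and it ties that memory to the word “dog.” That association is exactly what Nightshade sets out to corrupt.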
This is where Nightshade finds a weakness.
How does the Nightshade algorithm work?
The Nightshade algorithm works by introducing what researchers call “perturbations” into an image. These are subtle changes to pixels scattered throughout the treated image.
The changes aren’t large enough to be noticeable to a human, but when a diffusion model begins to break down a poisoned image, it finds pixels where they shouldn’t be. It starts learning the wrong pattern.
In our dog example, the changed pixels are designed to introduce a cat-like pattern. When the model ingests a few poisoned images, it starts to confuse the dog and cat patterns. That’s why it generated an adorable, six-legged monstrosity.
After it’s digested a few hundred Nightshade-treated images, the AI model fully associates the new, cat-like pattern with the word “dog.”
And because the model goes looking for similarities between words, the poisoned pattern can bleed through into related concepts: the cat-like pattern spreads to “hound,” “husky,” and “wolf.”
What’s more, this bleed-through effect is cumulative. As more concepts are poisoned, the corrupted patterns begin to overlap and impact concepts that weren’t even targeted. So, the more words that are poisoned, the more an AI tool’s overall performance suffers.
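To give a feel for the mechanics, here’s a rough Python sketch of a bounded, targeted perturbation, the general family of techniques Nightshade belongs to. Everything in it is a stand-in: the “images” are random arrays, the “feature extractor” is a random linear map rather than a real diffusion model, and the numbers are arbitrary. The point is the shape of the attack: nudge the image’s features toward a different concept (“cat”) while capping how much any single pixel is allowed to change.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: tiny flattened "images" and a random linear "feature extractor."
# (Nightshade targets the features of real diffusion models; this only sketches the idea.)
dog_image = rng.uniform(0, 1, size=64)
cat_image = rng.uniform(0, 1, size=64)
feature_map = rng.normal(size=(64, 16))  # stand-in for the model's learned features

def features(image):
    return image @ feature_map

epsilon = 0.05      # cap on how much any single pixel may change
step_size = 0.001
perturbation = np.zeros_like(dog_image)
target = features(cat_image)

for _ in range(500):
    # Gradient of ||features(dog + delta) - features(cat)||^2 with respect to delta.
    error = features(dog_image + perturbation) - target
    gradient = 2 * feature_map @ error
    perturbation -= step_size * gradient
    # Keep every pixel change small so the poisoned image still looks like a normal dog photo.
    perturbation = np.clip(perturbation, -epsilon, epsilon)

poisoned = np.clip(dog_image + perturbation, 0, 1)
print("max pixel change:", np.abs(poisoned - dog_image).max().round(3))
print("feature distance to 'cat' before:", np.linalg.norm(features(dog_image) - target).round(2))
print("feature distance to 'cat' after :", np.linalg.norm(features(poisoned) - target).round(2))
```

The printout shows the poisoned image’s features ending up measurably closer to the “cat” target even though no pixel moved by more than a few percent. Nightshade’s real optimization is far more sophisticated, but this is the trade it exploits: changes too small for a human to notice can still be large enough to mislead the model.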
Here’s an image showing the impact that poisoning multiple concepts has on words that aren’t even targeted in the attack.
When researchers poisoned only 100 concepts, SD-XL began to create poor-quality images only somewhat related to the prompt. After 500 poisoned concepts, it could only offer up ’90s mall-carpet patterns.
Machine Unlearning
While developers have spent billions of dollars on teaching machines how to learn, relatively little effort has gone into teaching them how to unlearn.
This means that, for practical purposes, Nightshade’s effect is permanent.
Once a model has been poisoned, developers have little they can do other than reset their AI model back to an older version.
Even for major software companies, that would mean losing vast amounts of time, money, and progress. Not to mention having to discard the entire training set, since Nightshade-treated images are currently undetectable.
So if Nightshade is so effective and freely available, what’s to stop a hacker from using it to take down these frontier AI models?
For one thing, a bad actor would have to guess where a given company was going to scrape its training images from.
For another, Nightshade only works during the training stage. Once a model is fully trained, there’s no way to introduce the poison.
Finally, they would have to somehow submit a large number of poisoned images. In our examples above, it took relatively few images to poison a single concept but thousands of images to have a large-scale impact on performance.
It’s unlikely that AI developers would scrape that many images from a single source, let alone one that wasn’t reasonably trustworthy.
A more likely scenario would be for a software company to sample images from hundreds of well-known websites. Think Facebook, Instagram, or Getty Images.
But if enough of the artists on those websites used Nightshade regularly, their collective action would make scraping a risky choice.
And even if developers discover a way to detect poisoned images, it would be more cost-effective to simply avoid those images than to find a way to un-poison them. At that point, Nightshade becomes a sort of “do not scrape” flag.
That’s just fine for Professor Zhao and his research team. Their goal isn’t to stop image-generating AI. Instead, they’re looking to make AI developers think twice before scraping images without permission.
But while criminals are unlikely to use Nightshade, that doesn’t mean that using Nightshade isn’t a crime.
It’s one thing to make your images unusable, but is it legal to damage an AI model in the process?
Is data poisoning legal?
Since this technology is so new, there’s no direct legal precedent, so we’ll need to look for the closest comparison.
If an AI developer wanted to challenge someone for using Nightshade, they would probably try to compare it to knowingly sharing a computer virus.
In the scholarly article Law and Adversarial Machine Learning, the authors suggest that data poisoning could run afoul of the U.S. Computer Fraud and Abuse Act (CFAA), codified at 18 U.S. Code § 1030.
Under this Act, it’s a crime if someone “knowingly causes the transmission of a program, information, code, or command, and as a result of such conduct, intentionally causes damage without authorization, to a protected computer.”
Well, poisoning your images does seem like transmitting information that could result in damage without authorization.
And while a “protected computer” was originally defined as one that’s “exclusively for the use of a financial institution or the United States Government,” courts have since expanded that to include nearly any device with internet access.
So the last word we need to account for is “knowingly.”
Could knowing that a developer might scrape your images be considered “knowingly causing transmission”? It would be hard to argue that you didn’t know scraping was possible, or even likely, since deterring scraping is the express use case for Nightshade.
Could knowing that damage is a possibility be considered “intentionally causing damage”? After all, artists aren’t opting into their art being scraped.
And the legal questions flow both ways.
If using Nightshade were found illegal, would that mean that artists have an obligation to make their images safe and readable to an AI model?
These sorts of questions are unlikely to be answered until they’ve worked their way through the courts. Until then, if you want to poison your enemies, you do so at your own risk.