New Tool Could Poison DALL-E and Other AI to Help Artists

2023-10-28 01:49

Image-generating AI seems to be stuck between a rock and a hard place. To work well, it needs a massive treasure trove of well-annotated artwork to train it. That could get expensive, but it's free if you take it from the internet without asking. That latter bit has artists understandably upset, and the new Nightshade tool might give them the means to fight back.

Besides Shutterstock and Adobe, which have access to a large set of licensed artwork, most image-generating AI scrapes data from the internet to build learning models. It's a "better to ask forgiveness than permission" way of doing things, only without even asking forgiveness. But without that process, DALL-E, Stable Diffusion, Midjourney, and others wouldn't work as well as they currently do.

That leads to an ethical question: is it okay to use someone's art to train AI art generators without their permission? And if so, where is the line drawn? In some cases, these data sets are for pure research, with no profit motive. In others, the intent is purely commercial, which means a company will benefit from an artist's work without ever compensating the artist.

Unfortunately for an artist who doesn't want their work used for AI at all, opting out is difficult. Not all companies offer an opt-out (Meta, for example, does not, despite earlier reports otherwise), and those that do will often promise only to remove the art from future learning models, not from those that already exist.

That's where Nightshade comes in. As first reported by MIT Technology Review, Nightshade builds on earlier work by the University of Chicago to give artists a choice in how their artwork is used. The original tool, Glaze, hid art styles from AI. While an AI may recognize that the art piece is a dog, for instance, it may incorrectly recognize the dog as drawn in an anime style instead of an impressionist one.

Nightshade goes a step further. The whole process works by making minute changes to pixels in the artwork. These changes are invisible to the human eye but are picked up by AI. The altered pixels corrupt the AI's understanding of the artwork by giving it a new word association with the subject. When the AI looks at an altered image of a dog, for example, it misidentifies the dog as a cat.
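To get a feel for how small a "minute change" to pixels can be, here is a minimal Python sketch. It is not Nightshade's method: the article does not describe the actual algorithm, and the real tool chooses its changes deliberately rather than at random. The function names (perturb, max_abs_diff), the epsilon limit, and the 8x8 test image are all assumptions made up for this illustration. The sketch only shows that every pixel in an image can be nudged by a couple of brightness levels out of 255, a difference most viewers would not notice.

```python
import random


def perturb(pixels, epsilon=2, seed=0):
    """Return a copy of `pixels` with each 0-255 value shifted by at most `epsilon`.

    Hypothetical helper for illustration only: it adds random noise,
    whereas a real poisoning tool would choose the shifts on purpose.
    """
    rng = random.Random(seed)
    out = []
    for row in pixels:
        new_row = []
        for value in row:
            delta = rng.randint(-epsilon, epsilon)
            # Clamp so the result is still a valid 8-bit brightness value.
            new_row.append(min(255, max(0, value + delta)))
        out.append(new_row)
    return out


def max_abs_diff(a, b):
    """Largest per-pixel difference between two images of the same size."""
    return max(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


# A made-up 8x8 grayscale "image" standing in for real artwork.
image = [[(x * 16 + y * 8) % 256 for x in range(8)] for y in range(8)]
altered = perturb(image, epsilon=2)

print("largest per-pixel change:", max_abs_diff(image, altered))
```

However the changes are chosen, the point is the same: no pixel moves by more than the chosen limit, so the altered picture looks identical to a person even though the numbers an AI reads are different.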

Researchers tested the process by uploading altered images to Stable Diffusion and asking for similar generated art. Within 50 uploads, the results started to show. Instead of the perfect dog image, Stable Diffusion created distorted pictures with cartoonish faces or too many limbs. After 300 poisoned samples, Stable Diffusion produced perfect cats when prompted for dogs. Because generative AI makes connections between similar words, those corrupted images led to the same results with "husky," "puppy," and "wolf."

This attack will likely be challenging to defend against. The companies behind the generators would have to find the poisoned images and remove them one by one from the dataset. Having a human look at them won't work, since the pictures don't appear different to the naked eye. And software detection may be equally difficult, given the nature of the attack.

The researchers plan to open-source this tool so that others can expand and improve on the work. With multiple versions in circulation, the technique may be more effective. But in many ways, the damage is done: at best, this can hurt future learning models. It won't break existing models that are already trained.