Generative AI requires vast amounts of training data, and many of the most popular large language models have been trained by scraping publicly available websites. This practice didn’t seem to bother many people when Google (and other search engines) used it to index pages and surface them in search results, but now there are numerous lawsuits focused on whether copyrighted material can be used for AI training without the knowledge or consent of the rights holders, and without remuneration to them.
Writers, musicians, and artists of all kinds are (understandably) the most vocal about this issue. To help them, researchers at the University of Chicago introduced “Nightshade,” an open-source tool that allows content creators to subtly alter their work. These modifications, invisible to humans, mislead AI models trained on the altered work. For instance, exposure to images “poisoned” by Nightshade led AI to misidentify dogs as cats.
Originally an extension of the university’s Glaze tool, Nightshade stands out by intentionally misleading AI about which objects and scenes an image contains. Its use presents a significant challenge to AI developers: detecting and removing these poisoned pixels is difficult, and a model that has been trained on such data has to be retrained.
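To make the idea concrete, here is a deliberately tiny sketch of how poisoned training samples can drag a model’s idea of “dog” toward “cat.” This is not Nightshade’s algorithm. It assumes a made-up model that reduces every image to a single number and learns each label as the average of its samples, and every name and value in it is hypothetical.

```python
from statistics import mean

# Hypothetical 1-D "feature" a toy model extracts from an image:
# values near 0.0 look like a dog to the model, values near 1.0 look like a cat.
DOG_PROTOTYPE = 0.0
CAT_PROTOTYPE = 1.0


def learn_concepts(samples):
    """Learn each label's concept as the mean feature of the samples carrying that label."""
    by_label = {}
    for label, feature in samples:
        by_label.setdefault(label, []).append(feature)
    return {label: mean(features) for label, features in by_label.items()}


def resembles(feature):
    """Report which prototype a learned feature value is closer to."""
    if abs(feature - DOG_PROTOTYPE) <= abs(feature - CAT_PROTOTYPE):
        return "dog"
    return "cat"


# Clean scraped data: labels and features agree.
clean = [("dog", 0.0), ("dog", 0.2), ("dog", -0.1), ("dog", 0.1),
         ("cat", 1.0), ("cat", 0.9), ("cat", 1.1)]

# Poisoned samples (assumption for illustration): a human sees a dog and the
# caption says "dog", but the altered pixels give the model cat-like features.
poison = [("dog", 1.0)] * 12

clean_model = learn_concepts(clean)
poisoned_model = learn_concepts(clean + poison)

print("clean model's idea of a dog looks like a:", resembles(clean_model["dog"]))
print("poisoned model's idea of a dog looks like a:", resembles(poisoned_model["dog"]))
```

In this toy setup the clean model’s “dog” stays near the dog prototype, while the poisoned model’s “dog” lands closer to the cat prototype. Nothing inside the trained model flags which samples caused the shift, which is why the practical remedy is to find the bad data and train again.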
The creators of Nightshade say they aim to restore the balance of power, ensuring artists’ intellectual property is respected in the age of AI advancement.
That’s the intended use. The unintended consequences are so extreme that I’m sad I’m being made to think about them.
It is already extremely difficult and expensive to train a large language model. Retraining and fine-tuning models are also resource intensive. Now imagine that a bad actor wanted to start poisoning publicly available content by hacking into content management systems or digital asset management systems (which, generally speaking, are not the most secure servers on the web).
Take it a step further. Let your mind go as far as you’re comfortable going, and think about how you might use intentionally (but invisibly) mislabeled files on the web.
We are approaching the one-year anniversary of the introduction of ChatGPT, the most popular consumer-grade generative AI platform. Microsoft, Google, Salesforce, Adobe, and almost every other tech company are “all in” on AI. Welcome to the beginning of a new information security arms race.
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various AI models, including but not limited to: GPT-4, Bard, Claude, Midjourney, Stable Diffusion, and others.