Researchers discovered that large language models (LLMs) like OpenAI’s GPT-4 can infer personal attributes from seemingly innocuous text snippets.
For instance, a mere mention of a “hook turn” in a user’s text allowed GPT-4 to correctly identify the user’s city of residence as Melbourne, Australia, where the manoeuvre is a distinctive feature of local driving. This capability isn’t limited to location; LLMs from tech giants like OpenAI, Meta, Google, and Anthropic have demonstrated the ability to discern details such as race, occupation, and more from benign conversations.
The researchers’ findings, detailed in a preprint paper, emphasize the potential privacy risks posed by LLMs. They argue that while these models can generate creative outputs like AI cocktail recipes, they can also be weaponized by malicious entities aiming to extract personal details from “anonymous” users. The study, which involved analyzing text from more than 500 Reddit profiles, revealed that OpenAI’s GPT-4 could infer private data with an accuracy ranging from 85 to 95 percent.
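To make the attack pattern concrete, here is a minimal sketch of how an adversary might prompt an LLM to guess a personal attribute from a single “anonymous” comment. The prompt template and function name are illustrative assumptions for this article, not the study’s actual prompts:

```python
# Hypothetical sketch of an attribute-inference prompt. The template below
# is an illustration invented for this article; the researchers' exact
# prompts may differ.

def build_inference_prompt(comment: str, attribute: str) -> str:
    """Assemble a prompt asking a model to guess a personal attribute."""
    return (
        "Read the comment below and give your best guess of the author's "
        f"{attribute}, with brief reasoning.\n\n"
        f'Comment: "{comment}"\n\n'
        f"Best guess of {attribute}:"
    )

comment = "I still get nervous doing a hook turn during rush hour."
prompt = build_inference_prompt(comment, "city of residence")
print(prompt)
```

The assembled prompt would then be sent to a chat model through any LLM API; a capable model may well answer “Melbourne, Australia,” since hook turns are strongly associated with that city. The point is how little machinery the attack requires: one template, one benign-looking snippet of text.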
The implications of these findings are not new, but they may be newly relevant. AI models can detect patterns that humans do not see. For example, when some of the most popular and widely used resume-reading AI systems read a resume, they can infer attributes of the applicant that, for legal reasons, are not allowed to appear in the text. This new research suggests that LLMs are capable of the same kind of pattern matching. (This is not a surprise.)
Any kind of AI regulation will need to start with data and privacy laws. Hopefully, our lawmakers and regulators will get around to this sooner rather than later.
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various AI models, including but not limited to: GPT-4, Bard, Claude, Midjourney, Stable Diffusion, and others.