OpenAI Admits Agentic AI May Never Be Secure

OpenAI admitted yesterday that prompt injection attacks, which occur when an AI encounters malicious instructions hidden in content it processes and treats them as commands, may never be fully solved. In other words, the same access that makes agents valuable is exactly what makes them dangerous.

I spend a lot of time helping clients understand agentic risk. As Rami McCarthy, principal security researcher at Wiz, puts it, “A useful way to reason about risk in AI systems is autonomy multiplied by access.” The more your agent can do, and the more data it can reach, the higher your exposure.

Agent mode in ChatGPT Atlas allows the browser agent to view webpages and take actions, including clicks and keystrokes, inside your browser, just as you would. That’s the value proposition. Security researchers responded by publishing demos showing it was possible to write a few words in a Google Doc that changed the browser’s behavior. That’s the vulnerability.

OpenAI’s internal testing found worse. They demonstrated an attack where a malicious email directs the agent to send a resignation letter to the user’s CEO. When the user asks the agent to draft an out-of-office reply, the agent encounters that email, treats the injected prompt as authoritative, and follows it. The out-of-office never gets written. The agent resigns on your behalf instead.

McCarthy’s assessment is blunt: “For most everyday use cases, agentic browsers don’t yet deliver enough value to justify their current risk profile.” OpenAI’s own guidance reinforces this: “Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place.”

Agents are synthetic employees and they should be treated as such. Start with minimum necessary permissions and expand only with clear business justification. Audit what your agents can access today. Require confirmation steps for anything involving money, messages, or sensitive data. When someone proposes giving an agent broad authority over workflows, ask: what happens when it reads an email someone else wrote?
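For teams building with agents, the "confirmation steps" recommendation can be implemented as a fail-closed gate on tool calls. Here is a minimal illustrative sketch in Python; the tool names and risk categories are hypothetical, not any vendor's API:

```python
# Illustrative pattern: require human confirmation before an agent executes
# any action involving money, messages, or sensitive data.
# Tool names and categories below are hypothetical examples.

SENSITIVE_CATEGORIES = {"payments", "messaging", "sensitive_data"}

TOOL_CATEGORIES = {
    "send_email": "messaging",
    "transfer_funds": "payments",
    "read_calendar": "low_risk",
}

def requires_confirmation(tool_name: str) -> bool:
    """Fail closed: unknown tools require confirmation too."""
    category = TOOL_CATEGORIES.get(tool_name)
    return category is None or category in SENSITIVE_CATEGORIES

def execute_tool(tool_name: str, args: dict, confirm) -> str:
    """Run a tool call, pausing for human sign-off on sensitive actions.

    `confirm` is a callback that shows the user the pending action and
    returns True only if they explicitly approve it.
    """
    if requires_confirmation(tool_name) and not confirm(tool_name, args):
        return "blocked: user declined"
    return f"executed {tool_name}"
```

The key design choice is defaulting to confirmation for anything unrecognized, so a prompt-injected instruction invoking an unexpected capability stalls at a human checkpoint instead of executing silently.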

Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.

About Shelly Palmer

Shelly Palmer is the Professor of Advanced Media in Residence at Syracuse University’s S.I. Newhouse School of Public Communications and CEO of The Palmer Group, a consulting practice that helps Fortune 500 companies with technology, media and marketing. Named LinkedIn’s “Top Voice in Technology,” he covers tech and business for Good Day New York, is a regular commentator on CNN and writes a popular daily business blog. He’s a bestselling author, and the creator of the popular, free online course, Generative AI for Execs. Follow @shellypalmer or visit shellypalmer.com.

