OpenAI Admits Agentic AI May Never Be Secure

Shelly Palmer

5 months ago

OpenAI admitted yesterday that prompt injection attacks, which occur when an AI encounters malicious instructions hidden in content it processes and treats them as commands, may never be fully solved. In other words, the same access that makes agents valuable is exactly what makes them dangerous.

I spend a lot of time helping clients understand agentic risk. As Rami McCarthy, principal security researcher at Wiz, puts it, “A useful way to reason about risk in AI systems is autonomy multiplied by access.” The more your agent can do, and the more data it can reach, the higher your exposure.

Agent mode in ChatGPT Atlas allows the browser agent to view webpages and take actions, clicks, and keystrokes inside your browser, just as you would. That’s the value proposition. Security researchers responded by publishing demos showing it was possible to write a few words in Google Docs that changed the browser’s behavior. That’s the vulnerability.

OpenAI’s internal testing found worse. They demonstrated an attack where a malicious email directs the agent to send a resignation letter to the user’s CEO. When the user asks the agent to draft an out-of-office reply, the agent encounters that email, treats the injected prompt as authoritative, and follows it. The out-of-office never gets written. The agent resigns on your behalf instead.

McCarthy’s assessment is blunt: “For most everyday use cases, agentic browsers don’t yet deliver enough value to justify their current risk profile.” OpenAI’s own guidance reinforces this: “Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place.”

Agents are synthetic employees and they should be treated as such. Start with minimum necessary permissions and expand only with clear business justification. Audit what your agents can access today. Require confirmation steps for anything involving money, messages, or sensitive data. When someone proposes giving an agent broad authority over workflows, ask: what happens when it reads an email someone else wrote?

Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.