Yesterday, OpenAI launched ChatGPT agent, which combines everything OpenAI has learned about autonomous systems. It can click through websites like Operator; synthesize research like Deep Research; and access your Gmail, GitHub, and calendar to complete real tasks. It’s available today for Pro, Plus, and Team subscribers through a simple dropdown menu selection.
According to OpenAI, ChatGPT agent scores 41.6% on Humanity’s Last Exam, which is double what o3 and o4-mini achieved. On FrontierMath, one of the hardest math benchmarks around, it hits 27.4% with tool access, crushing o4-mini’s 6.3%. These are impressive numbers, but remember: they are self-reported, and OpenAI is grading its own homework.
Silicon Valley has gone “agent crazy.” Google has them. Anthropic has them. Microsoft’s Copilot is evolving into one. OpenAI is hoping to win the agent wars by integrating directly with your current workflows. The company says ChatGPT agent can generate presentations, run code through its terminal access, and use APIs to connect with your apps. If it succeeds, it will become a synthetic intern.
Until now, the most useful agentic workflows have been the ones that orchestrate agents doing exceptionally simple tasks: read an email, write a response (based on some predetermined criteria), and put it in my drafts folder. OpenAI says its new agent is capable of much more.
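To make "exceptionally simple" concrete, here is a minimal sketch of that email workflow. It is an illustration only, not how any real product works: the `Mailbox` class, the `REPLY_RULES` table, and the function names are all hypothetical, and an in-memory list stands in for a real mail service.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Email:
    sender: str
    subject: str
    body: str


@dataclass
class Mailbox:
    # Hypothetical in-memory stand-in for a real mail service.
    inbox: List[Email] = field(default_factory=list)
    drafts: List[Email] = field(default_factory=list)


# Assumed "predetermined criteria": a keyword in the subject maps to a canned reply.
REPLY_RULES = {
    "invoice": "Thanks, I have forwarded this to accounting.",
    "meeting": "Thanks, I will check my calendar and confirm.",
}


def write_response(email: Email) -> Optional[Email]:
    """Return a draft reply if a rule matches the subject, else None."""
    subject = email.subject.lower()
    for keyword, reply in REPLY_RULES.items():
        if keyword in subject:
            return Email(sender="me", subject=f"Re: {email.subject}", body=reply)
    return None


def run_workflow(mailbox: Mailbox) -> int:
    """Read each email, write a response, and put it in the drafts folder."""
    drafted = 0
    for email in mailbox.inbox:
        reply = write_response(email)
        if reply is not None:
            mailbox.drafts.append(reply)
            drafted += 1
    return drafted
```

Nothing here sends mail or makes a decision beyond keyword matching, which is the point: every step is fixed in advance, and a human still reviews the drafts folder.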
If you’re a ChatGPT user and you get access to ChatGPT agent, please share your experience with me. It’s one thing to summarize an email, but OpenAI is promising “next level” productivity. I’ll let you be the judge.
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.