I’ve spent the better part of the past four days putting OpenAI’s o1 models through their paces. For those who aren’t following the foundational AI model wars, last week, OpenAI released a new series of AI models called o1, which the company says is the first in a “new series of reasoning models for solving hard problems.” I wrote about it the day after it debuted. As far as I can tell, this is the first iteration of the rumored project “Strawberry.” In a word, “Wow!” o1 is absolutely next level. What it does, it does better than anything else I’ve tried. But it’s not for everyone. In fact, there’s a very good chance it’s not for you. Here’s why.
OpenAI’s o1 models, including o1-preview and o1-mini, build upon GPT-4’s foundation, offering enhanced performance across various tasks. o1-preview demonstrates substantially better reasoning than GPT-4, particularly in complex problem-solving and analytical thinking.
Cost
o1 models handle longer context windows, enabling more comprehensive text understanding and generation. This lets them process larger documents, which makes them valuable for extensive data analysis and document summarization. However, these enhancements come at a cost. The new models require significantly more computational resources, with operational costs estimated at up to 10 times those of GPT-4. One company we work with is spending about $60,000 per month on GPT-4. Doing the same tasks with o1 would cost approximately $3,000 per hour, which is completely out of the question. On top of that, for all of their capabilities, o1 models run at slower inference speeds, so there is a serious trade-off between improved reasoning and processing time.
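To put the two figures above on the same footing, here is a rough back-of-the-envelope sketch in Python. The dollar amounts are the ones quoted above; the hours-per-month values are my own assumptions (round-the-clock usage, and a hypothetical business-hours schedule of 8 hours on 22 working days), not numbers from the company in question.

```python
# Figures quoted in the article.
GPT4_MONTHLY_USD = 60_000   # current GPT-4 spend per month
O1_HOURLY_USD = 3_000       # estimated o1 cost per hour for the same tasks

# Assumptions (not from the article): how many hours per month the workload runs.
HOURS_PER_MONTH = 365 * 24 / 12   # 730.0, i.e. running around the clock
BUSINESS_HOURS_PER_MONTH = 8 * 22  # 176, a hypothetical 8-hour day, 22 working days


def monthly_from_hourly(hourly_usd, hours_per_month=HOURS_PER_MONTH):
    """Convert an hourly cost into a monthly cost for a given number of hours."""
    return hourly_usd * hours_per_month


# Round-the-clock scenario.
o1_monthly_usd = monthly_from_hourly(O1_HOURLY_USD)
cost_multiple = o1_monthly_usd / GPT4_MONTHLY_USD

# Business-hours scenario.
o1_business_usd = monthly_from_hourly(O1_HOURLY_USD, BUSINESS_HOURS_PER_MONTH)
business_multiple = o1_business_usd / GPT4_MONTHLY_USD

print(f"o1 per month (24/7): ${o1_monthly_usd:,.0f} ({cost_multiple:.1f}x GPT-4 spend)")
print(f"o1 per month (business hours): ${o1_business_usd:,.0f} ({business_multiple:.1f}x GPT-4 spend)")
```

Under the round-the-clock assumption this works out to about $2.19 million per month, or 36.5 times the current spend. Under the business-hours assumption it is $528,000 per month, or 8.8 times. Either way, the multiple depends heavily on how many hours the workload actually runs, so treat both numbers as rough bounds rather than a quote.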
Safety
OpenAI says it has prioritized safety in the o1 models, implementing advanced content filtering and bias mitigation techniques. These features aim to reduce the generation of harmful or inappropriate content, making the models more reliable for business applications. The company says it continues to engage with ethicists and industry experts to address the potential societal impacts of these advanced models. I have no way of knowing whether this is true or not. I did try some lesser-known jailbreaking techniques on o1-preview (for research purposes only), and I was not successful in tricking the model into doing my bidding. That said, I am no authority on being a bad actor. I’m sure the professional bad guys will figure something out. (They always do.)
Capabilities
The o1 models offer potential applications across every industry and job function. Financial institutions can use them for advanced market analysis and risk assessment. Healthcare providers can use them for medical research and patient data analysis. Legal firms can use them for contract analysis and case research. Creative industries can take advantage of their enhanced content generation capabilities. But what o1 really does is code. o1-preview codes so well, I’m not sure what it will mean to be a software engineer in five years, maybe less. This thing writes Python, React, and JavaScript far better than I do. Everything needs to be debugged and it’s far from perfect on the first try, but the results are mind-blowing. It creates websites and casual video games, and it does a passable job with simple apps. It also reasons through word problems at a very high level. This is the worst it will ever be, and it’s already amazing.
Why Do You Say It’s Not For Me?
The o1 models are crazy expensive to run. So, if you’re doing anything other than a POC or an MVP, you have to know it’s not yet feasible to deploy them at scale. The models are also very slow compared to GPT-4. This is fine if your use cases are not time sensitive, but you’re not going to create a pretty UX, tap OpenAI’s API, and resell this in an app for normal people. I don’t know any consumer who will wait 25 to 45 seconds for a response from any digital device under most circumstances.
Right now, GPT-4o is good enough for most of the things that both models do and it’s way less expensive to run. Also, there are open source models from Meta, Mistral, and others that are basically free. So, you have to have a specific use case to justify the time and expense of o1.
A Whole New World
This is the first version of the o1 models. When they become multimodal, faster, and cheaper, they will represent a new class of agentic AI that can reason through a problem and solve it. So, do not wait to start experimenting with the o1 models. By the time you figure out how to use them, they will be ready for prime time.
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.