Let’s talk about the new ChatGPT powered by GPT-4o. There are some things it does extremely well. If you drag a dense spreadsheet of employee data into the chat window and ask the model what you can learn about the file, you’re going to get some good results. Give the model some variables in a well-crafted pre-prompt, and you’ll get basically the same results you’d expect from GPT-4, but a bit faster. That’s where the party ends.
Drag and drop a well-structured 70-page document with numbered paragraphs and subparagraphs (like a contract) into the chat window, ask it to analyze the document, and, more likely than not, you’re going to get nonsense. It will return completely made-up quotes with reference numbers that don’t match the original document. Worse, the model will cite material that doesn’t appear anywhere in the original document. This is so problematic that you’re better off reading the original document yourself; even when you are able to coax ChatGPT into giving you something that looks right, you’re still going to have to check every single word.
Oddly, although the same issue occurs with ChatGPT powered by the original GPT-4 (which is still available via menu selection), the older model handles this kind of task much better.
Here’s the thing. These kinds of hallucinations are not unique to OpenAI’s models. Google is also having a world of issues with its AI Overviews search solution; its misinformation ratio is way too high for it to be considered a reliable source.
Today’s message is simple: check your work! We may be in the “Oops, the AI models are a bit ahead of their skis” phase of the generative AI revolution.
Have you had a similar experience in the past week or so?
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.