Google has rolled out a new experimental reasoning AI model: Gemini 2.0 Flash Thinking Experimental – the name really rolls off the tongue, doesn’t it? – which is now available on its AI prototyping platform, AI Studio. Described by Google as “best for multimodal understanding, reasoning, and coding,” the model is designed to tackle complex challenges in programming, math, and physics. It builds on the recently announced Gemini 2.0 Flash model and adds reasoning capabilities aimed at improving accuracy and decision-making through self-fact-checking.
Reasoning models like Gemini 2.0 Flash Thinking Experimental approach problem-solving differently from traditional large language models (LLMs). Given a prompt, these models pause to consider related questions and explain their reasoning before summarizing what they believe to be the most accurate response. Jeff Dean, Chief Scientist at Google DeepMind, noted that the model was “trained to use thoughts to strengthen its reasoning,” and said the company sees promising results when increasing inference-time computation—essentially giving the model more time and resources to process queries.
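To make “more inference-time computation” concrete, here’s a minimal sketch of one published way to trade compute for accuracy: self-consistency sampling, where you ask the model the same question several times and keep the majority answer. To be clear, this is an illustration, not how Gemini 2.0 Flash Thinking actually works internally (Google hasn’t said), and `ask_model` is a hypothetical stub standing in for a real LLM call.

```python
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    # Hypothetical stub for a single LLM call. A plain LLM samples
    # tokens, so repeated calls can disagree; we simulate that noise
    # with a weighted random choice.
    return random.choice(["two", "three", "three"])

def answer_with_more_compute(prompt: str, samples: int = 16) -> str:
    # Self-consistency: spend extra inference-time compute by drawing
    # several independent answers, then return the majority vote.
    votes = Counter(ask_model(prompt) for _ in range(samples))
    return votes.most_common(1)[0][0]

print(answer_with_more_compute("How many R's are in 'strawberry'?"))
```

The point of the sketch is simply that more samples cost more compute but make the final answer more reliable – the same basic bet, writ small, that reasoning models make with their longer “thinking” phase.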
The trade-off? Time. Unlike conventional LLMs, reasoning models often take significantly longer to respond—sometimes seconds or even minutes.
How good is Gemini 2.0 Flash Thinking Experimental compared to OpenAI o1? Setting aside the benchmarks and technical tests, there’s a test anyone can perform to see whether a model is just a fancified LLM or a true reasoning engine: ask it how many R’s are in the word “strawberry.” An LLM, such as OpenAI’s GPT-4o or Gemini 2.0 Flash, will typically answer “two” – incorrectly – because these models are basically word calculators and don’t have any idea what their inputs or outputs mean. A reasoning engine such as OpenAI’s o1 will correctly answer “three,” ostensibly because it “reasons” through the problem to arrive at the correct answer. Gemini 2.0 Flash Thinking Experimental answered “two.” Oops!
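If you’d like to run the same check yourself, here’s a rough sketch using Google’s `google-generativeai` Python SDK, with Python’s own `str.count` as the ground truth. One caveat: the model ID string below is an assumption on my part, so confirm the exact experimental identifier in AI Studio’s model list before running it.

```python
# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # free key from AI Studio

# Model ID is assumed; check AI Studio for the exact experimental name.
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")

prompt = "How many R's are in the word 'strawberry'?"
response = model.generate_content(prompt)

print("Model says:  ", response.text)
print("Ground truth:", "strawberry".count("r"))  # prints 3
```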
Not to worry, though: according to The Information, Google has more than 200 researchers working on the model.
What’s driving this sudden surge in reasoning models? For one, traditional “brute force” scaling of generative AI – throwing more data and compute at models – is yielding diminishing returns. Reasoning models are a trendy approach to improving AI accuracy, particularly in specialized applications like multimodal problem-solving. I say “trendy” because questions remain about their scalability, cost-effectiveness, and long-term viability. These models demand immense computing power at inference time, raising concerns about whether their benchmark performance can be sustained at a reasonable cost.
That said, reasoning engines are a step forward in AI evolution, with the potential to redefine how we think about machine intelligence. But as the strawberry example shows, even the smartest systems sometimes still need to spell things out—literally.
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.