Anthropic released Claude Opus 4.7 on Wednesday with impressive numbers: 10.9 percentage points higher on SWE-bench Pro (the gold-standard coding test), 3x more production tasks resolved on Rakuten’s benchmark, 98.5% on visual acuity, up from 54.5%, and state-of-the-art scores on finance evaluations. For developers, this is a genuine step forward. For consumers, the story is different.
When you select Claude Opus 4.7 in the consumer app, the model uses mandatory “adaptive thinking”: an effort router that decides how hard to reason about each prompt before responding. The consumer app apparently routes writing, research, and some analysis prompts to “low effort,” which (for me at least) produced results noticeably worse than Opus 4.6’s. Unfortunately, thinking levels are not a selectable setting in the consumer app. The good news: you can set them in Claude Code.
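The same control exists at the API level. Here is a minimal sketch in Python of what pinning a thinking budget looks like, assuming the Messages API’s documented extended-thinking request shape; the model id and budget value are illustrative, not official:

```python
# Sketch: pinning an explicit thinking budget per request instead of
# relying on the consumer app's adaptive-effort router.

def build_request(prompt: str, budget_tokens: int) -> dict:
    """Assemble a Messages API payload with a fixed thinking budget."""
    return {
        "model": "claude-opus-4-7",  # illustrative model id
        "max_tokens": 16000,
        # Extended-thinking block: the model may reason for up to
        # budget_tokens before answering, regardless of prompt type.
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_request("Review this essay draft.", budget_tokens=8000)
```

Sent to the Messages endpoint, a payload like this asks for full-effort reasoning on exactly the writing and research prompts the router would otherwise send to low effort.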
The data backs up that impression. On MRCR v2, the standard benchmark for finding information buried in long documents, Opus 4.7 scores 59.2% at 256k context versus 4.6’s 91.9%, and 32.2% versus 78.3% at 1M context. Anthropic’s own system card attributes the gap to 4.6’s extended thinking mode, which 4.7 no longer supports.
Opus 4.7’s new tokenizer maps identical text to up to 35% more tokens, so a million tokens now covers roughly 555,000 words instead of 750,000. The rate card still says $5/$25 per million tokens, the same nominal price as Opus 4.6, but because the same text now consumes more tokens, your actual cost per conversation went up by as much as 35%. Anthropic might have forgotten to mention this.
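The arithmetic is worth spelling out. A quick back-of-envelope in Python, using only the figures above (35% token inflation, unchanged $5-per-million input rate):

```python
# Same rate card, more tokens per word: effective cost rises ~35%.

OLD_WORDS_PER_MTOK = 750_000   # words covered by 1M tokens, old tokenizer
TOKEN_INFLATION = 1.35         # identical text -> up to 35% more tokens
INPUT_PRICE = 5.00             # $ per million input tokens, unchanged

new_words_per_mtok = OLD_WORDS_PER_MTOK / TOKEN_INFLATION
print(round(new_words_per_mtok))  # 555556 words per million tokens

# Cost to send the same 100,000-word context before and after:
words = 100_000
old_cost = words / OLD_WORDS_PER_MTOK * INPUT_PRICE   # ~ $0.67
new_cost = old_cost * TOKEN_INFLATION                 # ~ $0.90
```

The nominal price never moves; the token count does, which is why the increase is easy to miss on a rate card.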
The fix is simple: use Claude Code. It does everything the consumer app does, and a lot more. You sacrifice a little UX/UI convenience, but you get full control over 4.7’s thinking levels, effort, and model behavior.
Every company needs a Claude Code strategy. Do you have one?
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.