
Anthropic just dropped Claude Opus 4.7.

Opus 4.7 handles complex, long-running tasks with more rigor, pays precise attention to instructions, and verifies its own outputs before reporting back. The coding gains are real: on the standard coding benchmark, Opus 4.7 scores 64.3%, up from 53.4% for Opus 4.6 and ahead of GPT-5.4 at 57.7%. It also sees images at over three times the resolution of previous Claude models, which matters for reading dense screenshots and diagrams.

Anthropic also introduced finer controls over how hard the model thinks on difficult problems, and a new way for developers to cap costs on long automated tasks. Worth noting: a new tokenizer can map the same text to up to 35% more tokens, meaning actual costs per request may rise even though the per-token price has not changed. Some users are already pushing back on this.
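The cost effect is simple arithmetic: per-request cost is token count times per-token price, so if the same text now tokenizes to 35% more tokens at an unchanged price, the request costs 35% more. A minimal sketch, using an assumed per-token price and a hypothetical request size (neither is Anthropic's actual figure):

```python
# Illustrative only: the per-token price and token counts below are
# assumed for the sake of the arithmetic, not Anthropic's real numbers.
PRICE_PER_TOKEN = 0.00002  # assumed flat per-token price (unchanged between versions)

def request_cost(token_count: int, price: float = PRICE_PER_TOKEN) -> float:
    """Per-request cost is just tokens multiplied by the per-token price."""
    return token_count * price

old_tokens = 1000                     # hypothetical count under the old tokenizer
new_tokens = int(old_tokens * 1.35)   # same text, up to 35% more tokens

old_cost = request_cost(old_tokens)
new_cost = request_cost(new_tokens)

# Same per-token price, but the bill for the identical request rises 35%.
print(f"old: ${old_cost:.4f}, new: ${new_cost:.4f}, increase: {new_cost / old_cost - 1:.0%}")
# → old: $0.0200, new: $0.0270, increase: 35%
```

The point is that "per-token price unchanged" says nothing about per-request cost once the token count itself moves.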

The reaction has been mixed. Coding improvements are getting strong reviews, but a widely shared post joked that Anthropic quietly degraded Opus 4.6 just to relaunch it as 4.7. The frustration has some basis: benchmarks show 4.7 trailing 4.6 in agentic search, and in one particularly viral moment, users discovered that Opus 4.6 passes the Ishihara colorblindness test while Opus 4.7 does not.

Anthropic pushed back on the broader conspiracy theory, saying the model genuinely does more thinking by default, and increased rate limits across the board to compensate.

The elephant in the room is Mythos. Anthropic publicly acknowledged that Opus 4.7 does not match it. But since Mythos is not publicly available, Opus 4.7 remains the best model anyone can actually get their hands on today. Whether Mythos ever becomes widely available remains an open question.

One hour after Anthropic’s drop, OpenAI updated Codex.

The timing was not subtle. While Anthropic was shipping a more powerful model, OpenAI was shipping a more capable product. The distinction matters.

Codex, OpenAI's coding agent, got a significant update today. It can now use your Mac in the background, clicking and typing with its own cursor while you keep working, without taking over your screen. It can generate images, remember context across sessions, schedule future tasks and wake itself up to continue long-running work, and now connects to over 90 plugins including Atlassian, CircleCI, and the full Microsoft Suite.

The number worth paying attention to is this: OpenAI says Codex has 3 million weekly users, and nearly half of that usage is already non-coding work. That is the real signal. OpenAI is not positioning Codex as a developer tool anymore. They are positioning it as a general work agent, something that sits alongside you all day across your entire workflow, not just when you are writing code.

That is a meaningfully different bet than Anthropic's. Opus 4.7 pushes the frontier on what a model can do. Codex pushes the frontier on what an agent does all day. Claude Code stays focused on software. Codex wants your whole workflow.

The week's scoreboard, as one person put it: frontier capability to Anthropic, product surface area and distribution to OpenAI. Spud, OpenAI's next major model, is reportedly dropping soon. When it does, OpenAI will be trying to claim both.

Smart starts here.

You don't have to read everything — just the right thing. 1440's daily newsletter distills the day's biggest stories from 100+ sources into one quick, 5-minute read. It's the fastest way to stay sharp, sound informed, and actually understand what's happening in the world. Join 4.5 million readers who start their day the smart way.

OpenAI also launched an AI built specifically for scientists.

GPT-Rosalind is a new model from OpenAI designed for biology, chemistry, and protein research. It ranked in the top 5% of human experts on RNA prediction tasks and is already being used by Amgen, Moderna, and the Allen Institute through a free research preview.

The framing OpenAI is using is simple: a new drug takes 10 to 15 years to go from discovery to approval. AI that helps researchers explore more possibilities and surface connections faster could meaningfully compress that timeline. It is an ambitious claim, and one that is hard to verify. But if it even partially delivers, the implications are significant.

