In partnership with

Meta just dropped its most powerful AI model ever.

Muse Spark launched today, the first model out of Meta Superintelligence Labs, the division Meta quietly built from scratch over the past nine months after aggressively poaching researchers from across the industry. This is not a Llama update. Meta rebuilt their entire AI training stack from the ground up and Muse Spark is what came out the other side.

The model is natively multimodal, meaning it understands and reasons across text and images together rather than treating them as separate tasks. It comes in two modes: Instant for fast everyday responses, and Thinking for deeper reasoning on harder problems. There is also a new Contemplating mode that runs multiple agents reasoning in parallel simultaneously, Meta's answer to the extended thinking modes that Gemini Deep Think and GPT Pro have been getting attention for.

The benchmark numbers are worth paying attention to. Muse Spark beats Opus 4.6 on most multimodal tasks, beats GPT 5.4 on health benchmarks by a significant margin, scores 58.4% on Humanity's Last Exam with tools which puts it ahead of Gemini 3.1 Deep Think on that specific test, and leads the field on scientific research benchmarks. The one area where it trails is pure reasoning tasks like ARC-AGI 2, where Gemini and GPT still hold an edge. Meta is transparent about that gap and says they are actively investing in it.

The health angle is a genuine differentiator. Meta collaborated with over 1,000 physicians to curate training data specifically for health-related queries, and the model can generate interactive visual breakdowns of nutritional content, exercise form, muscle activation, and more. It is a direct play for the kind of personal health assistant use case that no other frontier lab has prioritized at this level.

Meta confirmed that larger models are already in development, meaning this is the bottom of a new scaling ladder, not the top.

The Zucc is back, and this time he brought receipts.

Anthropic just made it dramatically easier to build and deploy AI agents.

Claude Managed Agents launched today in public beta. The idea is straightforward: until now, building a production-ready AI agent meant months of infrastructure work before you shipped anything users could actually see. Secure code execution, memory management, authentication, error handling, all of it built from scratch. Most teams were spending more time on plumbing than on the actual product.

Managed Agents removes that bottleneck. You define what you want the agent to do, what tools it has access to, and what guardrails it operates within. Anthropic runs it on their infrastructure. Agents can work autonomously for hours, pick up where they left off after a disconnection, and spin up multiple parallel agents to tackle complex tasks.

Early customers are already shipping things that would have taken months before. Notion lets teams delegate open-ended work without leaving their workspace. Sentry built a flow where developers go from a flagged bug straight to a Claude-written fix and open pull request. Rakuten deployed specialist agents across sales, finance, and HR in under a week each.

The timing is not subtle. Four days after blocking OpenClaw from Claude subscriptions, Anthropic launched their own managed agents platform. Make of that what you will.

The Tech newsletter for Engineers who want to stay ahead

Tech moves fast, but you're still playing catch-up?

That's exactly why 200K+ engineers working at Google, Meta, and Apple read The Code twice a week.

Here's what you get:

  • Curated tech news that shapes your career - Filtered from thousands of sources so you know what's coming 6 months early.

  • Practical resources you can use immediately - Real tutorials and tools that solve actual engineering problems.

  • Research papers and insights decoded - We break down complex tech so you understand what matters.

All delivered twice a week in just 2 short emails.

OpenAI's Spud might be dropping very soon.

Rumors are circulating that OpenAI is accelerating the launch of Spud, their next major model, in direct response to the pressure created by Anthropic's Mythos release. The speculation points to a launch as early as tomorrow, available across all paid tiers, and reportedly performing at a comparable level to Mythos across the board.

The context makes the urgency easy to understand. On SWE-bench Pro, the leading coding benchmark, Mythos Preview sits at 77.8%. GPT-5.4 (OpenAI's current best) is at 57.7%. Every other model in the field is in the low 40s. That is not a gap, it is a chasm, and it explains why Spud may be arriving sooner than anyone expected.

Nothing is confirmed yet. But if Mythos rattled OpenAI enough to move up their timeline, tomorrow could be a very interesting day.

Make sure to check out our latest video!

Keep Reading