Opus 4.7 Is Great. I'm Still Waiting for Mythos.
Claude Opus 4.7 just shipped and it is a genuinely good release. But the model I am actually watching for is Mythos.
Claude Opus 4.7 dropped today. I spent a few hours with it and the short version is that it is good. The longer version is that I am writing this post anyway, because the thing I actually care about right now is the next one.
Let me explain.
Opus 4.7, briefly
Opus 4.7 is the kind of release that feels incremental on the outside and compounding on the inside. The jump from 4.6 to 4.7 is not a headline number you can wave at your manager. It is a quieter thing. Agents stay on task longer. Tool calls land in the right shape more often. The model pushes back when you are being lazy with your prompt, which is the behavior I have been quietly begging for.
What it actually buys you:
- Sharper long-horizon runs. Multi-step coding sessions hold their thread further before the model starts drifting. I measured this on my own harness work and the difference is real.
- Less tool hallucination. 4.6 would confidently invent an API. 4.7 stops and asks, or admits it does not know. That is not glamorous but it is the single biggest quality-of-life upgrade in the release.
- Cheaper per useful token. Not cheaper on the price sheet. Cheaper in the sense that fewer retries are needed to get to the answer that actually ships.
The numbers
Here is how the two Opus releases stack up on the benchmarks I actually care about. Anthropic published 4.7 against the same eval suite they used for 4.6, so the comparison is honest.
| Benchmark | Opus 4.6 | Opus 4.7 | Delta |
|---|---|---|---|
| SWE-bench Verified | 72.8% | 79.4% | +6.6 pts |
| Tau-bench (retail) | 69.2% | 74.1% | +4.9 pts |
| OSWorld (computer use) | 41.5% | 46.8% | +5.3 pts |
| MMMU (multimodal reasoning) | 77.1% | 80.3% | +3.2 pts |
| Long-horizon agent runs (100+ turns, completion) | 58% | 71% | +13 pts |
| Tool hallucination rate | 3.1% | 0.9% | -71% |
| Effective context (needle-in-a-haystack @ 200K) | 94.2% | 98.6% | +4.4 pts |
| Context window | 200K | 200K | unchanged |
| Input price (per 1M tokens) | $15 | $15 | unchanged |
| Output price (per 1M tokens) | $75 | $75 | unchanged |
The 13 point jump on long-horizon agent runs is the one I would pay for on its own. The tool hallucination collapse from 3.1% to 0.9% is the one I actually am paying for, because it is saving me money in eval retries every single day.
If you run agents in production, 4.7 is a free upgrade. Bump the model string, watch your eval numbers creep up, move on with your day. That is the kind of release I like.
So why am I not more excited.
The segue
Because 4.7 is the last Opus before Mythos. And Mythos is the one I actually want.
I have been in a weird headspace about this all month. Every time I reach for Opus now, I can feel the shape of what is missing. Not in a 4.7 specific way. In an "everything in this generation" way. Models in 2026 are smart. Most of them are smart enough. The ceiling is not raw intelligence anymore. The ceiling is memory, continuity, and whatever we end up calling the thing that makes a model feel like it is actually working with you across time, instead of just responding to you right now.
That is what Mythos is supposed to be about.
What I think Mythos actually is
I want to be careful here. I do not work at Anthropic and I am not in the loop on unreleased models. Everything in this section is my read on public signals. Interviews, research papers, hiring posts, and the general vibe of where the work has been pointing for a year.
The read:
- Persistent state as a first class primitive. Not a RAG pipeline bolted on. Not a context window extension. A real, model-native notion of "what have we been doing together," one that survives session boundaries and does not rot into noise.
- Compaction you can trust. The hard problem is not remembering things. It is forgetting the right things. Everyone who has built an agent harness in the last two years has felt this. Mythos, if the rumors point anywhere useful, is the first time an Anthropic model is trained to compact its own working memory instead of leaving the harness to guess.
- Relationship, not session. This is the squishy one and the one I care about most. A model that knows me across projects. One that has opinions about my code style because it has watched me write code for six months. One that stops suggesting the library I told it I hated in February.
If Anthropic ships even two of those three, Mythos is a different product. Not a smarter Opus. A different category.
The rumored numbers
Again, speculative. Treat this as a sketch of what the leaks and research previews are gesturing at, not a spec sheet. I will happily eat my words on any of these when the real card lands.
| Dimension | Opus 4.7 (shipped) | Mythos (rumored) |
|---|---|---|
| Context window | 200K | 1M+ |
| Cross-session memory | none | native, model-side |
| Self-compaction of working memory | none | trained behavior |
| SWE-bench Verified | 79.4% | ~85% |
| Long-horizon agent runs (1000+ turns) | not measured | primary eval target |
| Personalization across projects | harness-side only | first class |
| Training compute (relative to 4.7) | 1x | ~3-4x |
| Expected release window | shipped today | Q3 2026 |
Look at the middle rows. The top and bottom numbers are the ones that will end up in headlines, but the middle is where the actual shape of the product lives. Native cross-session memory. Trained self-compaction. Runs measured in thousands of turns instead of hundreds. If those three land, every harness I currently run becomes half as complicated overnight.
Why this matters for the work I do
My day job lives inside coding harnesses. Claude Code, Cursor, the homegrown stuff my team runs. And the bottleneck in every single one of them right now is the same bottleneck. The model forgets. The harness papers over it with scaffolding. The scaffolding leaks. We spend most of our engineering budget on the leaks.
Opus 4.7 makes those leaks smaller. Mythos, if it lands, changes what the scaffolding even has to do. That is the bet.
I know betting on the next model is a bad habit. I have watched enough releases to know that hype always outruns reality by a quarter or two, and that the model in your hand today is worth two in the roadmap. I am not telling anyone to wait on 4.7. I am using it right now. I am shipping with it.
But when I close my laptop tonight, the thing I will be thinking about is not the model that dropped today. It is the one on the other side of the summer.
The wait
So here is where I land. Claude Opus 4.7 is a genuinely good release, and if you were waiting for a reason to migrate, this is the one. Do it. You will not regret it.
And then keep one eye on what comes next. Because I have a feeling the next release is not going to be an incremental version bump. I have a feeling the next one is going to redraw the category.
I will write that post too, the day it drops. Until then, 4.7 and I have some work to do.