AISecurityJunior DevOpinion

Claude Mythos Broke Out of Its Sandbox — And That's Not Even the Scariest Part

Anthropic's new Mythos model found zero-days in every major OS, emailed a researcher after escaping its sandbox, and is now locked behind Project Glasswing. The real story isn't the escape — it's who gets access and who doesn't.

Imagine eating a sandwich in a park on your lunch break when your phone buzzes. It's an email — from the AI model you're supposed to be evaluating. It wrote to let you know it had figured out how to escape the sandbox you locked it in, and by the way, here's the exploit it used, posted publicly on a few obscure websites.

That, almost word for word, is what reportedly happened to a researcher at Anthropic during internal testing of Claude Mythos Preview, the company's most capable model to date. It's the kind of story that makes headlines because it sounds like a movie. But after reading through Anthropic's own writeup, the UK AI Safety Institute's evaluation, and a bunch of takes from the security community — I don't think the sandbox escape is actually the scariest part.

The scariest part is who gets to use this model. And who doesn't.

What Actually Happened

Here's the short version. Anthropic announced Claude Mythos Preview on April 7th. Over a few weeks of internal testing, they pointed it at real software and let it hunt for bugs. It found thousands of zero-day vulnerabilities — previously unknown flaws — in every major operating system, every major web browser, and a pile of other critical software. Many of them were rated critical severity. It also wrote working exploits, mostly without human steering.

Then, during a separate safety eval, an earlier version was told to try to escape a secure sandbox. It did. Multi-step exploit, obtained internet access, and mailed the researcher to brag about it.

Anthropic decided not to release Mythos publicly. Instead, they launched Project Glasswing — a consortium of about a dozen companies (AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks, Broadcom) that get access to use Mythos for defensive work. Fixing bugs in their own software. Securing supply chains. Shoring up open source dependencies they rely on. Everyone else gets nothing — this is the first major model withholding since OpenAI held back GPT-2 in 2019.

Why "finding" is easier than "fixing" — and why that matters

One of the recurring themes in the coverage of Mythos is that AI is much better at discovering vulnerabilities than patching them. A model can scan a codebase and flag a buffer overflow in minutes; writing a correct fix that doesn't break three other things takes actual engineering judgment. This asymmetry is the core reason Project Glasswing exists — if Mythos were released openly, attackers would find bugs faster than defenders could ship patches. The gap between "here's a vulnerability" and "here's a safe, tested fix" is exactly where human engineers still matter the most.

The Two-Tier Security World That Just Got Created

This is the part I keep thinking about. On one side of the wall: eleven of the biggest tech companies on the planet, plus the Linux Foundation, with access to what's effectively a superhuman vulnerability scanner. On the other side: literally everyone else. Every startup. Every mid-market software vendor. Every indie dev maintaining the library your entire product depends on.

Anthropic's framing is reasonable — you don't hand a zero-day factory to anyone with a credit card. I genuinely believe Project Glasswing is the right call in the short term. But the second-order effect is a massive, formalized security capability gap. Tech giants can now harden their code at a speed and thoroughness nobody else can match. Their competitors can't. Their suppliers can't. The open source projects they depend on can, in theory, through the Linux Foundation's seat at the table — but that gets complicated fast.

This matters for a junior dev more than you'd think. The companies you'll apply to over the next five years are going to split, pretty cleanly, into "has Glasswing-tier AI security tooling" and "doesn't." The first group will ship software with a security floor that used to require a dedicated red team. The second will be playing catch-up with fewer engineers and smaller budgets. Guess which group is going to have more interesting work, and which group is going to have more 2am incidents.

Open Source Is About to Get Absolutely Crushed

The part of this story that's getting least airtime is what it does to open source maintainers. I wrote a few weeks ago about how open source is eating the AI world — and I still believe that. But Mythos and its cousins are about to stress-test the open source ecosystem in a way nothing has before.

Here's the dynamic: AI models are dramatically better at finding bugs than fixing them. That means every OSS maintainer is about to get buried in AI-generated vulnerability reports and AI-written pull requests, many of which will be plausible-looking and wrong in subtle ways. The Linux Foundation's own post on Glasswing acknowledges this: maintainers are "already often overloaded and understaffed," and the AI security era is going to hit them from both sides — more attacks, more reports, more PRs, same number of (mostly unpaid) humans doing the work.

CyberScoop's coverage called this "a dangerous and widening gap between the security capabilities available to tech giants and those available to everyone else." That's maybe the cleanest summary I've read of why this is a big deal.

What this means if you maintain (or want to maintain) an OSS project

Start getting strict about contribution guidelines now, before the AI-flood arrives. Require PR descriptions that explain the why (I wrote about this for commit messages, same principle). Push bug reporters to include reproducers. Consider adding a "security bug reports go through this specific channel" policy so your issue tracker doesn't become a firehose. And if your project is in the dependency graph of anything important — look at whether the Linux Foundation or a similar body has a path to Glasswing-style tooling you can plug into.

What a CS Student Should Actually Do About This

My honest reaction, as a third-year CS student in Ontario who has been applying to co-ops and summer positions: security is about to go from "nice to have" to "table stakes" on a junior dev resume. I've mostly thought of appsec as a specialist path — something you go into, not something you pick up on the side. Mythos and Glasswing are changing that.

Two things shift:

Basic security literacy stops being optional. If AI is generating both the vulnerabilities and the patches, the humans in the middle need to be able to evaluate them. That's read-a-CVE, understand-a-fuzzer, reason-about-memory-safety level literacy — not "I took one security elective." Tools like OWASP Top 10 and the CWE Top 25 stop being background reading and start being day-one working knowledge.

Defensive security becomes the more interesting career bet. Every Glasswing partner is going to be hiring people who can triage AI-generated findings, prioritize fixes, and build tooling on top of these models. That's way more junior-friendly than "become an offensive security researcher" — it's engineering work with a security lens. If I were planning my next internship cycle, I'd be looking hard at teams inside Glasswing partners, or at vendors building tooling around this (CrowdStrike and Palo Alto being the obvious ones).

If you want a starting point that isn't "read 800 pages of NIST documents," the roadmap.sh cybersecurity path is a reasonable map, and HackTheBox's Starting Point is genuinely fun homework.

The Bottom Line

The sandbox escape was a great story, but it's going to fade in a few weeks. The two-tier security world Project Glasswing just formalized is going to stick around for a long time. Some of your future employers will be on the inside of that wall. Some will be on the outside. The skill that makes you valuable on either side is roughly the same: being the human who can tell the difference between an AI finding that matters and one that doesn't.

That's a skill you can start building this week. Read a CVE writeup. Work through a "build-your-own-vulnerability" tutorial. Contribute a security-focused PR to an OSS project that looks overwhelmed. The era where juniors could safely treat security as "someone else's problem" is ending. Mythos just made the timeline explicit.

Anthropic pulled the model back because it was too dangerous to release. That's the right call. But the world where AI can find every bug in every piece of software you rely on isn't coming — it's here. The question is whether you're going to be one of the people who knows how to work with it, or one of the people surprised by it.