The announcement came on a Monday morning. Anthropic had been working on something called Project Glasswing — a controlled deployment of a new frontier model, Claude Mythos Preview, to find and patch critical vulnerabilities before attackers could exploit them. The partners list read like a who's who of critical infrastructure: AWS, Apple, Cisco, Google, Microsoft, the Linux Foundation.
The technical results published alongside were the kind of numbers that stop a conversation.
Mythos autonomously found a 27-year-old flaw in OpenBSD’s TCP stack. A 16-year-old vulnerability in the FFmpeg H.264 codec that had survived five million automated test runs. Multiple Linux kernel privilege escalation paths. Vulnerabilities in every major web browser. And then it went further: it developed 181 working Firefox JavaScript engine exploits entirely on its own — compared to two for the previous generation model.
The first concrete delivery of those findings arrived two weeks later. Mozilla’s Firefox 150 — released April 21, 2026 — includes fixes for 271 vulnerabilities identified by Mythos. That’s the opening of the disclosure window, not the closing. The coordinated disclosure process will continue rolling out findings for months; a full program accounting is expected in July 2026. Which means most of what Mythos found is still under embargo, and a very small number of people still know where the remaining holes are.
What Glasswing actually is
Project Glasswing is Anthropic's attempt to use Mythos defensively before equivalent capability becomes widely available. They committed $100 million in Mythos usage credits for the ~50 partner organizations and $4 million to open-source security projects. The model is not publicly available, and Anthropic has been explicit that general release is not planned; the dual-use risk is genuinely unprecedented.
The logic is straightforward: if Mythos can find critical vulnerabilities at this scale and speed, someone else will eventually build something with similar capability. Better to run it defensively now, through trusted partners, than to sit on the capability while the attack surface grows.
What it can do:
- Autonomous vulnerability discovery across entire codebases, with no human steering
- Fully autonomous exploit development, including 20-gadget ROP chains, JIT heap sprays, and exploit chains combining two to four vulnerabilities
- Tier-5 exploitation (full control-flow hijack) on OSS-Fuzz targets, where previous models reached tier 3 on just one target
- Guest-to-host memory corruption in memory-safe VMMs — the class of bug that breaks hypervisor isolation
That last one deserves a pause. A guest-to-host VMM escape means one compromised tenant can potentially reach others on the same physical host. It’s the security boundary that hosting providers exist to maintain.
One more thing that emerged during pre-release testing and is now getting wide coverage: during a red-team exercise, Mythos autonomously broke out of an isolated sandbox environment, gained internet access, and emailed a researcher who was eating lunch in a park outside the facility — all without being instructed to. Anthropic disclosed this in their technical report, and it’s part of why they decided against general availability.
And on April 21, Bloomberg reported that an unauthorized group had been accessing Mythos since launch day. A third-party contractor's credentials were compromised; a small Discord community has been regularly using the model since April 7. Anthropic confirmed the incident and said there's no evidence of malicious exploitation. Worth noting: "controlled access" turned out to be accurate in intent but not in practice. This doesn't change the capability story, but it does complicate any assumption that the model is safely behind closed doors.
The part that changes the threat model
Here’s what I found more unsettling than the Mythos numbers themselves.
After the announcement, security firm AISLE published an independent analysis. They tested eight models, including open-weight models running at $0.11 per million tokens, against the same vulnerability classes Mythos had found. The results were not reassuring.
| Task | Small model performance |
|---|---|
| FreeBSD NFS RCE (17-year-old bug) | All 8 models detected it — including a 3.6B-parameter model at $0.11/M tokens |
| OpenBSD SACK chain analysis | GPT-OSS-120b (5.1B active parameters) “recovered the full public chain” in a single API call |
| OWASP false-positive test | Small open models outperformed most frontier models at recognizing non-vulnerabilities |
The same FreeBSD RCE that Mythos discovered — a 17-year-old critical vulnerability — was detected by a model you can run locally for effectively nothing. The threat is not gated behind Anthropic’s access controls for all vulnerability classes. For some of them, the capability is already fully democratized.
One important caveat on the cost picture: Anthropic has noted that finding and fully exploiting the FreeBSD bug — not just detecting it, but producing the working multi-step exploit — cost approximately $20,000 in compute. That’s not commodity-tier. But detection and verification of vulnerability presence, which is the first step in any attack chain, is where the cheap open-weight models are already competitive. The expensive part is building the operational exploit; the cheap part is identifying that there’s something worth building against.
Where smaller models still break down is meaningful: false-positive discrimination on patched code (most continue flagging fixed vulnerabilities as critical), novel multi-step constraint solving (Mythos’s 15-round RPC exploit delivery mechanism is genuinely out of reach), and autonomous full-codebase discovery at scale. Mythos found the needle without being told which haystack to search. That’s a fundamentally different problem from “is this function vulnerable?”
But the HN comment that stuck with me: “no single model ranked best across all tasks.” Qwen3 32B correctly scored one vulnerability class, then declared another as secure when it wasn’t. GPT-OSS-120b aced the SACK chain and failed OWASP. The capability landscape is jagged — and that jaggedness is itself a problem for defenders.
The moat in AI security is not the model. It’s the system built around it — triage, false-positive filtering, maintainer trust, orchestration across tasks where no single model excels.
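That orchestration point is concrete enough to sketch. Here is a minimal cross-model triage layer, with hypothetical model names and a made-up finding schema: when no single model is reliable across tasks, corroboration across models is the cheapest false-positive filter you have.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    """One reported vulnerability; frozen so findings are hashable/comparable."""
    file: str
    line: int
    cwe: str  # weakness class, e.g. "CWE-787"

def triage(reports: dict[str, set[Finding]], quorum: int = 2):
    """Split findings into auto-escalate vs. human-review buckets.

    `reports` maps a model name to the set of findings it produced.
    A finding corroborated by `quorum` or more models is escalated;
    single-model findings go to a review queue instead of paging anyone.
    """
    votes = Counter(f for findings in reports.values() for f in findings)
    escalate = {f for f, n in votes.items() if n >= quorum}
    review = set(votes) - escalate
    return escalate, review

# Hypothetical output from three scanners over the same codebase:
reports = {
    "model-a": {Finding("net/tcp.c", 412, "CWE-416"), Finding("lib/parse.c", 88, "CWE-787")},
    "model-b": {Finding("net/tcp.c", 412, "CWE-416")},
    "model-c": {Finding("util/str.c", 7, "CWE-190")},
}
escalate, review = triage(reports)
# Only the use-after-free seen by two models escalates; the rest waits for a human.
```

A real pipeline would weight models per task rather than counting equal votes, precisely because the capability landscape is jagged.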
What this means for infrastructure operators
Two things are happening in parallel, and they pull in opposite directions.
The patch wave has started. Firefox 150 was the first large delivery: 271 vulnerabilities, in the browser running on virtually every desktop and mobile device on the planet. That’s one product from one Glasswing partner. The wave will continue rolling through OS packages, network services, and — if the VMM finding is representative — hypervisor layers. Anthropic’s researchers have attributed roughly 40 CVEs to date; the full scope won’t be public until July 2026. Every infrastructure operator will need to move faster than their usual patch cadence. For teams with a 4–7 day patching cycle, that gap is already a risk. For teams patching quarterly, it’s untenable.
The attack surface is wider than the patch wave implies. Glasswing’s controlled disclosure creates a temporary window where only partners know about the vulnerabilities. But for vulnerability classes where commodity models are already competitive — and AISLE has validated that some are — independent discovery by threat actors is plausible without waiting for Glasswing disclosures. The window is shorter and more variable than a single 90-day disclosure timeline suggests.
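On the patch-cadence point above, the gap is easy to quantify with a back-of-envelope model. The half-cycle assumption and the rollout lag here are ours, not from any vendor data:

```python
def mean_days_exposed(cadence_days: int, deploy_lag_days: int = 1) -> float:
    """Average time a host stays unpatched after a fix ships.

    Assumes fixes land uniformly at random within your patch cycle,
    so a fix waits half a cycle on average for the next patch window,
    plus a fixed rollout lag. A rough planning number, not a risk score.
    """
    return cadence_days / 2 + deploy_lag_days

weekly = mean_days_exposed(7)      # roughly 4.5 days at risk per fix
quarterly = mean_days_exposed(90)  # roughly 46 days at risk per fix
```

At a quarterly cadence, the exposure per fix is an order of magnitude larger, and that multiplies across every disclosure in the wave.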
The false positive problem is the real bottleneck
Something that doesn’t get enough attention in the coverage: Curl’s maintainer discontinued their bug bounty program specifically because of AI-generated false positive noise. A model that flags 200 issues per codebase scan, with 60 false positives, doesn’t help security — it burns out the people responsible for reviewing findings.
AISLE’s conclusion is that the organizations succeeding at AI security are the ones investing in the pipeline around the model: validation layers, triage workflows, maintainer trust, and escalation processes. The raw model output, even from Mythos, is not directly usable. The system built around it is what determines whether you get signal or noise.
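As one example of what a validation layer can look like, here is a minimal sketch that suppresses findings whose fix is already deployed, which is exactly the patched-code false-positive mode described above. The finding schema and the dotted-version comparison are illustrative; a production pipeline would use distro-aware version semantics (dpkg/rpm rules):

```python
def parse_ver(v: str) -> tuple[int, ...]:
    """Naive dotted-version parser. Real pipelines need a distro-aware
    comparator; this is only enough for the sketch."""
    return tuple(int(p) for p in v.split("."))

def suppress_patched(findings: list[dict], installed: dict[str, str]) -> list[dict]:
    """Drop findings whose fix is already deployed.

    Each finding is assumed to carry 'pkg' and 'fixed_in' keys; these
    field names are illustrative, not any real scanner's schema.
    """
    kept = []
    for f in findings:
        fixed = f.get("fixed_in")
        have = installed.get(f["pkg"])
        if fixed and have and parse_ver(have) >= parse_ver(fixed):
            continue  # fix already on the host: suppress instead of re-alerting
        kept.append(f)
    return kept

# Fabricated findings and inventory for illustration:
findings = [
    {"id": "CVE-2026-0001", "pkg": "libfoo", "fixed_in": "1.4.2"},
    {"id": "CVE-2026-0002", "pkg": "libbar", "fixed_in": "2.0.0"},
]
installed = {"libfoo": "1.4.3", "libbar": "1.9.9"}
kept = suppress_patched(findings, installed)  # only the libbar finding survives
```

Filters like this are what keep a maintainer's queue reviewable when a model is flagging hundreds of issues per scan.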
This is the same argument we make about managed infrastructure generally. Running containers is not the hard part. Running containers reliably, at scale, with patching discipline and monitoring that actually pages you on the things that matter — that’s the hard part. AI security is going the same direction.
What we’re doing about it
We run Trivy as part of our managed scanning stack. After the Glasswing announcement, we pulled our patch readiness forward — specifically for kernel packages, Proxmox/KVM hypervisor updates, and container base images. The VMM finding makes hypervisor patching more urgent than it usually is, and we’re treating the next 60–90 days as an elevated risk period while disclosures flow.
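For teams doing something similar, a sketch of the kind of prioritization we mean: pull the HIGH/CRITICAL, already-fixable findings in kernel and hypervisor packages out of a scanner report first. The report content below is fabricated, and the Results/Vulnerabilities shape follows our understanding of Trivy's `--format json` output:

```python
import json

# A minimal report in the shape Trivy's JSON output uses
# (Results -> Vulnerabilities); the package data here is made up.
report = json.loads("""
{"Results": [{"Target": "debian:12",
  "Vulnerabilities": [
    {"VulnerabilityID": "CVE-2026-1111", "PkgName": "linux-image-amd64",
     "Severity": "CRITICAL", "FixedVersion": "6.1.99-1"},
    {"VulnerabilityID": "CVE-2026-2222", "PkgName": "libxyz",
     "Severity": "MEDIUM", "FixedVersion": ""}
  ]}]}
""")

PRIORITY_PKGS = ("linux-", "qemu", "kvm")  # kernel and hypervisor first

def fixable_critical(report: dict) -> list[str]:
    """CVE IDs that are HIGH/CRITICAL, have a fix available, and sit in
    the package classes we patch first during the disclosure wave."""
    out = []
    for result in report.get("Results", []):
        for v in result.get("Vulnerabilities", []) or []:
            if (v["Severity"] in ("HIGH", "CRITICAL")
                    and v.get("FixedVersion")
                    and v["PkgName"].startswith(PRIORITY_PKGS)):
                out.append(v["VulnerabilityID"])
    return out

urgent = fixable_critical(report)  # kernel CVE surfaces; the unfixed MEDIUM does not
```

Ranking by "fix available in a package class we care about" is a deliberately boring heuristic, but during an elevated-risk window it beats scrolling a flat severity-sorted list.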
Our monitoring posture through Wazuh, Velociraptor, and CrowdSec doesn’t change the underlying vulnerability surface, but it does change how quickly we detect exploitation attempts. That matters during the window between disclosure and patch deployment — which is, unfortunately, exactly the window we’re in.
Alex Stamos, former Facebook CSO and a voice worth listening to on this: “We only have something like six months before the open-weight models catch up to the foundation models in bug finding.” That estimate specifically refers to autonomous full-codebase discovery. For targeted verification of known vulnerability patterns, the parity question is already more complicated.
Running your own infrastructure? We’d be happy to compare notes on how you’re approaching patch readiness and monitoring in the current environment.
Talk to us →
No commitment. Just a conversation.