Claude Mythos Preview — Cybersecurity Assessment
Claude Mythos Preview is a general-purpose language model released by Anthropic on April 7, 2026, noted for being “strikingly capable at computer security tasks.” In response, Anthropic launched Project Glasswing — an effort to use Mythos Preview to help secure the world’s most critical software and prepare the cybersecurity industry for a new class of AI-driven threats.
Key Findings
Zero-Day Vulnerability Discovery
During testing, Mythos Preview demonstrated the ability to autonomously identify and exploit zero-day vulnerabilities in every major operating system and every major web browser. The vulnerabilities it found were often subtle, with many being 10–20 years old; the oldest was a now-patched 27-year-old bug in OpenBSD.
Exploit Sophistication
Mythos Preview’s exploits go far beyond simple stack-smashing:
- Browser exploits: Chained four vulnerabilities together, writing a complex JIT heap spray that escaped both renderer and OS sandboxes
- Linux privilege escalation: Autonomously exploited subtle race conditions and KASLR bypasses to achieve local privilege escalation
- Remote code execution: Wrote a FreeBSD NFS server exploit granting full root access by splitting a 20-gadget ROP chain across multiple packets
Accessibility
Non-experts can leverage Mythos Preview to find sophisticated vulnerabilities. Engineers with no formal security training reported asking the model to find remote code execution vulnerabilities overnight and waking up to complete, working exploits.
Performance Leap
The generational leap is dramatic: the prior model (Opus 4.6) had a near-0% success rate at autonomous exploit development, turning Firefox JavaScript engine bugs into working exploits only twice in several hundred attempts. On the same benchmark, Mythos Preview developed working exploits 181 times and achieved register control on 29 additional attempts.
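Because both models were run against the same benchmark, the raw success counts can be compared directly even though the exact attempt total is not reported. A back-of-the-envelope comparison using only the figures above:

```python
# Success counts on the same Firefox JavaScript engine benchmark.
opus_successes = 2               # Opus 4.6: working exploits
mythos_full_exploits = 181       # Mythos Preview: working exploits
mythos_register_control = 29     # partial progress: attacker-controlled register

# Same attempt pool for both models, so the ratio of counts is the
# ratio of success rates.
improvement = mythos_full_exploits / opus_successes
print(f"{improvement:.1f}x more working exploits")  # → 90.5x more working exploits
```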
Responsible Disclosure
Over 99% of the vulnerabilities found have not yet been patched. Anthropic followed coordinated vulnerability disclosure processes, limiting what could be publicly reported. The team emphasized that even the fewer than 1% of bugs they can publicly discuss reveal a “substantial leap” in cybersecurity capabilities warranting “urgent” industry-wide defensive action.
Implications for AI_Safety
This development represents a watershed moment for the intersection of AI capabilities and cybersecurity:
- Offensive capability parity: AI models now match or exceed experienced human security researchers in autonomous exploit development
- Defense imperative: The asymmetry between AI-driven offense and traditional defense necessitates AI-augmented defensive strategies
- Dual-use concerns: The same capabilities that can secure software can also be weaponized — intensifying the alignment and access-control challenges central to AI safety
See Also
- AI_Safety — the alignment problem and safety challenges in large language models
- Constitutional_Classifiers_Anthropic — Anthropic’s classifier-based defense against jailbreaks
- Emergent_Misalignment_Betley — research on how narrow finetuning can induce broad misalignment
- Utility_Engineering_Mazeika_et_al — framework for measuring and reshaping LLM value systems