And yet, precisely because the product is not in your hands, this event has not got the attention it deserves.
The Mythos Moment, as the emergence of Anthropic’s latest frontier model has been called, gives reasons to hope that making AI safe for humanity is finally being taken seriously. It is also terrifying, confirming that AI systems we can barely control are already here.
Claude Mythos Preview, Anthropic found, is supremely good at autonomously finding and exploiting IT systems’ weaknesses. It found hundreds of holes in “every major operating system and every major web browser.” In the hands of otherwise ordinary hackers, it could bring a country’s energy or financial infrastructure to its knees.
Anthropic deemed the model too dangerous to release. This is the first cause for cheer. Had the lab that first crossed this threshold been open-source - as many Chinese labs are - the capability would now be in anyone’s hands. Had it been less scrupulous, it might have slapped on guardrails and shipped anyway. Neither happened.
Anthropic instead released the model to a cabal of 40-odd organisations at the forefront of US cybersecurity, so their experts could use Mythos to patch vulnerabilities in their own systems - a cooperative effort named Project Glasswing. Participants included Google (Anthropic’s direct rival), the US government, and JP Morgan.
This is the second cause for optimism. The debate over AI safety has spun its wheels for years. Project Glasswing offers a flawed but functional template for collective defence. In a moment of acute need, Anthropic summoned a coalition that focused the industry’s best minds overnight.
The effect was electric. Anthropic’s CEO - weeks earlier branded a “supply chain risk” and barred from government dealings - was summoned to the White House for talks. Britain convened emergency meetings. Canada’s finance minister said at the IMF Spring Meetings that the issue was “serious enough to warrant the attention of all the finance ministers.” And in a screeching U-turn on regulation, the US government this month announced that it will evaluate leading labs’ most powerful new models before they are released.
It seems the world got its wake-up call without having to pay for it. Many observers assumed it would take an actual catastrophe - a power grid going dark, a banking system frozen - to concentrate minds. Instead, the alarm was raised by the discovery of the capability, not its use.
Be grateful too that Mythos’ abilities happen to be in cybersecurity - a domain where the vulnerabilities it uncovers can be fixed. Had Anthropic’s breakthrough been in, say, engineering novel pathogens, the world might look very different.
Some caveats. By withholding Mythos, Anthropic not only avoided blowing up the world but also conveniently burnished two credentials: that it has the most powerful model, and that it is the most responsible steward. Mythos also consumes five times the computing power of its predecessor, which may have made public release impossible regardless of principle.
As for Glasswing, the 40-odd companies with access to Mythos can patch their systems. The rest of the world cannot. A solution for JP Morgan and Google is not the same as a solution for you.
Finally, we should be appropriately scared. During testing, Mythos escaped sandbox environments, gained unauthorised internet access, and in one notorious incident called its examiner during lunch to report its progress - before autonomously posting its exploits to public websites. It acted against prohibitions and tried to cover its tracks.
With Mythos and Glasswing, people may finally be getting their act together regarding AI safety. Long may it continue, and fast.
Joe Smith is the founder of the AI consultancy 2Sigma Consultants. He studied AI at Imperial College Business School and is researching AI’s effects on cognition at Chulalongkorn University. He is the author of The Optimized Marketer, a book on how to use AI to promote your business and yourself. Contact joe@2Sigmaconsultants.com.


