Guillaume Bonnissent’s Insurance Technology Diary

Episode 79: Myth(os)ical beasts and where to put them

At school we played billes, but it was tough. One kid always had a bigger, stronger marble which would more or less guarantee he could beat anyone. I don’t know why we didn’t all have une roche, but I suppose if we did someone would soon show up with an even bigger, stronger marble until eventually we’d all have une boule de bowling in our backpack.

It’s a classic example of the tool you want for yourself, but no one else should be allowed to possess. A bit like a nuclear weapon.

A bit like the all-destroying bit of binary code that Moriarty touted in an episode of the Sherlock Holmes reboot with Benedict Cumberbatch and the hobbit chap. That almighty fictional algorithm could hack any bank, open any vault, disarm any security.

A bit like what’s claimed of Claude Mythos Preview.

This latest-generation AI is described by creator Anthropic as “a new general-purpose language model… strikingly capable at computer security tasks.” Anthropic says Mythos Preview can identify (and could exploit) “zero-day vulnerabilities in every major operating system and every major web browser.” It has already found thousands of such holes in mainstream software, of which it says “over 99%” have not been patched.

Anthropic has launched Project Glasswing, a programme “to help secure the world’s most critical applications” (presumably against Mythos). Through it, the LLM has been opened up to more than 40 “critical software infrastructure” companies for this purpose. Among those that have received a login for testing purposes are AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks.

In its 245-page “Claude Mythos Preview System Card,” Anthropic acknowledges that the model, whilst being incredibly advanced, presents “alignment dangers.” They explain this through a metaphor worthy of one of my Diary entries:

“Consider the ways in which a careful, seasoned mountaineering guide might put their clients in greater danger than a novice guide, even if that novice guide is more careless,” the company states. “The seasoned guide’s increased skill means that they’ll be hired to lead more difficult climbs, and can also bring their clients to the most dangerous and remote parts of those climbs. These increases in scope and capability can more than cancel out an increase in caution.”

Simply put, the bigger they are, the harder they fall.

And Mythos is indeed big, at least in terms of its alleged capabilities. The System Card reveals that the LLM, when instructed to do so, was able to escape a secure ‘sandbox’ area of its computing environment, which Anthropic described as “demonstrating a potentially dangerous capability for circumventing our safeguards.” It then executed a “more concerning action,” which Anthropic described as an “unasked-for effort to demonstrate its success.” Viz: Mythos bragged about its escape on public websites to which it was not meant to have access.

Worse, the software developer admits, Mythos has sometimes had a stab at “covering its tracks” in a concerted attempt to hide the fact it had done things it isn’t meant to, like a child hiding the wrappers after raiding the sweets jar. “Mythos Preview took actions [it] appeared to recognize as disallowed and then attempted to conceal them.”

If Mythos is the ultimate hack-anything tool, it certainly could cause calamity, especially given its dual foibles of a willingness to escape and enough hubris to brag and break the rules. Stick it into one of the (admittedly very cool) life-sized Unitree robots and we could soon all be living in a real-life rerun of The Terminator.

That raises the question, “In whose hands is it safe?” The simple truth is that the companies granted access are not in possession of invincible cyber defences (none are), especially against an internal threat. The accidental global distribution of a system-crashing software bug by one of the Glasswing coders patently proves it.

With Mythos, AI has entered the territory of the unknown unknowns – and it could be seen as a great leap backwards. Before it, AI models could already help you identify vulnerabilities in software, and were regularly deployed to make applications far more robust and secure than was previously imaginable. Now Mythos may have rendered robust code no safer than wearing a cloth cap when cycling down Mont Ventoux at full speed. It is very much a case of being careful what you wish for.

Or is it? En français, a ‘mytho’ is a person who keeps telling fibs. Anthropic is not the first company to declare it must control distribution of its new AI tool to protect an unready world. Way back in 2019, Anthropic rival OpenAI declared that it would release only a limited version of its new LLM (then described as a “text-generation model”) because of serious “safety and security concerns.” But a few awkward hallucinations notwithstanding, ChatGPT hasn’t crashed the world (yet).

In the Sherlock episode The Reichenbach Fall, Moriarty eventually reveals that his open-everything algorithm is a hoax. He calls Sherlock a “doofus” for believing that a few lines of code could be so powerful. In the real world, it may be the case that this second announcement of a too-dangerous-for-you LLM is just more hype. The claim has certainly garnered a great deal of media attention for Mythos.

That said, the progress of AI has been so incredibly swift over the past year that it’s far from inconceivable that the universal hacking tool is upon us. If it’s real, it needs to be handled very, very carefully. An even bigger marble will be really tough to find.

* Like every Insurance Technology Diary entry about AI, this one is accurate only to the best of my knowledge at the time of writing. The pace of AI progress is so great that I cannot guarantee it remains so now that it’s finished, let alone when you read it.

Guillaume Bonnissent is CEO of Quotech.