Mythos for Offensive Security: XBOW’s Evaluation

About two months ago, Anthropic invited us to help assess a new model they believed represented a significant shift in capability. So we put it through our security gauntlet: benchmarks, workflows, interactive use, and integrations.

Today, we can finally share details on how we tested Mythos Preview, what we found, and what it means. 

Spoilers: This model is a major advance. It is substantially better than prior models at finding vulnerability candidates, especially when source code is available. It communicates with unusual technical precision, reasons well about code, and shows strong promise in complex domains such as native-code analysis and reverse engineering. 

Our takeaway: Mythos Preview is a powerful tool for generating strong vulnerability leads and technically precise analysis. It is especially adept at analyzing source code with a security mindset. It’s not magic, though: a model is a brain without a body. While source code audits are mostly a brain activity, live site pentests like the ones XBOW performs very much need a body whose skill and control can match the brain’s power. — Read More

#cyber

Defense at AI speed: Microsoft’s new multi-model agentic security system tops leading industry benchmark

Today Microsoft announced a major step forward in AI-powered cyber defense: its new agentic security system helped researchers find 16 new vulnerabilities across the Windows networking and authentication stack, including four Critical remote code execution flaws in components such as the Windows kernel TCP/IP stack and the IKEv2 service. The researchers used the new Microsoft Security multi-model agentic scanning harness (codename MDASH), built by Microsoft's Autonomous Code Security team. Unlike single-model approaches, the harness orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models to discover, debate, and prove exploitable bugs end-to-end.

The results speak for themselves: 21 of 21 planted vulnerabilities found with zero false positives on a private test driver; 96% recall against five years of confirmed Microsoft Security Response Center (MSRC) cases in clfs.sys and 100% in tcpip.sys; and an industry-leading 88.45% score on the public CyberGym benchmark of 1,507 real-world vulnerabilities—the top score on the leaderboard, roughly five points ahead of the next entry. — Read More

#cyber

Andy Jassy Is Rewriting Amazon’s Playbook for the AI Age

Jassy was once Jeff Bezos’ deputy and the head of Amazon’s cloud computing arm. Five years into his tenure as CEO, he’s killing projects, cutting staff, pleasing Wall Street and steering the everything store through its greatest challenge yet.

… This July will mark five years since Andy Jassy took over the chief executive officer role from Amazon's founder. At the corporate offices in Seattle, the workforce has grown accustomed to his brand of rigorous oversight and ongoing exhortations to act as if they were at Jeff Bezos' startup, not a $2.9 trillion behemoth. He recently placed a series of staggeringly expensive bets on artificial intelligence, audacious even by the standards of Silicon Valley's ongoing trillion-dollar AI bacchanalia. In February he agreed to invest as much as $50 billion in OpenAI in a deal that commits the rising startup to relying in part on Amazon's data centers and custom-designed microchips. Then in April he expanded a similar partnership with OpenAI's archrival, Anthropic, with a $13 billion investment and an option for an additional $20 billion. To Jassy's critics, that spending was the price of Amazon's late jump into the current AI wave. He wasn't bluffing, though: Jassy spooked investors by vowing to spend $200 billion this year on big-ticket items including warehouse robots, a far-out effort to launch satellites into space, and in particular more AI data centers, AI chips and networking equipment. "I don't think the world has ever seen a technology get this much adoption and grow this quickly, at least in my lifetime," Jassy tells Bloomberg Businessweek. — Read More

#big7