… “prompting has split into four skills”: Context, Intent, Specification, Prompt. Every one matched a tension one of us had brought into the room. And once they had names, something else clicked: the four crafts mapped cleanly onto P-CAM (Perception, Cognition, Agency, Manifestation).
…For the last eight months, the argument has been spec versus vibe. Structure versus flow. Waterfall versus emergence.
…Every standard critique of SDD, and every standard critique of vibe, traces back to the same thing. Not two sets of failures. One failure, surfacing on both sides of the debate. The three-layer collapse.
…Vibe coding collapsed because it had no contract. Spec-driven development is collapsing because it has three contracts pretending to be one. What rises from the fusion isn’t a new brand. It isn’t a better tool. It’s a separation of concerns — the oldest principle in software engineering — applied one layer up, to the documents we use to instruct the machines that write the documents. — Read More
If You Had To Read Only 5 AI Papers, This Should Be It.
The five papers that shape how every working AI engineer in 2026 thinks — what each one actually said, why it still matters, and what to read once you’ve read it.
… Five papers and one essay. Read them in this order, and the rest of the field becomes legible. — Read More
Beyond the Coding Assistant: A Series on AI-Assisted Software Engineering
This is the first article of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available here.
The last few years of AI-assisted development have been remarkable. Coding assistants have crossed real quality bars. Engineers can now produce working code, in unfamiliar languages, against unfamiliar systems, at speeds that would have looked like science fiction in 2022. There are real productivity gains, real new affordances, and a real shift in what an individual developer can do in an afternoon.
And yet — when the conversation turns to the team and the organization — the picture is more complicated. The dramatic gains many leaders were promised haven’t shown up on every team. Some teams ship more. Some teams ship the same. Some teams have actually gotten slower, with the AI helping at the keystroke while the wider delivery metrics regress.
That gap, between what’s possible at the keystroke and what’s actually showing up in delivery, is what this series is about. The question I want to ask, and try to answer over the next several articles, is simple: what has changed, and what changes could take us so much farther than where current AI coding assistants have brought us? — Read More
The Modern Data Stack is Overcomplicated
… This series is the guide I wish someone had handed me at the start.
Over the next nine posts, I’m going to walk you through every layer of the Modern Data Stack. Not just which tool does what – you can read their docs for that. I want to talk about the decisions: why you’d choose one approach over another, what the real trade-offs are once you’re six months down the line, and where “best-practice” advice falls apart in the real world.
Here’s the series at a glance:
1. Architecture Overview: You are here
2. Data Ingestion: Connectors, event streams, custom pipelines
3. Data Warehousing: Where your data lives and why it matters more than you think
4. Transformation: dbt and beyond
5. Orchestration: Keeping everything running without losing your mind
6. Infrastructure as Code: The upfront cost that pays for itself (eventually)
7. Data Quality & Testing: What actually catches problems in production
8. Access Control & Governance: The boring stuff that will bite you if you ignore it
9. AI & ML Readiness: What “AI-ready” actually means from an engineering perspective
10. Lessons Learned: What I’d do differently if I started again tomorrow
— Read More
— Read the Series
How Claude Code works in large codebases: Best practices and where to start
Claude Code is running in production across multi-million-line monorepos, decades-old legacy systems, distributed architectures spanning dozens of repositories, and at organizations with thousands of developers. These environments present challenges that smaller, simpler codebases don’t, whether that’s build commands that differ across every subdirectory or legacy code spread across folders with no shared root.
This article covers the patterns we’ve observed that have led to successful adoption of Claude Code at scale. We use “large codebase” to refer to a wide range of deployments: monorepos with millions of lines, legacy systems built over decades, dozens of microservices across separate repositories, or any combination of the above. That also includes codebases running on languages that teams don’t always associate with AI coding tools, such as C, C++, C#, Java, PHP. (Claude Code performs better than most teams expect it to in those cases, particularly as of recent model releases.) While every large codebase deployment is shaped by its specific version control, team structure, and accumulated conventions, the patterns here generalize across them and are a good starting point for teams considering adopting Claude Code. — Read More
Mythos for Offensive Security: XBOW’s Evaluation
About two months ago, Anthropic invited us to help them assess a new model they thought represented a significant shift in capability. So we put it through our security gauntlet: benchmarks, workflows, interactive use, and integrations.
Today, we can finally share details on how we tested Mythos Preview, what we found, and what it means.
Spoilers: This model is a major advance. It is substantially better than prior models at finding vulnerability candidates, especially when source code is available. It communicates with unusual technical precision, reasons well about code, and shows strong promise in complex domains such as native-code analysis and reverse engineering.
Our takeaway: Mythos Preview is a powerful tool for generating strong vulnerability leads and technically precise analysis. It is especially adept at analyzing source code with a security mindset. It’s not magic, though: a model is a brain without a body. While source code audits are mostly a brain activity, live site pentests like the ones XBOW performs very much need a body whose skill and control can match the brain’s power. — Read More
Defense at AI speed: Microsoft’s new multi-model agentic security system tops leading industry benchmark
Today Microsoft announced a major step forward in AI-powered cyber defense: our new agentic security system helped researchers find 16 new vulnerabilities across the Windows networking and authentication stack—including four Critical remote code execution flaws in components such as the Windows kernel TCP/IP stack and the IKEv2 service. They used the new Microsoft Security multi-model agentic scanning harness (codename MDASH) which was built by Microsoft’s Autonomous Code Security team. Unlike single-model approaches, the harness orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models to discover, debate, and prove exploitable bugs end-to-end.
The results speak for themselves: 21 of 21 planted vulnerabilities found with zero false positives on a private test driver; 96% recall against five years of confirmed Microsoft Security Response Center (MSRC) cases in clfs.sys and 100% in tcpip.sys; and an industry-leading 88.45% score on the public CyberGym benchmark of 1,507 real-world vulnerabilities—the top score on the leaderboard, roughly five points ahead of the next entry. — Read More
Andy Jassy Is Rewriting Amazon’s Playbook for the AI Age
Jassy was once Jeff Bezos’ deputy and the head of Amazon’s cloud computing arm. Five years into his tenure as CEO, he’s killing projects, cutting staff, pleasing Wall Street and steering the everything store through its greatest challenge yet.
… This July will mark five years since Andy Jassy took over the chief executive officer role from Amazon’s founder. At the corporate offices in Seattle, the workforce has grown accustomed to his brand of rigorous oversight and ongoing exhortations to act as if they were at Jeff Bezos’ startup, not a $2.9 trillion behemoth. He recently placed a series of staggeringly expensive bets on artificial intelligence, audacious even by the standards of Silicon Valley’s ongoing trillion-dollar AI bacchanalia. In February he agreed to invest as much as $50 billion in OpenAI in a deal that commits the rising startup to relying in part on Amazon’s data centers and custom-designed microchips. Then in April he expanded a similar partnership with its archrival, Anthropic—a $13 billion investment, with an option for an additional $20 billion. To Jassy’s critics, that spending was the price of Amazon’s late jump into the current AI wave. He wasn’t bluffing, though: Jassy spooked investors by vowing to spend $200 billion this year on big-ticket items including warehouse robots, a far-out effort to launch satellites into space, and in particular more AI data centers, AI chips and networking equipment. “I don’t think the world has ever seen a technology get this much adoption and grow this quickly, at least in my lifetime,” Jassy tells Bloomberg Businessweek. — Read More
Small Language Models are the Future of Agentic AI
Large language models (LLMs) are often praised for exhibiting near-human performance on a wide range of tasks and valued for their ability to hold a general conversation. The rise of agentic AI systems is, however, ushering in a mass of applications in which language models perform a small number of specialized tasks repetitively and with little variation.
Here we lay out the position that small language models (SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems, and are therefore the future of agentic AI. Our argumentation is grounded in the current level of capabilities exhibited by SLMs, the common architectures of agentic systems, and the economy of LM deployment. We further argue that in situations where general-purpose conversational abilities are essential, heterogeneous agentic systems (i.e., agents invoking multiple different models) are the natural choice. We discuss the potential barriers for the adoption of SLMs in agentic systems and outline a general LLM-to-SLM agent conversion algorithm. — Read More
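To make the "heterogeneous agentic system" idea concrete, here is a minimal sketch of an agent that sends narrow, repetitive task types to a small model and falls back to a large model for everything else. It is an illustration under stated assumptions, not the paper's conversion algorithm: `small_model`, `large_model`, `Router`, and the task-type names are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict

ModelFn = Callable[[str], str]


# Hypothetical stand-ins for real model clients. A real system would call
# an SLM or LLM endpoint here; these just tag the prompt with the model used.
def small_model(prompt: str) -> str:
    return f"[slm] {prompt}"


def large_model(prompt: str) -> str:
    return f"[llm] {prompt}"


@dataclass
class Router:
    """Dispatches each agent invocation to a model based on its task type."""

    specialized: Dict[str, ModelFn]  # narrow task types handled by an SLM
    fallback: ModelFn                # general-purpose model for everything else

    def run(self, task_type: str, prompt: str) -> str:
        # Use the specialized handler when one is registered for this task
        # type; otherwise fall back to the general-purpose model.
        handler = self.specialized.get(task_type, self.fallback)
        return handler(prompt)


# Assumed task types, chosen only for illustration.
router = Router(
    specialized={
        "classify_intent": small_model,
        "extract_json": small_model,
    },
    fallback=large_model,
)

print(router.run("classify_intent", "Where is my refund?"))
print(router.run("open_conversation", "Help me plan a migration."))
```

The design choice worth noting is that the routing key is the task type, not the prompt content: the abstract's argument rests on agents performing a small number of specialized tasks repetitively, which is what makes a static dispatch table like this plausible.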
Interaction Models: A Scalable Approach to Human-AI Collaboration
Today, we’re announcing a research preview of interaction models: models that handle interaction natively rather than through external scaffolding. We think interactivity should scale alongside intelligence; the way we work with AI should not be treated as an afterthought. Interaction models let people collaborate with AI the way we naturally collaborate with each other—they continuously take in audio, video, and text, and think, respond, and act in real time.
We train an interaction model from scratch. To ensure real-time responsiveness, we adopt a multi-stream, micro-turn design. Our research preview demonstrates qualitatively new interaction capabilities, as well as state-of-the-art combined performance in intelligence and responsiveness. — Read More