The State of AI Report is the most widely read and trusted analysis of key developments in AI. Published annually since 2018, the open-access report aims to spark informed conversation about the state of AI and what it means for the future. Produced by AI investor Nathan Benaich and Air Street Capital.
If 2024 was the year of consolidation, 2025 was the year reasoning got real. What began as a handful of “thinking” models has turned into a global competition to make machines that can plan, verify, and reflect. OpenAI, Google, Anthropic, and DeepSeek all released systems capable of reasoning through complex tasks, sparking one of the fastest research cycles the field has ever seen.
AI [now] acts as a force multiplier for technological progress in our increasingly digital, data-driven world. This is because everything around us, from culture to consumer products, is ultimately a product of intelligence. — Read More
Recent Updates
Building a Resilient Event Publisher with Dual Failure Capture
When we set out to rebuild Klaviyo’s event infrastructure, our goal wasn’t just to handle more scale; it was to make the system rock solid. In Part 1 of this series, we shared how we migrated from RabbitMQ to a Kafka-based architecture to process 170,000 events per second at peak without losing data. In Part 2, we dove into how we made event consumers resilient.
This post, Part 3, is all about the Event Publisher, the entry point into our event pipeline. The publisher has an important job: it needs to accept events from hundreds of thousands of concurrent clients, serialize them, keep up with unpredictable traffic spikes, and, most importantly, ensure that no event is ever lost. If the publisher isn’t resilient, the rest of the pipeline can’t rely on a steady and complete flow of events. — Read More
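To make the "no event is ever lost" idea concrete, here is a minimal sketch of a publisher that captures failures at two distinct points: serialization errors before the event leaves the process, and send errors at the broker boundary. This is not Klaviyo's implementation; `ResilientPublisher`, `broker_send`, and the in-memory `captured` spool are hypothetical stand-ins (a real system would spool to durable storage and replay asynchronously, and would also hook Kafka's delivery callbacks).

```python
import json

class ResilientPublisher:
    """Toy sketch: capture publish failures at two points and spool them for replay."""

    def __init__(self, broker_send):
        self.broker_send = broker_send  # hypothetical callable(bytes) -> None; may raise
        self.captured = []              # in-memory stand-in for a durable local spool

    def _capture(self, event, reason):
        # Every failed event is preserved with the reason, so nothing is silently dropped.
        self.captured.append({"event": event, "reason": reason})

    def publish(self, event):
        try:
            payload = json.dumps(event).encode()
        except (TypeError, ValueError) as exc:
            # Failure point 1: the event cannot be serialized.
            self._capture(event, f"serialize: {exc}")
            return False
        try:
            self.broker_send(payload)
        except Exception as exc:
            # Failure point 2: the broker rejected the event or the connection dropped.
            self._capture(event, f"send: {exc}")
            return False
        return True
```

The design choice worth noting is that both failure paths land in the same spool, so a single replay job can drain it once the broker recovers.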
Introducing CodeMender: an AI agent for code security
… Software vulnerabilities are notoriously difficult and time-consuming for developers to find and fix, even with traditional, automated methods like fuzzing. Our AI-based efforts like Big Sleep and OSS-Fuzz have demonstrated AI’s ability to find new zero-day vulnerabilities in well-tested software. As we achieve more breakthroughs in AI-powered vulnerability discovery, it will become increasingly difficult for humans alone to keep up.
CodeMender helps solve this problem by taking a comprehensive approach to code security that’s both reactive, instantly patching new vulnerabilities, and proactive, rewriting and securing existing code and eliminating entire classes of vulnerabilities in the process. Over the past six months that we’ve been building CodeMender, we have already upstreamed 72 security fixes to open source projects, including some as large as 4.5 million lines of code.
By automatically creating and applying high-quality security patches, CodeMender’s AI-powered agent helps developers and maintainers focus on what they do best — building good software. — Read More
OpenAI’s Windows Play
… OpenAI is making a play to be the Windows of AI.
For nearly two decades, smartphones, and in particular iOS, have been the touchstone for discussions of platforms. It’s important to note, however, that while Apple’s strategy of integrating hardware and software was immensely profitable, it entailed leaving the door open for a competing platform to emerge. The challenge of being a hardware company is that by virtue of needing to actually create devices you can’t serve everyone; Apple in particular didn’t have the capacity or desire to go downmarket, which created the opportunity for Android not only to establish a competing platform but to significantly exceed iOS in market share.
That means that if we want a historical analogy for total platform dominance — which increasingly appears to be OpenAI’s goal — we have to go back further to the PC era and Windows. — Read More
Why Can’t Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls
Language models are increasingly capable, yet still fail at the seemingly simple task of multi-digit multiplication. In this work, we study why, by reverse-engineering a model that successfully learns multiplication via implicit chain-of-thought, and report three findings: (1) Evidence of long-range structure: logit attributions and linear probes indicate that the model encodes the necessary long-range dependencies for multi-digit multiplication. (2) Mechanism: the model encodes long-range dependencies using attention to construct a directed acyclic graph to “cache” and “retrieve” pairwise partial products. (3) Geometry: the model implements partial products in attention heads by forming Minkowski sums between pairs of digits, and digits are represented using a Fourier basis, both of which are intuitive and efficient representations that a standard fine-tuned model lacks. With these insights, we revisit the learning dynamics of standard fine-tuning and find that the model converges to a local optimum that lacks the required long-range dependencies. We further validate this understanding by introducing an auxiliary loss that predicts the “running sum” via a linear regression probe, which provides an inductive bias that enables the model to successfully learn multi-digit multiplication. In summary, by reverse-engineering the mechanisms of an implicit chain-of-thought model, we uncover a pitfall for learning long-range dependencies in Transformers and provide an example of how the correct inductive bias can address this issue. — Read More
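A toy illustration of why a Fourier basis is a convenient representation for digits: map each base-10 digit to a point on the unit circle, and adding digits modulo 10 becomes composing rotations via the angle-addition formulas. This is only a sketch of the representational idea, not the paper's actual learned mechanism; the function names are made up for illustration.

```python
import math

def digit_features(d, k=1):
    # Fourier encoding of a base-10 digit: the k-th frequency places
    # digit d at angle 2*pi*k*d/10 on the unit circle.
    angle = 2 * math.pi * k * d / 10
    return (math.cos(angle), math.sin(angle))

def add_mod10(a, b):
    # Composing the two rotations (angle-addition formulas) gives the
    # encoding of (a + b) mod 10 without ever computing a carry explicitly.
    ca, sa = digit_features(a)
    cb, sb = digit_features(b)
    c = ca * cb - sa * sb   # cos(theta_a + theta_b)
    s = sa * cb + ca * sb   # sin(theta_a + theta_b)
    angle = math.atan2(s, c) % (2 * math.pi)
    return round(angle * 10 / (2 * math.pi)) % 10
```

The appeal is that modular arithmetic, which is awkward for a linear readout over one-hot digits, becomes a simple rotation in this basis.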
Building AI for cyber defenders
AI models are now useful for cybersecurity tasks in practice, not just theory. As research and experience demonstrated the utility of frontier AI as a tool for cyber attackers, we invested in improving Claude’s ability to help defenders detect, analyze, and remediate vulnerabilities in code and deployed systems. This work allowed Claude Sonnet 4.5 to match or eclipse Opus 4.1, our frontier model released only two months prior, in discovering code vulnerabilities and other cyber skills. Adopting and experimenting with AI will be key for defenders to keep pace.
We believe we are now at an inflection point for AI’s impact on cybersecurity. — Read More
Scaling Engineering Teams: Lessons from Google, Facebook, and Netflix
After spending over a decade in engineering leadership roles at some of the world’s most chaotic innovation factories—Google, Facebook, and Netflix—I’ve learned one universal truth: scaling engineering teams is like raising teenagers. They grow fast, develop personalities of their own, and if you don’t set boundaries, suddenly they’re setting the house on fire at 3am.
The difference between teams that thrive at scale and those that collapse into Slack-thread anarchy typically comes down to three key factors:
— Structured goal-setting
— A ruthless focus on code quality
— Intentional culture building
Let me share some lessons I learned from scaling teams at Google, Facebook, and Netflix. — Read More
The Modern Data Stack’s Final Act: Consolidation Masquerading as Unification
The Modern Data Stack is ending, but not because technology failed. It’s ending because vendors realised they can sell the illusion of unification while locking you in.
The ecosystem that birthed the Modern Data Stack has matured and vendors have begun to see the endgame. The promise of modularity, flexibility, and best-of-breed choices is giving way to a new narrative: unification, at any cost. The latest whispers of a $5–10 billion Fivetran-dbt merger make this reality undeniable.
But this “seamlessness” is not unification in the architectural sense; it is unification in the narrative. Users are drawn into the story: one contract, one workflow, one vendor to call. But the vendor is locking you in before the market fully stabilises.
It looks like simplification, but it is actually enclosure. The illusion of a single platform conceals multiple stitched-together layers, each still bound by its own limitations, yet now difficult to escape. This is not just a vendor play; it is a structural shift, a reordering of the data ecosystem that forces practitioners to question what “unified” really means. — Read More
The Complete AI Engineering Roadmap for Beginners
Hey there, future AI engineer!
Feeling overwhelmed by all the AI buzz and wondering where to start? Don’t worry. This roadmap will take you from “What’s AI?” to building real AI systems, one step at a time. Think of this as your GPS for the AI journey ahead!
Here’s your friendly guide to breaking into the world of AI Engineering. — Read More
DeepSeek releases ‘sparse attention’ model that cuts API costs in half
Researchers at DeepSeek on Monday released a new experimental model called V3.2-exp, designed to have dramatically lower inference costs when used in long-context operations. DeepSeek announced the model with a post on Hugging Face, also posting a linked academic paper on GitHub.
The most important feature of the new model is called DeepSeek Sparse Attention, an intricate system described in detail in the diagram below. In essence, the system uses a module called a “lightning indexer” to prioritize specific excerpts from the context window. After that, a separate system called a “fine-grained token selection system” chooses specific tokens from within those excerpts to load into the module’s limited attention window. Taken together, they allow the Sparse Attention models to operate over long portions of context with comparatively small server loads. — Read More
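The two-stage selection described above can be sketched in miniature: a cheap coarse scorer picks promising blocks of the context, then a finer pass keeps only the best tokens inside them, and ordinary softmax attention runs over that small set. This is a toy stand-in, not DeepSeek's implementation; the real "lightning indexer" is a learned module, whereas here the block scorer is just a mean query-key dot product, and all parameter names are illustrative.

```python
import math

def sparse_attention(query, keys, values, block_size=4, top_blocks=2, top_tokens=4):
    """Toy two-stage sparse attention: coarse block selection, then fine token selection."""
    n = len(keys)
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Stage 1 (stand-in for the "lightning indexer"): score whole blocks cheaply
    # by the mean query-key dot product within each block, and keep the best ones.
    blocks = [range(i, min(i + block_size, n)) for i in range(0, n, block_size)]
    block_scores = [sum(dot(query, keys[j]) for j in blk) / len(blk) for blk in blocks]
    chosen = sorted(range(len(blocks)), key=lambda i: -block_scores[i])[:top_blocks]

    # Stage 2 (fine-grained token selection): keep the highest-scoring tokens
    # from the chosen blocks; only these enter the attention window.
    candidates = [j for i in chosen for j in blocks[i]]
    candidates.sort(key=lambda j: -dot(query, keys[j]))
    selected = candidates[:top_tokens]

    # Ordinary scaled softmax attention, but only over the selected tokens.
    scores = [dot(query, keys[j]) / math.sqrt(len(query)) for j in selected]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    return [sum(w * values[j][d] for w, j in zip(weights, selected)) for d in range(dim)]
```

The cost saving comes from the fact that the full quadratic attention is only computed over `top_tokens` entries, while the indexer's per-block pass stays cheap.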