Language models are increasingly capable, yet still fail at the seemingly simple task of multi-digit multiplication. In this work, we study why, by reverse-engineering a model that successfully learns multiplication via implicit chain-of-thought, and report three findings: (1) Evidence of long-range structure: Logit attributions and linear probes indicate that the model encodes the necessary long-range dependencies for multi-digit multiplication. (2) Mechanism: the model encodes long-range dependencies using attention to construct a directed acyclic graph to “cache” and “retrieve” pairwise partial products. (3) Geometry: the model implements partial products in attention heads by forming Minkowski sums between pairs of digits, and digits are represented using a Fourier basis, both of which are intuitive and efficient representations that a model trained by standard fine-tuning lacks. With these insights, we revisit the learning dynamics of standard fine-tuning and find that the model converges to a local optimum that lacks the required long-range dependencies. We further validate this understanding by introducing an auxiliary loss that predicts the “running sum” via a linear regression probe, which provides an inductive bias that enables the model to successfully learn multi-digit multiplication. In summary, by reverse-engineering the mechanisms of an implicit chain-of-thought model we uncover a pitfall in learning long-range dependencies in Transformers and provide an example of how the correct inductive bias can address this issue. — Read More
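To make the auxiliary-loss idea concrete, here is a minimal PyTorch sketch (not the authors' code; the `RunningSumProbe` class, `aux_weight`, and the `running_sums` targets are illustrative assumptions): a linear regression probe predicts the running sum of partial products from each position's hidden state, and its regression error is added to the usual cross-entropy loss.

```python
# Minimal sketch of the auxiliary "running sum" loss described above.
# Assumes a PyTorch decoder whose hidden states we can tap; all names
# here are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RunningSumProbe(nn.Module):
    """Linear regression probe: predicts the scalar running sum of
    partial products from the hidden state at each position."""
    def __init__(self, d_model: int):
        super().__init__()
        self.linear = nn.Linear(d_model, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, d_model) -> (batch, seq_len)
        return self.linear(hidden_states).squeeze(-1)

def loss_with_aux(logits, targets, hidden_states, running_sums,
                  probe: RunningSumProbe, aux_weight: float = 0.1):
    # Standard next-token cross-entropy on the digits of the answer.
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         targets.reshape(-1))
    # Auxiliary regression: forces hidden states to carry the running
    # sum, i.e. the long-range information the abstract says standard
    # fine-tuning fails to learn on its own.
    aux = F.mse_loss(probe(hidden_states), running_sums)
    return ce + aux_weight * aux
```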
Building AI for cyber defenders
AI models are now useful for cybersecurity tasks in practice, not just in theory. As research and experience demonstrated the utility of frontier AI as a tool for cyber attackers, we invested in improving Claude’s ability to help defenders detect, analyze, and remediate vulnerabilities in code and deployed systems. This work allowed Claude Sonnet 4.5 to match or eclipse Opus 4.1, our frontier model released only two months prior, in vulnerability discovery and other cyber skills. Adopting and experimenting with AI will be key for defenders to keep pace.
We believe we are now at an inflection point for AI’s impact on cybersecurity. — Read More
Scaling Engineering Teams: Lessons from Google, Facebook, and Netflix
After spending over a decade in engineering leadership roles at some of the world’s most chaotic innovation factories—Google, Facebook, and Netflix—I’ve learned one universal truth: scaling engineering teams is like raising teenagers. They grow fast, develop personalities of their own, and if you don’t set boundaries, suddenly they’re setting the house on fire at 3am.
The difference between teams that thrive at scale and those that collapse into Slack-thread anarchy typically comes down to three key factors:
— Structured goal-setting
— A ruthless focus on code quality
— Intentional culture building
Let me share some lessons I learned from scaling teams at Google, Facebook, and Netflix. — Read More
The Modern Data Stack’s Final Act: Consolidation Masquerading as Unification
The Modern Data Stack is ending, but not because technology failed. It’s ending because vendors realised they can sell the illusion of unification while locking you in.
The ecosystem that birthed the Modern Data Stack has matured and vendors have begun to see the endgame. The promise of modularity, flexibility, and best-of-breed choices is giving way to a new narrative: unification, at any cost. The latest whispers of a $5–10 billion Fivetran-dbt merger make this reality undeniable.
But this “seamlessness” is not unification in the architectural sense; it is unification in the narrative. Users are drawn into the story: one contract, one workflow, one vendor to call. Meanwhile, the vendor is locking you in before the market fully stabilises.
It looks like simplification, but it is actually enclosure. The illusion of a single platform conceals multiple stitched-together layers, each still bound by its own limitations, yet now difficult to escape. This is not just a vendor play; it is a structural shift, a reordering of the data ecosystem that forces practitioners to question what “unified” really means. — Read More
The Complete AI Engineering Roadmap for Beginners
Hey there, future AI engineer!
Feeling overwhelmed by all the AI buzz and wondering where to start? Don’t worry. This roadmap will take you from “What’s AI?” to building real AI systems, one step at a time. Think of this as your GPS for the AI journey ahead!
Here’s your friendly guide to breaking into the world of AI Engineering. — Read More
DeepSeek releases ‘sparse attention’ model that cuts API costs in half
Researchers at DeepSeek on Monday released a new experimental model called V3.2-exp, designed to have dramatically lower inference costs when used in long-context operations. DeepSeek announced the model with a post on Hugging Face, also posting a linked academic paper on GitHub.
The most important feature of the new model is DeepSeek Sparse Attention, an intricate system described in detail in the linked paper. In essence, the system uses a module called a “lightning indexer” to prioritize specific excerpts from the context window. After that, a separate “fine-grained token selection system” chooses specific tokens from within those excerpts to load into the model’s limited attention window. Taken together, they allow the Sparse Attention models to operate over long portions of context with comparatively small server loads. — Read More
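Here is a toy PyTorch sketch of that two-stage selection (purely illustrative; DeepSeek's lightning indexer and fine-grained token selection are learned modules, not the fixed top-k shown here): a cheap per-token relevance score nominates a subset of the context, and exact attention runs only over that subset.

```python
# Toy sketch of two-stage sparse attention: a cheap "indexer" score
# selects top-k tokens, then exact attention runs over only those.
# Purely illustrative; not DeepSeek's actual implementation.
import torch
import torch.nn.functional as F

def sparse_attention(q: torch.Tensor,       # (heads, 1, d) current query
                     k: torch.Tensor,       # (heads, seq, d) cached keys
                     v: torch.Tensor,       # (heads, seq, d) cached values
                     indexer_scores: torch.Tensor,  # (seq,) cheap scores
                     top_k: int = 256) -> torch.Tensor:
    top_k = min(top_k, k.size(1))
    # Stage 1: the lightweight indexer nominates candidate tokens.
    idx = indexer_scores.topk(top_k).indices          # (top_k,)
    k_sel, v_sel = k[:, idx], v[:, idx]               # (heads, top_k, d)
    # Stage 2: exact scaled-dot-product attention over the subset only,
    # so per-step cost grows with top_k, not the full context length.
    scores = q @ k_sel.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v_sel          # (heads, 1, d)
```

The design point is the cost structure: the expensive softmax attention scales with top_k rather than the full context length, which is where the claimed inference savings would come from.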
Towards an AI-Augmented Textbook
Textbooks are a cornerstone of education, but they have a fundamental limitation: they are a one-size-fits-all medium. Any new material or alternative representation requires arduous human effort, so textbooks cannot be adapted in a scalable manner. We present an approach for transforming and augmenting textbooks using generative AI, adding layers of multiple representations and personalization while maintaining content integrity and quality. We refer to the system built with this approach as Learn Your Way. We report pedagogical evaluations of the different transformations and augmentations, and present the results of a randomized controlled trial, highlighting the advantages of learning with Learn Your Way over regular textbook usage. — Read More
Your free one-stop guide to AI in 2025
The only guide you’ll ever need to master AI and LLMs.
I’ve been wanting to make this for a while now, but the project’s been pushed back repeatedly due to research deadlines.
But here you are: the one-stop guide to modern AI research. — Read More
Agile is Out, Architecture is Back
Software development has always been defined by its extremes. In the early days, we planned everything. Specs were sacred. Architecture diagrams came before a single line of code. And every change felt like steering a cargo ship — slow, bureaucratic, and heavily documented.
Then came Agile, and the pendulum swung hard in the other direction. We embraced speed, iteration, and imperfection. “Working software over comprehensive documentation” became the battle cry of a new generation. Shipping fast was more important than getting it right the first time. And to be fair, that shift unlocked enormous productivity. It changed the culture of software for good.
Now, we’re entering a new era — one driven by AI tools that can generate code from a sentence. Tools like GitHub Copilot and Claude Code are reshaping what it means to be a developer. It’s not just about writing code anymore — it’s about designing the environment in which code gets written.
And that pendulum? It’s swinging back again. — Read More
YouTube Thinks AI Is Its Next Big Bang
Google figured out early on that video would be a great addition to its search business, so in 2005 it launched Google Video. Focused on making deals with the entertainment industry for second-rate content, and overly cautious about what users could upload, it flopped. Meanwhile, a tiny startup run by a handful of employees working above a San Mateo, California, pizzeria was exploding, simply by letting anyone upload their goofy videos and not worrying too much about who held copyrights to the clips. In 2006, Google snapped up that year-old company, figuring it would sort out the IP stuff later. (It did.) Though the $1.65 billion purchase price for YouTube was about a billion dollars more than its valuation, it was one of the greatest bargains ever. YouTube is now arguably the most successful video property in the world. It’s an industry leader in music and podcasting, and more than half of its viewing time is now on living room screens. It has paid out over $100 billion to creators since 2021. One estimate from MoffettNathanson analysts cited by Variety is that if it were a separate company, it might be worth $550 billion.
Now the service is taking what might be its biggest leap yet, embracing a new paradigm that could change its essence. I’m talking, of course, about AI. Since YouTube is still a wholly owned subsidiary of AI-obsessed Google, it’s not surprising that its anniversary product announcements this week touted features that will let creators use AI to enhance or produce videos. After all, Google DeepMind’s Veo 3 technology was YouTube’s for the taking. Ready or not, the video camera ultimately will be replaced by the prompt. This means a rethinking of YouTube’s superpower: authenticity. — Read More