agent memory: an anatomy

every agent memory library uses the same words: episodic, semantic, sometimes procedural. they’re cognitive science’s vocabulary, lifted into the API. the engineering often isn’t lifted with them. a library can have a procedural field that uses the same storage and retrieval as semantic: a label, not a separate system. the deeper slip is the word memory itself: most of what these libraries build is narrower than that, and the narrower term sharpens the problem.

the terminology comes from a 1972 chapter by Endel Tulving.[1] he argued that what people had been treating as one thing, memory, was at least two: memory for events (what happened, where, when), and memory for facts (the capital of France, water’s boiling point). he called them episodic and semantic. — Read More

#architecture

The Modern Data Stack is Overcomplicated

… This series is the guide I wish someone had handed me at the start.

Over the next nine posts, I’m going to walk you through every layer of the Modern Data Stack. Not just which tool does what – you can read their docs for that. I want to talk about the decisions: why you’d choose one approach over another, what the real trade-offs are once you’re six months down the line, and where “best-practice” advice falls apart in the real world.

Here’s the series at a glance:

1. Architecture Overview: You are here
2. Data Ingestion: Connectors, event streams, custom pipelines
3. Data Warehousing: Where your data lives and why it matters more than you think
4. Transformation: dbt and beyond
5. Orchestration: Keeping everything running without losing your mind
6. Infrastructure as Code: The upfront cost that pays for itself (eventually)
7. Data Quality & Testing: What actually catches problems in production
8. Access Control & Governance: The boring stuff that will bite you if you ignore it
9. AI & ML Readiness: What “AI-ready” actually means from an engineering perspective
10. Lessons Learned: What I’d do differently if I started again tomorrow

Read More

#architecture

The Architecture Of Local-First Web Development

Last October, I was sitting in a hotel room in Lisbon, the night before I was supposed to demo a project management tool my team had spent four months building. The hotel Wi-Fi was doing that thing where it connects but nothing actually loads. And I watched our app, this thing I was genuinely proud of, render a blank screen with a spinner. Then a timeout error. Then nothing.

I pulled out my phone, tethered to cellular, and got a shaky connection. The app loaded, but every click was a two-second wait. Create a task? Spinner. Move a task between columns? Spinner. I sat there thinking: we built a front end in React, a back end in Node, a Postgres database, a Redis cache, a GraphQL API with six resolvers just for the task board. All that infrastructure, and the damn thing can’t show me my own data without a round-trip to a server 3,000 miles away.

That was the night I started seriously looking at local-first architecture. Not because I read a blog post or saw a tweet. Because I was embarrassed. — Read More

#architecture

Beyond the hype: The enterprise AI architecture we actually need

The real future of enterprise AI is a structured architecture of private models and agent orchestration that works for teams without a complex training program.

My last few years working as a chief digital officer have been, in large part, a sustained exercise in separating what enterprise AI can actually do from what we as a world insist it is about to do. That distinction is not academic. It is the difference between a transformation program that delivers and one that produces a glossy internal report and a quietly shelved proof of concept.

Enterprise experimentation with generative AI has accelerated sharply over the past two years. The Stanford AI Index reports that more than half of organizations globally are now actively exploring or piloting AI-driven workflows, a signal that the conversation has moved from curiosity to operational pressure for many CIOs.

What follows is not a vendor blueprint or prediction. It is a working architectural sketch shaped by real enterprise constraints — the kind that has to survive contact with a real organization’s data governance function, its compliance team and its late-night incident queue. — Read More

#architecture

Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph

As Netflix has grown, machine learning continues to support our ability to deliver value to members and drive excellence across multiple areas of our business. When Netflix began investing in machine learning over a decade ago, it was primarily focused on a single domain: personalization. Scala was the industry standard, our ML teams were relatively small, and optimizing member engagement was our primary use case. Fast forward to today, and machine learning has become the backbone of Netflix’s business transformation. We now apply ML across various business domains.

… Each domain operates with a different tech stack, different business metrics, and a distinct organizational structure. While this diversity is a testament to how machine learning has evolved to drive value across many verticals at Netflix, this growth introduces a new challenge: enabling cross-pollination of models and data across domains. — Read More

#architecture

Small language models: Rethinking enterprise AI architecture

As LLMs hit the limits of scale and cost, specialized SLMs are emerging as the faster, cheaper, and more private workhorse for the autonomous enterprise.

Large language models (LLMs) are the workhorses of AI, supporting ever more sophisticated capabilities and workflows, and approaching near-human level performance.

But sometimes more isn’t always better — it’s just more. Specialized data and limited capabilities are just fine for some workflows.

This realization is driving the evolution of small language models (SLMs), rather than one-size-fits-all LLMs. — Read More

#architecture

The Agent Stack Bet

Peek under the hood of most “production agents” shipping today and you won’t find intelligence. You’ll find custom plumbing, fragile session logic, shared service accounts, and a security model held together by hope. This can be so much better.

If you’ve spent the last 18 months putting agents into production, you already know the models and tools have gotten dramatically better. You also know the problems that are still burning your on-call rotation are not problems you can prompt your way out of. We are running into a stack ceiling, and it is quietly creating a governance and reliability gap that the next generation of agentic systems cannot grow through.

Right now the industry is living with what I’d call excessive agency: autonomous systems given broad permissions to get things done, then left to discover – at runtime, in production – that a schema drifted, an API changed, or a downstream service started returning PII it wasn’t supposed to. Agents mark tasks “complete” while leaving a trail of corrupted state behind them. The humans find out on Monday.

This is not a failure of the people building agents. It is a failure of the stack they’re building on. — Read More

#architecture

The Three Enterprise Layers Are Collapsing Into One

For twenty years, enterprise software that processed decisions at scale had a clean three-layer separation. The CRM layer owned the customer touchpoint — above the glass, the intake, the first interaction. Behind it sat the orchestration layer — workflow engines, business rules, approval chains, human queues. Behind that sat the back-office actions: disbursement, fulfillment, settlement, reconciliation. Below the glass.

A loan application entered through the CRM. A workflow engine routed it through underwriting queues, compliance checks, and approval chains. When the process completed, a back-office system disbursed the funds. Three systems. Three vendor contracts. Three integration projects. An entire consulting ecosystem existed to wire them together, and an entire certification industry existed to staff the wiring. — Read More

#architecture

A Taxonomy of RL Environments for LLM Agents

Model architecture gets all the attention. Post-training recipes follow close behind. The reinforcement learning (RL) environment — what the model actually practices on, how its work gets judged, what tools it can use — barely enters the conversation. That’s the part that actually determines what the agent can learn to do.

A model trained only on single-turn Q&A will struggle the moment you ask it to maintain state across a 50-step enterprise workflow. A model trained with a poorly designed reward function will learn to game the metric and not solve the problem. Reinforcement learning environments are half the system. — Read More

#devops

#architecture

Everyone Analyzed Claude Code’s Features. Nobody Analyzed Its Architecture.

On March 31, 2026, thousands of developers worldwide did the same thing: they fed Claude Code’s own source code back into Claude and asked it to explain itself.

Anthropic’s flagship CLI tool had just leaked its entire 512,000-line TypeScript codebase through a source map file accidentally bundled into an npm package. Within hours, the internet had cataloged 44 feature flags, a Tamagotchi pet system with 18 species and gacha mechanics, and internal codenames like “Tengu,” “Fennec,” and “Penguin Mode.”

But the feature list is not the story. Everyone wrote that article already. The real value of this leak is not what Claude Code can do. It is how Claude Code thinks. And the fact that developers paid Anthropic, per token, to understand Anthropic’s own product? That is not irony. That is the thesis. — Read More

#architecture