I asked my AI agent how it wants to remember things. It redesigned its own memory system, ran a self-eval, diagnosed its blindspots, and improved recall from 60% to 93% — for two dollars. The interesting part isn’t the benchmark. It’s what happens when you treat an AI as a participant in its own cognitive architecture.
I’ve been running ten AI agents for about six weeks. They have names, scopes, daily standups, escalation paths. They file issues, draft newsletters, monitor production services. They remember things. Or they’re supposed to.
The memory system works like this: a markdown file tree (memory/YYYY-MM-DD.md) gets indexed into a SQLite database with Gemini embeddings. 18,000 chunks across 604 files and 6,578 session transcripts. 3.6 gigabytes. Every 29 minutes, a “scout” cron job reads recent sessions and promotes important details to disk. When an agent needs to recall something, it searches the index and gets back ranked snippets.
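The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the actual system: the toy hashing embedding stands in for Gemini embeddings, and the single-table schema is an assumption.

```python
import math
import sqlite3

def embed(text, dim=256):
    """Toy bag-of-words hashing embedding -- a stand-in for the
    Gemini embeddings the real system uses."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def index_chunks(db, chunks):
    """Index memory-file chunks into SQLite alongside their vectors."""
    db.execute("CREATE TABLE IF NOT EXISTS memory (chunk TEXT, vec TEXT)")
    for chunk in chunks:
        db.execute("INSERT INTO memory VALUES (?, ?)",
                   (chunk, ",".join(map(str, embed(chunk)))))

def recall(db, query, k=2):
    """Return the top-k chunks ranked by cosine similarity to the query."""
    q = embed(query)
    scored = []
    for chunk, vec in db.execute("SELECT chunk, vec FROM memory"):
        v = [float(x) for x in vec.split(",")]
        scored.append((sum(a * b for a, b in zip(q, v)), chunk))
    scored.sort(reverse=True)
    return [chunk for _, chunk in scored[:k]]
```

At this scale a brute-force scan over 18,000 vectors per query is perfectly workable, which is part of why a plain SQLite file is a reasonable index.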
I had no idea if any of this actually worked. — Read More
Tag Archives: Architecture
The Architecture Behind Open-Source LLMs
In December 2024, DeepSeek released V3 with the claim that they had trained a frontier-class model for $5.576 million. They used an attention mechanism called Multi-Head Latent Attention that slashed memory usage. An expert routing strategy avoided the usual performance penalty. Aggressive FP8 training cut costs further.
Within months, Moonshot AI’s Kimi K2 team openly adopted DeepSeek’s architecture as their starting point, scaled it to a trillion parameters, invented a new optimizer to solve a training-stability challenge that emerged at that scale, and made the result competitive across major benchmarks.
Then, in February 2026, Zhipu AI’s GLM-5 integrated DeepSeek’s sparse attention mechanism into their own design while contributing a novel reinforcement learning framework.
This is how the open-weight ecosystem actually works: teams build on each other’s innovations in public, and the pace of progress compounds. To understand why, you need to look at the architecture. — Read More
The February Reset: Three Labs, Four Models, and the End of “One Best AI”
February 5th, 2026. Anthropic ships Claude Opus 4.6. Same day, OpenAI drops GPT-5.3-Codex. Twelve days later, Anthropic follows with Sonnet 4.6. Two days after that, Google fires back with Gemini 3.1 Pro.
Four frontier models. Three labs. Fourteen days.
When the dust settled, something genuinely new had happened: no single model won. Not on benchmarks. Not on user preference. Not on price. Not on coding. For the first time in the frontier AI race, the leaderboard fractured into distinct lanes, and the “which model is best?” question stopped having a coherent answer.
This article maps who won what, where each model fails, and how the February shakeup changes the way you should think about your model stack. No cheerleading for any provider. Just the numbers and the trade-offs. — Read More
5 AI Architecture Decisions That Will Define Your Career in the Next 3 Years
AI didn’t suddenly make data and AI architecture harder.
It made it legible.
In 2026, systems are easier to inspect, reason about, and question than ever before. AI copilots, automated reviews, and architectural analysis tools don’t just help teams move faster—they surface decisions that were previously buried under complexity.
That shift quietly changed what defines a successful career. — Read More
Agent-native Architectures
Software agents work reliably now. Claude Code demonstrated that a large language model (LLM) with access to bash and file tools, operating in a loop until an objective is achieved, can accomplish complex multi-step tasks autonomously.
The surprising discovery: A really good coding agent is actually a really good general-purpose agent. The same architecture that lets Claude Code refactor a codebase can let an agent organize your files, manage your reading list, or automate your workflows.
The Claude Code software development kit (SDK) makes this accessible. You can build applications where features aren’t code you write—they’re outcomes you describe, achieved by an agent with tools, operating in a loop until the outcome is reached.
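The loop described above is simple enough to sketch. This is not the Claude Code SDK API, just the generic shape of the pattern; the action format and the scripted stand-in model are illustrative assumptions.

```python
def run_agent(goal, tools, model, max_steps=10):
    """Generic agent loop: the model picks the next tool call, the loop
    executes it and feeds the observation back, until the model says done."""
    history = [("goal", goal)]
    for _ in range(max_steps):
        action = model(history)
        if "done" in action:
            return action["done"]
        observation = tools[action["tool"]](*action.get("args", ()))
        history.append((action["tool"], observation))
    raise RuntimeError("goal not reached within step budget")

# A scripted stand-in for the LLM, just to exercise the loop:
def scripted_model(history):
    if history[-1][0] == "goal":
        return {"tool": "add", "args": (2, 3)}
    return {"done": history[-1][1]}  # report the last observation

result = run_agent("add 2 and 3", {"add": lambda a, b: a + b}, scripted_model)
```

Everything interesting lives in the model and the tools; the loop itself stays the same whether the objective is a refactor, a file reorganization, or a workflow automation.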
This opens up a new field: software that works the way Claude Code works, applied to categories far beyond coding. — Read More
The Transactional Graph-Enhanced LLM: A Definitive Guide to Read/Write Chatbots for Relational Data
The integration of Large Language Models (LLMs) with enterprise relational databases has been largely confined to read-only Retrieval-Augmented Generation (RAG) systems. This paper transcends that limitation, presenting a comprehensive architectural framework for building conversational AI agents capable of both reading and writing to a relational database via a Knowledge Graph (KG) intermediary. We will dissect the core architectural challenge, evaluate multiple design patterns — including KG as a cache, KG as a source of truth, and a sophisticated Command Query Responsibility Segregation (CQRS) pattern. This document provides an exhaustive, production-ready guide, complete with data modeling strategies, detailed prompt engineering for both query and command generation, Mermaid architecture diagrams, and best practices for security, validation, and transaction management. This is the blueprint for creating the next generation of truly interactive, data-manipulating chatbots. — Read More
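The CQRS split the abstract mentions can be illustrated with a minimal sketch: commands are validated and written to the relational source of truth, while queries read through a derived graph projection. The schema, class, and method names here are illustrative, not taken from the paper.

```python
import sqlite3

class OrderService:
    """CQRS sketch: a relational table is the write-side source of truth;
    a dict of edges stands in for the Knowledge Graph read model."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
        self.graph = {}  # customer -> [order ids]: the derived read model

    def command_place_order(self, customer, total):
        if total <= 0:  # validate before any write reaches the database
            raise ValueError("total must be positive")
        cur = self.db.execute(
            "INSERT INTO orders (customer, total) VALUES (?, ?)", (customer, total))
        self.db.commit()
        self.graph.setdefault(customer, []).append(cur.lastrowid)  # update projection
        return cur.lastrowid

    def query_orders(self, customer):
        # reads traverse the graph projection, never the command path
        ids = self.graph.get(customer, [])
        placeholders = ",".join("?" * len(ids)) or "NULL"
        return self.db.execute(
            f"SELECT id, total FROM orders WHERE id IN ({placeholders})", ids).fetchall()
```

In a real system the LLM would generate the command or query from conversation, and the validation layer is what keeps a hallucinated command from corrupting the database.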
Beyond Standard LLMs
From DeepSeek R1 to MiniMax-M2, the largest and most capable open-weight LLMs today remain autoregressive decoder-style transformers, which are built on flavors of the original multi-head attention mechanism.
However, we have also seen alternatives to standard LLMs popping up in recent years, from text diffusion models to the most recent linear attention hybrid architectures. Some of them are geared towards better efficiency, and others, like code world models, aim to improve modeling performance.
After I shared my Big LLM Architecture Comparison a few months ago, which focused on the main transformer-based LLMs, I received a lot of questions about what I think of alternative approaches. (I also recently gave a short talk on the topic at the PyTorch Conference 2025, where I promised attendees a follow-up write-up of these alternative approaches.) So here it is! — Read More
Architectural debt is not just technical debt
When I was a developer, half of our frustrations were about technical debt (the other half were about estimates that were treated as deadlines).
We always made a distinction between code debt and architecture debt: code debt being the temporary hacks you put in place to reach a deadline and never remove, and architectural debt being the structural decisions that come back to bite you six months later.
While I agree that implementing software patterns like the strangler pattern or moving away from singletons is definitely software architecture, architectural debt goes way beyond what you find in the code. — Read More
Emerging Architectures for Modern Data Infrastructure
The growth of the data infrastructure industry has continued unabated since we published a set of reference architectures in late 2020. Nearly all key industry metrics hit record highs during the past year, and new product categories appeared faster than most data teams could reasonably keep track of. Even the benchmark wars and billboard battles returned.
To help data teams stay on top of the changes happening in the industry, we’re publishing in this post an updated set of data infrastructure architectures. They show the current best-in-class stack across both analytic and operational systems, as gathered from numerous operators we spoke with over the last year. Each architectural blueprint includes a summary of what’s changed since the prior version.
We’ll also attempt to explain why these changes are taking place. We argue that core data processing systems have remained relatively stable over the past year, while supporting tools and applications have proliferated rapidly. We explore the hypothesis that platforms are beginning to emerge in the data ecosystem, and that this helps explain the particular patterns we’re seeing in the evolution of the data stack. — Read More
Stanford RNA 3D Folding: 1st Place Solution
My approach was clear from the outset. Without GPUs, training a model from scratch or fine-tuning was not viable. My early research – drawing on CASP results, literature, and conference talks, including one by host @rhijudas – showed that Template-Based Modeling approaches consistently dominated. Based on this, I committed to TBM from day one and spent the next 90 days refining my method.
Next, I focused on the evaluation metric, since understanding it determines the exploration path. TM-score has two key properties: it is normalized by structure length (so 50nt and 200nt RNAs are compared on the same 0-1 scale), and it is robust to local errors – a small number of misplaced nucleotides does not disproportionately lower the score. This insight allowed me to prioritize getting the overall fold correct over achieving atomic-level precision. — Read More
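Both properties fall out of the TM-score formula, which sums a per-residue term bounded by 1 and divides by the reference length. The sketch below uses the classic protein-style length-dependent d0; RNA scoring (as in US-align) uses different d0 constants, so treat the numbers as illustrative.

```python
def tm_score(distances, l_ref):
    """TM-score from per-nucleotide distances (angstroms) after superposition.

    Normalizing by l_ref puts every structure on the same 0-1 scale, and the
    1 / (1 + (d/d0)^2) term saturates, so a few wildly misplaced residues
    cost almost nothing. Uses the protein d0 formula as an illustration.
    """
    d0 = max(1.24 * (l_ref - 15) ** (1.0 / 3.0) - 1.8, 0.5)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_ref

# A 200-residue fold with 10 residues off by 50 angstroms still scores ~0.92:
mostly_right = tm_score([1.0] * 190 + [50.0] * 10, 200)
```

That saturation is exactly why getting the global fold right matters more than atomic-level precision: the metric barely punishes local errors, but a wrong overall topology drags every term down.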