The Architecture Behind Open-Source LLMs

In December 2024, DeepSeek released V3 with the claim that they had trained a frontier-class model for $5.576 million. They used an attention mechanism called Multi-Head Latent Attention that slashed KV-cache memory usage during inference. An auxiliary-loss-free expert routing strategy avoided the performance penalty that load balancing usually imposes. Aggressive FP8 training cut costs further.
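The memory win behind Multi-Head Latent Attention can be sketched in a few lines: instead of caching full per-head keys and values for every token, the model caches one small shared latent per token and reconstructs K and V from it at attention time. This is a minimal illustration with made-up dimensions and random weights, not DeepSeek's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_heads, d_head, d_latent = 1024, 8, 128, 64
seq_len = 16

# Projection weights (random stand-ins; a real model learns these).
W_dkv = rng.standard_normal((d_model, d_latent)) * 0.02           # down-project to latent
W_uk  = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # up-project to keys
W_uv  = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # up-project to values

x = rng.standard_normal((seq_len, d_model))

# Standard multi-head attention caches full keys AND values per token:
kv_cache_floats = seq_len * 2 * n_heads * d_head   # 32768 floats

# MLA caches only one shared latent vector per token:
c = x @ W_dkv                                      # (seq_len, d_latent)
latent_cache_floats = c.size                       # 1024 floats

# Keys and values are reconstructed from the latent on the fly:
k = (c @ W_uk).reshape(seq_len, n_heads, d_head)
v = (c @ W_uv).reshape(seq_len, n_heads, d_head)

print(kv_cache_floats / latent_cache_floats)       # → 32.0 (cache is 32x smaller)
```

The savings grow with head count, since all heads share one latent; the trade is a little extra compute to re-expand K and V each step.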

Within months, Moonshot AI’s Kimi K2 team openly adopted DeepSeek’s architecture as their starting point, scaled it to a trillion parameters, invented a new optimizer to solve a training stability challenge that emerged at that scale, and produced a model competitive across major benchmarks.

Then, in February 2026, Zhipu AI’s GLM-5 integrated DeepSeek’s sparse attention mechanism into their own design while contributing a novel reinforcement learning framework.

This is how the open-weight ecosystem actually works: teams build on each other’s innovations in public, and the pace of progress compounds. To understand why, you need to look at the architecture. — Read More

#architecture

The February Reset: Three Labs, Four Models, and the End of “One Best AI”

February 5th, 2026. Anthropic ships Claude Opus 4.6. Same day, OpenAI drops GPT-5.3-Codex. Twelve days later, Anthropic follows with Sonnet 4.6. Two days after that, Google fires back with Gemini 3.1 Pro.

Four frontier models. Three labs. Fourteen days.

When the dust settled, something genuinely new had happened: no single model won. Not on benchmarks. Not on user preference. Not on price. Not on coding. For the first time in the frontier AI race, the leaderboard fractured into distinct lanes, and the “which model is best?” question stopped having a coherent answer.

This article maps who won what, where each model fails, and how the February shakeup changes the way you should think about your model stack. No cheerleading for any provider. Just the numbers and the trade-offs. — Read More

#architecture

MCP is dead. Long live the CLI

I’m going to make a bold claim: MCP is already dying. We may not fully realize it yet, but the signs are there. OpenClaw doesn’t support it. Pi doesn’t support it. And for good reason.

When Anthropic announced the Model Context Protocol, the industry collectively lost its mind. Every company scrambled to ship MCP servers as proof they were “AI first.” Massive resources poured into new endpoints, new wire formats, new authorization schemes, all so LLMs could talk to services they could already talk to.

I’ll admit, I never fully understood the need for it. You know what LLMs are really good at? Figuring things out on their own. Give them a CLI and some docs and they’re off to the races.

I tried to avoid writing this for a long time, but I’m convinced MCP provides no real-world benefit, and that we’d be better off without it. Let me explain. — Read More

#devops

The third era of AI software development

When we started building Cursor a few years ago, most code was written one keystroke at a time. Tab autocomplete changed that and opened the first era of AI-assisted coding.

Then agents arrived, and developers shifted to directing agents through synchronous prompt-and-response loops. That was the second era. Now a third era is arriving. It is defined by agents that can tackle larger tasks independently, over longer timescales, with less human direction.

As a result, Cursor is no longer primarily about writing code. It is about helping developers build the factory that creates their software. This factory is made up of fleets of agents that developers interact with as teammates: providing initial direction, equipping them with the tools to work independently, and reviewing their work.

Many of us at Cursor are already working this way. More than one-third of the PRs we merge are now created by agents that run on their own computers in the cloud. A year from now, we think the vast majority of development work will be done by these kinds of agents. — Read More

#devops

5 AI Architecture Decisions That Will Define Your Career in the Next 3 Years

AI didn’t suddenly make data and AI architecture harder.

It made it legible.

In 2026, systems are easier to inspect, reason about, and question than ever before. AI copilots, automated reviews, and architectural analysis tools don’t just help teams move faster—they surface decisions that were previously buried under complexity.

That shift quietly changed what defines a successful career. — Read More

#architecture

How to Think About the Anthropic-Pentagon Dispute

The Pentagon wants AI that can fight wars — without limits. One of the United States’ leading AI companies says there are lines it won’t cross. And this week, that standoff turned into an all-out confrontation.

To discuss the implications of the dispute between Anthropic and the Pentagon, including the determination that the company represents a supply chain risk, I spoke to two experts:

— Kat Duffy, senior fellow for digital and cyberspace policy at the Council on Foreign Relations, and
— Amos Toh, senior counsel in the Liberty and National Security Program at the Brennan Center for Justice.

— Read More

#podcasts

What The AI Bubble Talk Misses: The Declining Marginal Cost of Additional Use Cases

The AI bubble is often compared to the early days of the railroad or telecom industries to draw parallels between capital expenditures and eventual revenues from those investments. That comparison is misleading, because in railroads and telecom, the expense was incurred to connect things. Every new rail route required steel, labor, land rights, and years of construction. Telecom required trenching fiber across continents. Cost scaled linearly with physical deployment — every new mile was expensive.

In AI, it’s the opposite. Developing our AI engines is expensive. Connecting things to our AI engines is cheap, and getting cheaper. A new data pipeline. A prompt template. An API integration. An MCP Server. You’re not digging trenches — you’re copying software. This means the capex-to-revenue curve should look fundamentally different from railroads or telecom. Those industries needed decades of physical buildout before revenue caught up. AI needs months. — Read More
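The shape of that argument is easy to see with a toy cost model. The numbers below are purely illustrative (hypothetical figures, not real industry data): rail-style buildout pays a fixed cost per new connection, while AI-style buildout pays one large upfront cost and a small marginal cost per new use case, so average cost per use case falls as adoption grows:

```python
# Hypothetical, illustrative figures only — not real capex numbers.
RAIL_COST_PER_ROUTE = 50.0   # every new route needs physical buildout
AI_ENGINE_COST      = 500.0  # one large upfront training cost
AI_COST_PER_USECASE = 1.0    # integrations are mostly copied software

def cumulative_cost_rail(n_routes: int) -> float:
    """Rail/telecom: cost scales linearly with deployment."""
    return n_routes * RAIL_COST_PER_ROUTE

def cumulative_cost_ai(n_usecases: int) -> float:
    """AI: big fixed cost, then near-zero marginal cost per use case."""
    return AI_ENGINE_COST + n_usecases * AI_COST_PER_USECASE

# Average cost per connection/use case as adoption grows:
for n in (1, 10, 100, 1000):
    print(n, cumulative_cost_rail(n) / n, cumulative_cost_ai(n) / n)
```

Rail's average cost per route stays flat at 50 forever; AI's average cost per use case starts far higher but drops toward the marginal cost of 1 as the fixed spend is amortized, which is the different capex-to-revenue curve the piece describes.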

#strategy