The term harness has emerged as shorthand for everything in an AI agent except the model itself – Agent = Model + Harness. That is a very wide definition, and therefore worth narrowing down for common categories of agents. I want to take the liberty here of defining its meaning in the bounded context of using a coding agent. In coding agents, part of the harness is already built in (e.g. via the system prompt, or the chosen code retrieval mechanism, or even a sophisticated orchestration system). But coding agents also provide us, their users, with many features to build an outer harness specifically for our use case and system.
A well-built outer harness serves two goals: it increases the probability that the agent gets it right in the first place, and it provides a feedback loop that self-corrects as many issues as possible before they even reach human eyes. Ultimately it should reduce the review toil and increase the system quality, all with the added benefit of fewer wasted tokens along the way. — Read More
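To make the feedback-loop idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular coding agent implements its harness: the names `CheckResult`, `run_with_feedback`, and the `agent` and `checks` callables are all hypothetical. The loop asks the agent for a candidate, runs automated checks on it, and feeds any failures back into the next attempt, so that only passing candidates (or an explicit give-up) reach a human reviewer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CheckResult:
    """Outcome of one automated check (hypothetical type for this sketch)."""
    name: str
    passed: bool
    message: str = ""


# Assumed shapes: an agent maps (task, feedback) to a candidate change,
# and a check maps a candidate to a CheckResult (e.g. a linter or test run).
Agent = Callable[[str, List[str]], str]
Check = Callable[[str], CheckResult]


def run_with_feedback(
    agent: Agent,
    task: str,
    checks: List[Check],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Retry the agent until all checks pass or max_rounds is exhausted.

    Returns (candidate, rounds_used); candidate is None if no attempt
    passed every check, which is the signal to escalate to a human.
    """
    feedback: List[str] = []
    for round_number in range(1, max_rounds + 1):
        candidate = agent(task, feedback)
        failures = [r for r in (check(candidate) for check in checks) if not r.passed]
        if not failures:
            return candidate, round_number
        # Only the failure messages are passed on to the next attempt.
        feedback = [f"{r.name}: {r.message}" for r in failures]
    return None, max_rounds
```

In practice the checks would wrap things like a type checker, a linter, or a test suite, and `max_rounds` caps how many tokens the loop may spend before handing the problem to a person.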
The Revenge of the Data Scientist
Is the heyday of the data scientist over? The Harvard Business Review once called it “The Sexiest Job of the 21st Century.”1 In tech, data scientist roles were often among the best paid.2 The job also demanded an unusual mix of skills.
In addition to creating a high barrier to entry, these skills enabled data scientists to build predictive models, measure causality and find patterns in data. Of these, predictive modeling paid best. Companies later peeled that work off into a new title: Machine Learning Engineer (“MLE”).
For years, shipping AI meant keeping data scientists and MLEs on the critical path. With LLMs, this stopped being the default. Foundation-model APIs now allow teams to integrate AI independently.
Getting cut out of the loop rattled data scientists and MLEs I know. If the company no longer needs you to ship AI, it is fair to wonder whether the job still has the same upside. The harsher story people tell themselves: unless you are pretraining at a foundation-model lab, you are not where the action is.
I read it the other way. Training models was never most of the job. — Read More
Everyone Analyzed Claude Code’s Features. Nobody Analyzed Its Architecture.
On March 31, 2026, thousands of developers worldwide did the same thing: they fed Claude Code’s own source code back into Claude and asked it to explain itself.
Anthropic’s flagship CLI tool had just leaked its entire 512,000-line TypeScript codebase through a source map file accidentally bundled into an npm package. Within hours, the internet had cataloged 44 feature flags, a Tamagotchi pet system with 18 species and gacha mechanics, and internal codenames like “Tengu,” “Fennec,” and “Penguin Mode.”
But the feature list is not the story. Everyone wrote that article already. The real value of this leak is not what Claude Code can do. It is how Claude Code thinks. And the fact that developers paid Anthropic, per token, to understand Anthropic’s own product? That is not irony. That is the thesis. — Read More
AI Coding Agents, Deconstructed
In this article, I want to make the case for a structured way to think about Large Language Model (LLM)-based agentic systems (mostly for coding, but also for knowledge work in general) that fixes some of the greatest pains I (and I’m sure most of you) have been facing when trying to scale AI-assisted workflows to professional levels.
It’s a system that puts the right constraints in the right places and leaves just enough space for creative exploration (or however you want to call what LLMs do when they hallucinate in your favor). It’s also a system that makes it clear you are in charge. — Read More
Bad Analogies
I saw an interaction on Twitter the other day that I’m not going to share here. These specific people don’t deserve to be singled out, because almost everyone is guilty of something similar.
Basically, one person said, “It’s amazing that OpenAI was able to raise $122 billion when they don’t have a single business that works,” and the other person replied, “Yeah they do they have a bunch of different business lines doing billions of dollars in revenue,” and the OP responded, “Yes but none of them is profitable, they’re losing a lot of money,” and the replier replied, “You could have said the same thing about Amazon!”
Amazon’s success has done a great deal of harm to a great number of companies.
Obviously, it’s done more good. AWS is a miracle. But go with me.
Jeff Bezos is a generational entrepreneur who came from a hedge fund and made a very calculated decision to lose money in the short term if it meant making more of it in the long term. — Read More
What Is Claw Code? The Claude Code Rewrite Explained
… On March 31, 2026, security researcher Chaofan Shou noticed something odd in the npm registry. Version 2.1.88 of @anthropic-ai/claude-code had shipped with a 59.8 MB JavaScript source map file attached.
… Within hours of the exposure, mirrored repositories appeared on GitHub. Anthropic began issuing DMCA takedowns. The internet did not wait.
Sigrid Jin (@instructkr) — a Korean developer who had attended Claude Code’s first birthday party in San Francisco in February — published what became claw-code. The repo reached 50,000 stars in two hours, one of the fastest accumulation rates GitHub has recorded.
The important distinction: claw-code is not an archive of the leaked TypeScript. It’s a clean-room Python rewrite, built from scratch by reading the original harness structure and reimplementing the architectural patterns without copying Anthropic’s proprietary source. Jin built it overnight using oh-my-codex, an orchestration layer on top of OpenAI’s Codex, with parallel code review and persistent execution loops.
… The real value here — for builders — isn’t the drama. It’s what the exposed architecture tells us about how production-grade agentic coding systems are actually structured. — Read More
Clouded Judgement 3.20.26 – Digital Twins
Every week I meet with founders building in the agent space. And lately, I keep hearing the same concept come up over and over – digital twins (or some version of this). When a concept starts showing up as frequently as this one, my ears generally perk up. Digital twins are the thing perking up my ears! And I think they’re about to become one of the most important concepts in AI. I think they could become a layer that helps scale AI (and the consumption of AI) to the masses.
So what actually is a digital twin? The term originally comes from manufacturing. You’d build a digital replica of a physical asset (a jet engine, a factory floor) to simulate and monitor it. With AI it’s the same core concept, but with a totally new application. In the AI era, a digital twin is just representing knowledge (from any source, in any form) digitally, so an agent can act on it. That knowledge could live in a person’s head, across a dozen siloed systems, in years of company history, or in the collective behavior of your customers. The twin is just the bridge between that knowledge and the agent that needs it to do work.
… This is where I think the job displacement narrative gets it wrong. Everyone asks “will AI take my job?” But the better question is “can I build a digital twin of myself before someone else does it for me?” The people who win in this world are generally the ones who move fastest to adopt new technologies. — Read More
Diving into Claude Code’s Source Code Leak
On March 31, 2026, Anthropic accidentally shipped a .map sourcemap file inside a Claude Code npm update. Within minutes, it was found and went viral. The 600k lines of code were mirrored, analyzed, ported to Python and other languages, and uploaded to decentralized servers.
Claude Code is notoriously closed. The Agent SDKs provide almost no insight into the internals of Claude Code, and Anthropic itself does its best to keep the source as closed as possible.
… The legal question nobody has an answer to yet: does a codegen clean-room rebuild violate copyright? — Read More
DefenseClaw
DefenseClaw is the enterprise governance layer for OpenClaw. It sits between your AI agents and the infrastructure they run on, enforcing a simple principle: nothing runs until it’s scanned, and anything dangerous is blocked automatically. — Read More
When agents hit the walls
For decades, structural engineers and IT teams have shared the same testing logic: apply controlled pressure, find where things give way and fix them. In IT, that means a server that buckles at scale, a query that times out under load or a process that degrades when pushed past its limits.
Agentic AI could upend the way we approach testing. When an agent stops, there is no bug to fix, no threshold to raise. The agent is at a dead end: a system it can’t reach, an approval with no interface, a data handoff that lived in someone’s morning routine instead of in the architecture. This is not about a flaw in what was built, but about what wasn’t.
Humans filled those gaps without anyone noticing until now. An agent can’t. And every place it stops is a precise record of where the enterprise assumed a connection that was never made. These gaps were always load-bearing, patched up and held up by hand. Now you have a blueprint. — Read More