Knowledge Agents: Beat Frontier Models with Better Structure

Anthropic recently had to pull Mythos/Fable due to an edict from the US government. While Mythos was a step up from Opus, I’ve been actively moving to smaller agentic models and matching the output quality of some of the largest frontier models.

The use cases have ranged from hard “hedge fund level” (for want of a better description) market analysis, financial management, and AI personal assistants to helping a few friends in difficult medical situations. I’ve called this pattern “knowledge agents,” with a generic template available to everyone here. They literally inject the right knowledge into whichever AI agent is plugged into them. Anyone can do this, with or without my template. Read More

#devops

Build the Loop, Not the Agent

Most teams treat agent modernization as a project with a finish line: re-architect around today’s frontier model, ship, declare victory. But model capability advances faster than any single modernization effort can complete. By the time you finish re-architecting around today’s frontier model, the next one has already shifted the ground beneath you. Plan around a fixed end state and you will perpetually ship an architecture tuned to a model generation that is already obsolete.

The better bet is simple: the team that can iterate fastest against the newest models will win. — Read More

#devops

I Built a Monster CLAUDE.md, And My Coding Agent Got Scary Good

A coding agent can write a thousand lines before you finish your coffee. The problem is that a good chunk of those lines are confidently, fluently wrong. The code compiles, and it reads like something a careful engineer wrote. It also quietly assumed the wrong thing three functions ago, and now you get to find out where.

So when people told me the fix was a markdown file of rules, I rolled my eyes. A text file telling a model to behave sounded like taping a “please be tidy” note to a tornado.

Then, after studying a couple of them, I built one… — Read More

#devops

Agentic Code Review

Coding agents are extraordinarily good now, and getting better fast. The interesting consequence is that the hard part of engineering moved from writing code to deciding whether to trust it, which makes review the most leveraged skill in software right now. How you approach it depends enormously on who you are: a solo developer with no users and a team maintaining a ten-year-old application are not solving the same problem.

… Code review used to work because of a happy accident of relative speed. A senior engineer could read code faster than a junior could write it, so review kept pace without anyone designing it to, and the team absorbed how the system fit together as a side effect of reading each other’s diffs. A lot of that was not deliberate. It fell out of a single fact: writing code was the slow, expensive part, and reading it was cheap and fast.

That fact no longer holds. — Read More

#devops

The Mythical Agent-Month

Like a lot of people, I’ve found that AI is terrible for my sleep schedule. In the past I’d wake up briefly at 4 or 4:30 in the morning to have a sip of water or use the bathroom; now I have trouble going back to sleep. I could be doing things. Before I would get a solid 7-8 hours a night; now I’m lucky when I get 6. I’ve largely stopped fighting it: now when I’m rolling around restlessly in bed at 5:07am with ideas to feed my AI coding agents, I just get up and start my day.

Among my inner circle of engineering and data science friends, there is a lot of discussion about how long our competitive edge as humans will last. Will having good ideas (and lots of them) still matter as the agents begin having better ideas themselves? The human-expert-in-the-loop feels essential now to get good results from the agents, but how long will that last until our wildest ideas can be turned into working, tasteful software while we sleep? Will it be a gentle obsolescence where we happily hand off the reins or something else? — Read More

#devops

How To Make Your Design System AI-Ready

AI-generated prototypes often don’t deliver consistently decent results because of tiny inconsistencies scattered all across a design system. It’s decisions made but never documented, hard-coded values never cleaned up, or too much reliance on AI to make sense of mock-ups or design flows on its own.

Yesterday I stumbled upon a useful practical guide by Hardik Pandya from Atlassian on how to reduce drift, minimize mistakes, maintain context, and improve the quality of AI-generated prototypes. Let’s see how it works. — Read More

#devops

The Speed of Prototyping in the Age of AI

A few years back I wrote about my love of throwaway prototypes: those little proof-of-concepts that exist purely to get an idea out of your head and into something tangible. At the time, my biggest bottleneck was me: the time it took to scaffold a project, wire up the boring bits, and get to a place where the interesting parts could actually be tested. Fast forward to now, and that bottleneck has all but vanished.

I’ve been a little hesitant to write about this. I’ve already shared some cautious thoughts on AI and where it fits into my workflow, and I stand by all of it. I still think the industry is figuring things out in real time, and I still think it pays to be careful. But cautious doesn’t mean blind, and the honest truth is that AI has changed how quickly I can go from “I wonder if…” to “oh, it works”. — Read More

#devops

Models inherit a stale web, and they set us back a year

… [T]he models we now write code with learned from a web that is already old. I made the model gap to show this concretely: measured in Chrome releases (I know, I know, the web is far broader than just Chrome, but also Chrome has easy data to access on chromestatus.com), even the freshest model is several versions behind, and most are ten to twenty behind. The “knowledge” cutoff is a serious issue for the web platform, and the ecosystem of libraries and tools that are being launched but are not easily available to these models is massively gaining traction (Claude Code).
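As a rough illustration of what "measured in Chrome releases" can mean, the sketch below estimates how many releases have shipped since a model's knowledge cutoff. It assumes a fixed four-week release cadence, which is an approximation and not the chromestatus.com data the author used; the function name and the dates are hypothetical.

```python
from datetime import date

# Assumption: Chrome ships a new milestone roughly every four weeks.
# Real release dates vary, so treat the result as an estimate only.
ASSUMED_RELEASE_INTERVAL_DAYS = 28


def releases_behind(knowledge_cutoff: date, today: date) -> int:
    """Estimate how many Chrome releases shipped after a model's cutoff."""
    if today < knowledge_cutoff:
        raise ValueError("today must not be earlier than the knowledge cutoff")
    elapsed_days = (today - knowledge_cutoff).days
    # Whole release intervals that fit into the elapsed time.
    return elapsed_days // ASSUMED_RELEASE_INTERVAL_DAYS


if __name__ == "__main__":
    # Hypothetical dates, for illustration only.
    gap = releases_behind(date(2024, 6, 1), date(2025, 6, 1))
    print(f"Estimated releases behind: {gap}")
```

A more faithful version would replace the fixed interval with actual release dates and count the ones that fall after the cutoff.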

That connects to model half-life, where I looked at how quickly models are superseded, and to dead framework theory: if a framework stops appearing in fresh training data, the models stop reaching for it, and the framework quietly dies regardless of its merits. I wrote this thesis at least 6 months ago, and I think I’ve been proven correct (which is why we built Modern Web Guidance). The flip side, though, is that I’ve found guided output getting better than what people create (I think auto-research loops to optimize web performance, as an example, will massively raise the bar for the quality of the web people experience). — Read More

#devops

What is Open Code Review?

Open Code Review is an AI-powered code review CLI tool. It originated as Alibaba Group’s internal official AI code review assistant — over the past two years, it has served tens of thousands of developers and identified millions of code defects. After thorough validation at massive scale, we incubated it into an open source project for the community. Simply configure a model endpoint to get started. — Read More

#devops

Modern Engineering Values

At the end of last year I shared my LLM workflow in You are absolutely right!?. I knew it would be outdated quickly, but I didn’t realize it would be outdated that quickly.

I actually cannot believe that I rarely write code by hand anymore. Or rather, I cannot believe that I used to write code by hand! Programming has fundamentally changed and I’ve been wondering which engineering values still matter. — Read More

#devops