4 Agentic AI Design Patterns & Real-World Examples

Agentic AI design patterns enhance the autonomy of large language models (LLMs) like Llama, Claude, or GPT by leveraging tool use, decision-making, and problem-solving. These patterns bring a structured approach to creating and managing autonomous agents across a range of use cases. — Read More

#devops

I Still Prefer MCP Over Skills

The AI space is pushing hard for “Skills” as the new standard for giving LLMs capabilities, but I’m not a fan. Skills are great for pure knowledge and teaching an LLM how to use an existing tool. But for giving an LLM actual access to services, the Model Context Protocol (MCP) is the far superior, more pragmatic architectural choice. We should be building connectors, not just more CLIs. — Read More

#devops

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file: it is designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea, but your agent will build out the specifics in collaboration with you. — Read More

#devops

A Taxonomy of RL Environments for LLM Agents

Model architecture gets all the attention. Post-training recipes follow close behind. The reinforcement learning (RL) environment — what the model actually practices on, how its work gets judged, what tools it can use — barely enters the conversation. That’s the part that actually determines what the agent can learn to do.

A model trained only on single-turn Q&A will struggle the moment you ask it to maintain state across a 50-step enterprise workflow. A model trained with a poorly designed reward function will learn to game the metric rather than solve the problem. The reinforcement learning environment is half the system. — Read More
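To make the state-and-reward point concrete, here is a minimal toy environment sketch (the `TodoEnv` class and its action strings are illustrative inventions, not from the article): the agent only earns reward for completing multi-step work before submitting, which a model trained on single-turn Q&A never practices.

```python
from dataclasses import dataclass, field

@dataclass
class TodoEnv:
    """Toy multi-turn RL environment: the agent must add items, then submit.
    State persists across steps; reward arrives only at the end."""
    items: list = field(default_factory=list)
    done: bool = False

    def step(self, action: str):
        """Apply an action string; return (observation, reward, done)."""
        if action.startswith("add "):
            self.items.append(action[4:])
            return f"{len(self.items)} items", 0.0, False
        if action == "submit":
            self.done = True
            # Reward complete work only, so "submit immediately" can't game it
            reward = 1.0 if len(self.items) >= 2 else 0.0
            return "submitted", reward, True
        return "unknown action", 0.0, False
```

A sloppier reward (say, paying per step taken) would be trivially gamed by looping on no-op actions, which is exactly the failure mode the blurb describes.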

#devops

#architecture

The 2nd Phase of Agentic Development

Yesterday we talked about how cheap code is fueling an era of idiosyncratic tooling, and previously we’ve talked about the rise of spec driven development. In that second piece, we ran through some of the initial examples of spec driven development with agents.

… The first wave of agentic development brought us clones and ports. When code is incredibly cheap, and you want the code to flow, you can either rely on your own fast feedback or leverage existing test suites. These early projects opted for the latter, as did many tokenmaxxers who are rebuilding their dependencies in Rust or Go.

Two releases this week, however, suggest we’re starting to enter a second phase of open source agentic coding projects. The first brought us clones, this next phase brings us reimaginings.  — Read More

#devops

Harness engineering for coding agent users

The term harness has emerged as a shorthand to mean everything in an AI agent except the model itself – Agent = Model + Harness. That is a very wide definition, and therefore worth narrowing down for common categories of agents. I want to take the liberty here of defining its meaning in the bounded context of using a coding agent. In coding agents, part of the harness is already built in (e.g. via the system prompt, or the chosen code retrieval mechanism, or even a sophisticated orchestration system). But coding agents also provide us, their users, with many features to build an outer harness specifically for our use case and system.

A well-built outer harness serves two goals: it increases the probability that the agent gets it right in the first place, and it provides a feedback loop that self-corrects as many issues as possible before they even reach human eyes. Ultimately it should reduce the review toil and increase the system quality, all with the added benefit of fewer wasted tokens along the way. — Read More
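As a rough sketch of the feedback-loop half of an outer harness (the check commands here are hypothetical placeholders, not from the article): run the project's checks after the agent edits code, and feed any failures back to the agent before a human ever reviews the change.

```python
import subprocess
import sys

# Hypothetical check commands; a real project would list its own
# linter, test suite, type checker, etc.
CHECKS = {
    "lint": [sys.executable, "-c", "print('lint ok')"],
    "tests": [sys.executable, "-c", "raise SystemExit(0)"],
}

def run_checks(checks):
    """Run each check; collect failing output as feedback for the agent loop."""
    feedback = []
    for name, cmd in checks.items():
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            feedback.append(f"{name} failed:\n{result.stdout}{result.stderr}")
    return feedback  # empty list means the change is ready for human review
```

The outer loop would hand the `feedback` strings back to the agent and re-run until the list is empty or a retry budget is exhausted.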

#devops

What Is Claw Code? The Claude Code Rewrite Explained

… On March 31, 2026, security researcher Chaofan Shou noticed something odd in the npm registry. Version 2.1.88 of @anthropic-ai/claude-code had shipped with a 59.8 MB JavaScript source map file attached.

… Within hours of the exposure, mirrored repositories appeared on GitHub. Anthropic began issuing DMCA takedowns. The internet did not wait.

Sigrid Jin (@instructkr) — a Korean developer who had attended Claude Code’s first birthday party in San Francisco in February — published what became claw-code. The repo reached 50,000 stars in two hours, one of the fastest accumulation rates GitHub has recorded.

The important distinction: claw-code is not an archive of the leaked TypeScript. It’s a clean-room Python rewrite, built from scratch by reading the original harness structure and reimplementing the architectural patterns without copying Anthropic’s proprietary source. Jin built it overnight using oh-my-codex, an orchestration layer on top of OpenAI’s Codex, with parallel code review and persistent execution loops.

… The real value here — for builders — isn’t the drama. It’s what the exposed architecture tells us about how production-grade agentic coding systems are actually structured. — Read More

#architecture, #devops

When agents hit the walls

For decades, structural engineers and IT teams have shared the same testing logic: apply controlled pressure, find where things give way and fix. In IT, that means a server that buckles at scale, a query that times out under load or a process that degrades when pushed past its limits.

Agentic AI could upend the way we approach testing. When an agent stops, there is no bug to fix, no threshold to raise. The agent is at a dead end: a system it can’t reach, an approval with no interface, a data handoff that lived in someone’s morning routine instead of in the architecture. This is not about a flaw in what was built, but about what wasn’t built.

Humans filled those gaps without anyone noticing until now. An agent can’t. And every place it stops is a precise record of where the enterprise assumed a connection that was never made. These gaps were always load-bearing, patched up and held up by hand. Now you have a blueprint. — Read More

#devops

The Feedback Loop Is All You Need

So Claude Code added CRON a few days ago. Recurring tasks, native, built right in. The thing we’ve been dreaming about since the first AI coding demos — schedule an agent, go to sleep, wake up to merged PRs. An engineer that works while you don’t.

And I’m sitting here like… I can’t even use this. Not on the real codebase. Not at work.

The old loop: write or review code, spot smells by experience, leave comments explaining intent, promise to fix things “later” — which usually meant never.

The new loop: encode rules once, let agents iterate against them, observe what fails, tighten the constraints. Less “remember this next time,” more “this literally cannot happen.”

Agents break the old loop completely. When code can be produced nonstop, manual review becomes the weakest link. — Read More
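The “encode rules once” loop can be sketched as a programmatic rule that agents iterate against until it passes; this `violates_no_print` rule is a hypothetical example, not something from the post:

```python
import ast

def violates_no_print(source: str) -> list[int]:
    """One rule, encoded once: forbid bare `print` calls in committed code.
    Returns the line numbers where the rule fails; an agent regenerates
    the code until this list is empty, instead of a reviewer commenting
    'remember not to do this' on every PR."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    ]
```

A battery of such checks is what turns “remember this next time” into “this literally cannot happen.”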

#devops

Meta-Harness: End-to-End Optimization of Model Harnesses

The performance of large language model (LLM) systems depends not only on model weights, but also on their harness: the code that determines what information to store, retrieve, and present to the model. Yet harnesses are still designed largely by hand, and existing text optimizers are poorly matched to this setting because they compress feedback too aggressively. We introduce Meta-Harness, an outer-loop system that searches over harness code for LLM applications. It uses an agentic proposer that accesses the source code, scores, and execution traces of all prior candidates through a filesystem. On online text classification, Meta-Harness improves over a state-of-the-art context management system by 7.7 points while using 4x fewer context tokens. On retrieval-augmented math reasoning, a single discovered harness improves accuracy on 200 IMO-level problems by 4.7 points on average across five held-out models. On agentic coding, discovered harnesses surpass the best hand-engineered baselines on TerminalBench-2. Together, these results show that richer access to prior experience can enable automated harness engineering. — Read More
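As a heavily simplified sketch of the outer-loop idea (the proposer and scorer below are toy stand-ins, not the paper’s agentic proposer or its benchmarks): keep a history of candidate harnesses with their scores, propose variations of the best one seen so far, and return the winner.

```python
import random

def propose(history):
    """Toy proposer: perturb the best candidate so far.
    (Meta-Harness instead uses an agent that reads the source code,
    scores, and execution traces of prior candidates via a filesystem.)"""
    best = max(history, key=lambda h: h["score"], default={"k": 1})
    return {"k": best.get("k", 1) + random.choice([-1, 1])}

def evaluate(harness):
    """Toy scoring function; real harnesses are scored on task suites."""
    return -abs(harness["k"] - 5)  # pretend the optimum is k = 5

def outer_loop(rounds=50, seed=0):
    """Search over harness candidates, accumulating scored history."""
    random.seed(seed)
    history = []
    for _ in range(rounds):
        candidate = propose(history)
        history.append({**candidate, "score": evaluate(candidate)})
    return max(history, key=lambda h: h["score"])
```

The paper’s core claim maps onto this skeleton: the richer the history the proposer can read (code, scores, traces), the better the discovered harnesses.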

#devops