… “prompting has split into four skills” — Context, Intent, Specification, Prompt. Everyone matched a tension one of us had brought into the room. And once they had names, something else clicked: the four crafts mapped cleanly onto P-CAM — Perception, Cognition, Agency, Manifestation.
…For the last eight months, the argument has been spec versus vibe. Structure versus flow. Waterfall versus emergence.
…Every standard critique of SDD, and every standard critique of vibe, traces back to the same thing. Not two sets of failures. One failure, surfacing on both sides of the debate. The three-layer collapse.
…Vibe coding collapsed because it had no contract. Spec-driven development is collapsing because it has three contracts pretending to be one. What rises from the fusion isn’t a new brand. It isn’t a better tool. It’s a separation of concerns — the oldest principle in software engineering — applied one layer up, to the documents we use to instruct the machines that write the documents. — Read More
Beyond the Coding Assistant: A Series on AI-Assisted Software Engineering
This is the first article of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available here.
The last few years of AI-assisted development have been remarkable. Coding assistants have crossed real quality bars. Engineers can now produce working code, in unfamiliar languages, against unfamiliar systems, at speeds that would have looked like science fiction in 2022. There are real productivity gains, real new affordances, and a real shift in what an individual developer can do in an afternoon.
And yet — when the conversation turns to the team and the organization — the picture is more complicated. The dramatic gains many leaders were promised haven’t shown up on every team. Some teams ship more. Some teams ship the same. Some teams have actually gotten slower, with the AI helping at the keystroke while the wider delivery metrics regress.
That gap, between what’s possible at the keystroke and what’s actually showing up in delivery, is what this series is about. The question I want to ask, and try to answer over the next several articles, is simple: what has changed, and what changes could take us so much farther than where current AI coding assistants have brought us? — Read More
How Claude Code works in large codebases: Best practices and where to start
Claude Code is running in production across multi-million-line monorepos, decades-old legacy systems, distributed architectures spanning dozens of repositories, and at organizations with thousands of developers. These environments present challenges that smaller, simpler codebases don’t, whether that’s build commands that differ across every subdirectory or legacy code spread across folders with no shared root.
This article covers the patterns we’ve observed that have led to successful adoption of Claude Code at scale. We use “large codebase” to refer to a wide range of deployments: monorepos with millions of lines, legacy systems built over decades, dozens of microservices across separate repositories, or any combination of the above. That also includes codebases running on languages that teams don’t always associate with AI coding tools, such as C, C++, C#, Java, PHP. (Claude Code performs better than most teams expect it to in those cases, particularly as of recent model releases.) While every large codebase deployment is shaped by its specific version control, team structure, and accumulated conventions, the patterns here generalize across them and are a good starting point for teams considering adopting Claude Code. — Read More
Multi-Agent Systems: When 2 Agents Beat 1 (and When They Don’t)
You see the word multi-agent everywhere right now. People build systems with five different AI personas talking to each other in a simulated chat room just to scrape a website and write a blog post. They give them names like Researcher, Writer, and Editor and watch the terminal output scroll by as the agents debate with each other. It all looks impressive, but it is not the right way to build software.
Adding more agents to a system does not automatically make it smarter. It actually multiplies your failure rate. Think about the basic math of probability. If you have a single agent that executes its task correctly 90% of the time, your naive system reliability is 0.90.
If you chain three of those agents together, you multiply those probabilities: 0.90 × 0.90 × 0.90 = 0.729. Your baseline reliability just dropped to about 73%. You doubled your latency, tripled your API cost, and made the final output no better.
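The compounding in the arithmetic above can be checked with a few lines of Python. This is a minimal sketch, assuming each agent succeeds independently with the same probability and that the chain fails if any single agent fails; the function name `chain_reliability` is hypothetical and does not come from the article.

```python
def chain_reliability(per_agent_success: float, num_agents: int) -> float:
    """Probability that every agent in a sequential chain succeeds.

    Assumes independent failures and an identical success rate per agent,
    which is a simplification of how real agent pipelines behave.
    """
    if not 0.0 <= per_agent_success <= 1.0:
        raise ValueError("per_agent_success must be between 0 and 1")
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    # Independent events: multiply the per-agent success probabilities.
    return per_agent_success ** num_agents


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} agent(s): {chain_reliability(0.90, n):.1%}")
```

With a 90% per-agent success rate, three chained agents come out at 0.729. The independence assumption is the optimistic case; correlated failures or error propagation between agents would change the number.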
… We will see exactly why the single agent misses a critical billing logic flaw, and why the two-agent system catches it. — Read More
I’m going back to writing code by hand
Here is k10s: https://github.com/shvbsle/k10s/tree/archive/go-v0.4.0
234 commits. ~30 weekends. Built entirely on vibe-coded sessions with Claude, whenever my tokens lasted long enough to ship something.
I’m archiving my TUI tool and rewriting it from scratch.
…I built it in Go with Bubble Tea [1] and it worked.
For a while… 😦
[What] I learned over these 7 months is worth more than the 1690 lines of model.go I’m throwing away.
…AI writes features, not architecture. The longer you let it drive without constraints, the worse the wreckage gets. The velocity makes you think you’re winning right up until the moment everything collapses simultaneously. — Read More
AI Gateways vs. MCP Gateways: What Security Teams Need to Know
Many vendors in AI security are talking about gateways right now, but they don’t all mean the same thing. Between all of these, the word “gateway” is doing a lot of work, and not all of it is consistent.
Security teams are being asked to evaluate these technologies, and the terminology is genuinely confusing. In conversations with enterprises across financial services, insurance, pharma, and tech, we consistently find that teams conflate AI gateways with MCP gateways. They assume one covers what the other does. Some vendors actively blur the lines by combining both functions into a single product. Others treat them as entirely separate categories.
This post breaks down what each type does, where the real value is, and where the gaps are that neither fills. We will focus on functionality first, not vendor definitions. A note on terminology: the market uses “AI gateway,” “LLM gateway,” and “MCP gateway” loosely, and some vendors bundle multiple functions under a single label. Throughout this post, we use “AI gateway” to refer specifically to the LLM inference proxy layer (managing traffic between agents and model providers), distinct from “MCP gateway” (managing traffic between agents and their tools). Where vendors combine both, we will call that out. — Read More
MCP Marketplace Brings Real-Time Intelligence to Agentic Applications
An agentic application is an AI system that knows your business context, reasons autonomously, and takes action based on real-time data and specialized expertise. Agent Bricks, Genie, Apps, and Lakebase give enterprises the tools to build agentic applications at scale. But there’s a critical gap: agents built solely on internal data can’t truly think.
Consider a loan approval agent. It has access to your bank’s loan book, customer history, and credit scores. But it lacks the context that humans instinctively use.
… Without this real-time intelligence, agents become knowledge-limited—constrained by historical data, unable to reason about the world as it is now. They can execute workflows, but they can’t make informed decisions.
The old solution? Manual research. Analysts pull data from multiple sources, lose context switching between tools, and create bottlenecks. Decisions slow down. Risk increases.
Agents need a way to access live, trusted intelligence while they reason through complex problems. That’s where the MCP Marketplace comes in. — Read More
Anthropic Shipped Outcomes and Real Story Is Verification Becoming a SKU
You have written this loop before. Eighteen months ago, when you first put a Claude agent into production, you wrote a rubric. You wrote a grader. You wrote retry logic for when the grader said no. The pieces broke. You patched them. The rubric drifted. You rewrote it.
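For readers who have not written that loop, here is a minimal sketch of its hand-rolled form: generate, grade against a rubric, retry with feedback when the grader says no. Every name here (`run_with_grader`, `toy_generate`, `toy_grade`) is hypothetical, the model call is replaced by a stand-in function, and nothing in it reflects Anthropic's actual Outcomes API.

```python
from typing import Callable, Optional, Tuple


def run_with_grader(
    generate: Callable[[str, Optional[str]], str],
    grade: Callable[[str], Tuple[bool, str]],
    task: str,
    max_attempts: int = 3,
) -> Tuple[str, bool, int]:
    """Call generate, grade the result, and retry with feedback on failure.

    Returns (last_output, passed, attempts_used).
    """
    feedback: Optional[str] = None
    output = ""
    for attempt in range(1, max_attempts + 1):
        output = generate(task, feedback)
        passed, feedback = grade(output)
        if passed:
            return output, True, attempt
    # Retry budget exhausted: hand back the last attempt, marked as failed.
    return output, False, max_attempts


def toy_generate(task: str, feedback: Optional[str]) -> str:
    # Stand-in for a model call: only includes a total after being told to.
    return f"{task}: total=42" if feedback else f"{task}: draft"


def toy_grade(output: str) -> Tuple[bool, str]:
    # Stand-in rubric: the output must contain a total.
    ok = "total=" in output
    return ok, "" if ok else "missing total"


result = run_with_grader(toy_generate, toy_grade, "invoice summary")
print(result)
```

In this toy run the first attempt fails the rubric and the second passes. The pieces the paragraph describes as breaking and drifting are the rubric inside `grade` and the retry policy around it.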
On May 6 at Code with Claude San Francisco, Anthropic shipped your loop as an API endpoint and called it Outcomes.
That is the news. The story underneath it is bigger. Outcomes is the first harness layer Anthropic decided to sell. Dreams, Multi-Agent, and Webhooks are the same move on memory, orchestration, and lifecycle. The harness used to be code you wrote. It is becoming a stack of products you compose. — Read More
Your Claude Has Felt Dumber for Weeks. Anthropic Finally Said Why
For six weeks you fought Claude Code. Prompts that used to work stopped working. Usage limits drained twice as fast. Sessions felt forgetful, repetitive, oddly lazy. You blamed yourself. You blamed your prompts. You read the Reddit threads where someone calmly explained that the model is fine and you’re holding it wrong.
On April 23, Anthropic published the receipts. The model was fine. Three things in the harness around it were not.
Three changes. Three schedules. Three bug fixes. Each shipped through code review, internal evals, and dogfooding. Each survived weeks before users forced the diagnosis. The post mortem is the cleanest field experiment in harness engineering anyone has published. The pattern in it is more important than the bugs. — Read More
How to Build an AI Agent: From Idea to Real-World System
Everyone wants to build an AI agent right now.
Not just a chatbot. Not just a prompt wrapper.
A real AI agent — something that can understand goals, use tools, remember context, interact with users, and improve over time.
…[B]uilding an agent is not one decision. It’s a system design problem. An AI agent only becomes useful when several layers work together — purpose, prompts, models, memory, orchestration, interfaces, and evaluation. — Read More