A good AGENTS.md is a model upgrade. A bad one is worse than no docs at all.

We pulled dozens of AGENTS.md files from across our monorepo and measured their effect on code generation. The best ones gave our coding agent a quality jump equivalent to upgrading from Haiku to Opus. The worst ones made the output worse than having no AGENTS.md at all.

That gap was surprising enough that we built a systematic study around it.

What we found: most of what people put in AGENTS.md either doesn’t help or actively hurts, and the patterns that work are specific and learnable. — Read More

#devops

Building the 11 Layers of a Production-Grade MCP Server + Agentic System

MCP servers are becoming the core focus of production agentic systems because they are where all the hard problems actually live: multi-tenant isolation, auth, rate limits, audit trails, and approval gates for destructive operations. Without them, agents leak data across tenants, burn budgets in runaway loops, and commit to refunds no human approved. An MCP server solves this by sitting between the agents and the data layer as a single secure tool surface, turning every agent call into an authenticated, policy-checked, rate-limited, audited operation before it touches a single row …

In this blog, we are going to build Atlas-MCP, a production-grade MCP server organized around twelve components that keep showing up on the 3 AM pager when teams skip them. On top of the server, we are also going to build a four-agent support copilot (Planner, Retriever, Synthesizer, Critic) that uses the server’s tools to answer real customer support tickets end to end. — Read More

#devops

The AI engineering stack we built internally — on the platform we ship

In the last 30 days, 93% of Cloudflare’s R&D organization used AI coding tools powered by infrastructure we built on our own platform.

Eleven months ago, we undertook a major project: to truly integrate AI into our engineering stack. We needed to build the internal MCP servers, access layer, and AI tooling necessary for agents to be useful at Cloudflare. We pulled together engineers from across the company to form a tiger team called iMARS (Internal MCP Agent/Server Rollout Squad). The sustained work landed with the Dev Productivity team, who also own much of our internal tooling including CI/CD, build systems, and automation..

… MCP servers were the starting point, but the team quickly realized we needed to go further: rethink how standards are codified, how code gets reviewed, how engineers onboard, and how changes propagate across thousands of repos..

This post dives deep into what that looked like over the past eleven months and where we ended up.  — Read More

#devops

Best practices for building agentic systems

Agentic AI has emerged as the software industry’s latest shiny thing. Beyond smarter chatbots, AI agents operate with increasing autonomy, making them poised to drive efficiency gains across enterprises.

“Agentic refers to AI systems that can take actions on behalf of users, not just generate text or answer questions,” says Andrew McNamara, director of applied machine learning at Shopify. Agentic systems run continuously until a task is complete, he adds, citing Shopify’s Sidekick, a proactive agent for merchants.

Development of agentic AI now spans many business domains. According to Anthropic, a provider of large language models (LLMs), AI agents are most commonly deployed in software engineering, accounting for roughly half of use cases, followed by back-office automation, marketing, sales, finance, and data analysis. — Read More

#devops

Wardgate – AI Agent Security Gateway

Wardgate is a security gateway that sits between AI agents and the outside world — isolating credentials for API calls, isolating SSH keys for remote command execution, and gating command execution in remote environments (conclaves).

Give your AI agents access to APIs, SSH keys, and shell tools – without giving them your credentials or trusting them with direct execution. — Read More

#devops

Salesforce launches Headless 360 to support agent-first enterprise workflows

Salesforce is packaging its developer and AI tooling, including its vibe coding environment Agentforce Vibes, into a new platform named Headless 360, designed to help enterprise teams build agent-first workflows.

The CRM software provider defines agent-first workflows as enterprise processes in which software agents, rather than human users, carry out tasks by directly invoking APIs, tools, and predefined business logic.

To support this approach, Headless 360 exposes Salesforce’s underlying data, workflows, and governance controls as APIs, MCP tools, and CLI commands, via its existing offerings, such as Data 360, Customer 360, and Agentforce, Joe Inzerillo, president of AI technology at Salesforce, said during a press briefing. — Read More

#devops

Why Agentic AI Is the #1 Skill To Learn

I’m not here to tell you AI is coming for your job. You’ve heard that a hundred times already, and frankly, nobody wants to here the same thing again.

You’ve also probably read the top skills to learn in 2026. Learn Python. Learn AI. Learn prompt engineering. Sure all those are valid. But here’s the thing: everyone is saying that. And when everyone is saying the same thing, the real opportunity is usually one step ahead.

So what’s that step?

Agentic AI. And hang on, it’s not some buzzword to add to your LinkedIn bio. It’s a fundamental shift in what AI does, how it thinks, how it works, and what it’s capable of. Right now, very few people understand it deeply enough to actually build with it.

That gap is exactly where opportunity lives. — Read More

#devops

What Is Vibe Engineering? How AI Turns Ideas Into Working Prototypes Instantly

For most people, ideas used to die before they were ever built.

“How are you actually going to build this?”

And we didn’t have a real answer.

Fast forward to today, that exact situation looks very different.

If you have an idea now, you don’t immediately worry about whether you can build it or not. You open an AI tool, start describing what you want, explore possibilities, and within minutes, you have something that resembles a working prototype. The barrier between imagination and execution has almost disappeared.

This shift is what we call vibe engineering. — Read More

#devops

Managing context in long-run agentic applications

In complex, long-running agentic systems, maintaining alignment and coherent reasoning between agents requires careful design. In this second article of our series, we explore these challenges and the mechanisms we built to keep teams of agents working productively over long time spans. We present a range of complementary techniques that balance the conflicting requirements of continuity and creativity.

… Language model APIs are stateless: to provide continuity between requests, the caller must provide the complete message history with each request. Agent frameworks solve the state management problem for users by accumulating message history between API calls. This fills the agent’s context window, which provides a hard limit on how much information the agent can handle. Even approaching an agent’s context window limit can degrade the quality of responses. For short-run applications, no extra context window management is typically required.

Complex security investigations can span hundreds of inference requests and generate megabytes of output, requiring special handling. Multi-agent applications, like ours, add further complexities. For each agent to optimally execute its role, it requires a tailored view of the investigation state. Each view must be carefully balanced. If agents are not anchored to the wider team, the investigation will be disconnected and incoherent. Conversely, sharing too much information stifles creativity and encourages confirmation bias.

Our solution uses three complementary context channels: Director’s Journal, Critic’s review and Critic’s Timeline. — Read More

Earlier Article

#devops

Stop Treating AI Memory Like a Search Problem

Back in October, my AI assistant stored a memory with an importance score of 8/10. Content: “Investigating Bun.js as a potential runtime swap.”

I never actually switched to Bun. To be fair, it was a two-day curiosity that went nowhere. But this memory persisted for six months, popping up each time I asked about my build process and quietly pushing the AI toward a Bun solution with confidence.

There was nothing wrong with the system; it was doing exactly what it was supposed to do. That was the issue. — Read More

#devops