How is search implemented where you work? Probably as a complex set of capabilities on top of retrieval. Our search APIs understand queries, call backend search systems, and finally rerank results.
But if we had an agent in the loop, would we need all that? Could we replace search backends with an agent? After all, an agent understands user requests, calls retrieval tools, and evaluates relevance on its own. We see ChatGPT do this all the time, why can’t our search bar?
In other words, if you give a basic BM25 backend to an agent, could it take the Search API’s job? — Read More
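As a sketch of what that replacement could look like, here is a minimal, hypothetical agent loop over a toy BM25 scorer. The function names and the `rewrite`/`judge` callbacks are stand-ins for an LLM's query reformulation and relevance judgment, not any particular product's API:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with classic BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def agentic_search(user_request, docs, rewrite, judge, max_turns=3):
    """Let an 'agent' drive retrieval: call the BM25 backend, judge
    relevance itself, and reformulate the query until satisfied."""
    query = user_request.lower().split()
    for _ in range(max_turns):
        ranked = sorted(zip(bm25_scores(query, docs), docs), reverse=True)
        best_score, best_doc = ranked[0]
        if judge(user_request, best_doc):
            return best_doc
        query = rewrite(user_request, query)   # agent tries a new formulation
    return best_doc
```

The point of the sketch is the division of labor: the backend only scores, while understanding, evaluation, and retry logic live in the agent loop.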
Orchestrating AI Code Review at scale
Code review is a fantastic mechanism for catching bugs and sharing knowledge, but it is also one of the most reliable ways to bottleneck an engineering team. A merge request sits in a queue, a reviewer eventually context-switches to read the diff, they leave a handful of nitpicks about variable naming, the author responds, and the cycle repeats. Across our internal projects, the median wait time for a first review was often measured in hours.
When we first started experimenting with AI code review, we took the path most teams probably take: we trialled a few off-the-shelf AI code review tools. Many of them worked well, and several offered a good amount of customisation and configurability. The recurring theme, though, was that none of them offered enough flexibility and customisation for an organisation the size of Cloudflare.
… Instead of building a monolithic code review agent from scratch, we decided to build a CI-native orchestration system around OpenCode, an open-source coding agent. Today, when an engineer at Cloudflare opens a merge request, it gets an initial pass from a coordinated smörgåsbord of AI agents. Rather than relying on one model with a massive, generic prompt, we launch up to seven specialised reviewers covering security, performance, code quality, documentation, release management, and compliance with our internal Engineering Codex. These specialists are managed by a coordinator agent that deduplicates their findings, judges the actual severity of the issues, and posts a single structured review comment. — Read More
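A coordinator of the kind described might look roughly like the sketch below. The `Finding` shape and the numeric severity scale are illustrative assumptions, not Cloudflare's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    issue: str       # short canonical description of the problem
    severity: int    # 0 = nitpick ... 3 = blocker (assumed scale)
    reviewer: str    # which specialist agent raised it

def coordinate(findings, min_severity=1):
    """Merge specialist output into one review: collapse duplicates that
    several agents raised for the same location, keep the highest
    severity seen, and drop below-threshold nitpicks."""
    merged = {}
    for f in findings:
        key = (f.file, f.line, f.issue)
        if key not in merged or f.severity > merged[key].severity:
            merged[key] = f
    kept = [f for f in merged.values() if f.severity >= min_severity]
    return sorted(kept, key=lambda f: (-f.severity, f.file, f.line))

def render_comment(findings):
    """Format the surviving findings as one structured review comment."""
    lines = ["## AI review summary"]
    for f in findings:
        lines.append(f"- [sev {f.severity}] {f.file}:{f.line} - {f.issue} ({f.reviewer})")
    return "\n".join(lines)
```

Deduplicating on (file, line, issue) before severity-judging is what lets seven specialists post as one reviewer instead of seven.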
Symphony
Symphony turns project work into isolated, autonomous implementation runs, allowing teams to manage work instead of supervising coding agents.
In this demo video, Symphony monitors a Linear board for work and spawns agents to handle the tasks. The agents complete the tasks and provide proof of work: CI status, PR review feedback, complexity analysis, and walkthrough videos. When accepted, the agents land the PR safely. Engineers do not need to supervise Codex; they can manage the work at a higher level.
… Symphony works best in codebases that have adopted harness engineering. Symphony is the next step — moving from managing coding agents to managing work that needs to get done. — Read More
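One way to picture an isolated implementation run with proof-of-work gating is a record like the following. The field names are guesses for illustration, not Symphony's real data model:

```python
from dataclasses import dataclass, field

@dataclass
class ProofOfWork:
    ci_passed: bool = False
    review_approved: bool = False
    complexity_ok: bool = False
    walkthrough_url: str = ""    # link to the recorded walkthrough video

@dataclass
class ImplementationRun:
    task_id: str                 # e.g. the Linear issue key
    branch: str
    proof: ProofOfWork = field(default_factory=ProofOfWork)
    landed: bool = False

    def ready_to_land(self):
        p = self.proof
        return p.ci_passed and p.review_approved and p.complexity_ok

    def land(self):
        """Only merge the PR once every proof-of-work check is green."""
        if not self.ready_to_land():
            raise RuntimeError(f"{self.task_id}: proof of work incomplete")
        self.landed = True
```

The gate is what lets engineers manage work rather than supervise agents: a run simply cannot land until its evidence is complete.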
Agent Memory Patterns
Say you get asked to “add memory” to an agent. What does that mean? How do you do it?
There are three common kinds of mutable memory:
1. Files
2. Memory blocks
3. Skills
If you don’t need the agent to learn, then you’re looking in the wrong place. You don’t need memory. But this post might also be useful if you’re just using agents, like a coding agent. — Read More
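For the first of those patterns, file-based memory can be as simple as a JSONL file the agent reads in full at session start and appends to as it learns. This is a generic sketch, not tied to any particular agent framework:

```python
import json
from pathlib import Path

class FileMemory:
    """File-backed agent memory: the agent loads every entry into its
    context at the start of a session and appends lessons as it learns."""

    def __init__(self, path):
        self.path = Path(path)

    def recall(self):
        """Return all remembered entries, oldest first."""
        if not self.path.exists():
            return []
        return [json.loads(line) for line in self.path.read_text().splitlines() if line]

    def remember(self, topic, lesson):
        """Append one learned lesson; append-only keeps writes safe and auditable."""
        entry = {"topic": topic, "lesson": lesson}
        with self.path.open("a") as f:
            f.write(json.dumps(entry) + "\n")
        return entry
```

Memory blocks and skills differ mainly in granularity and retrieval: blocks are pinned into the prompt, while skills are loaded on demand, but the read-at-start / write-on-learn cycle is the same.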
Set Up Useful AI Teammates With New ChatGPT Workspace Agents
In this guide, you will learn how to set up AI teammates that are actually useful in ChatGPT’s new Workspace Agents tool. The goal is one daily agent that handles a recurring morning task for you, instead of one more prompt you have to remember to run yourself.
You will build a daily agent that owns one recurring workflow and runs it for you each morning.
In our demo, that agent reviewed a Notion database of published guides, looked for useful patterns, and generated three new guide ideas every day. — Read More
Google says 75% of the company’s new code is AI-generated
Three-quarters of new code created inside Google is now generated by AI and reviewed by human engineers, the company said Wednesday.
That number has been notching up in recent years. As of October 2024, around a quarter of the company’s code was AI-generated, Google said at the time. Last fall, it said the number had risen to 50%.
The company has been pushing employees to use AI both for coding and other tasks. Google CEO Sundar Pichai said in a blog post on Wednesday that the company was shifting to “truly agentic workflows” with its engineers running more autonomous tasks. — Read More
HX Is The New UX: What You Need To Know About Harness Experience.
For thirty years, the central obsession of product design has been a single question: how do we make it easier for a human to click the right button? We built funnels. We A/B tested button colors. We agonized over empty states and loading spinners.
That era is ending — not gradually, but structurally.
… Agents don’t navigate UIs. They negotiate with systems. And when the agent is the primary “user” of software, the human behind it occupies an entirely different role — one for which we have almost no design vocabulary. Until now.
HX — Harness Experience — is the design discipline governing the interface between a human and their agentic fleet. — Read More
A good AGENTS.md is a model upgrade. A bad one is worse than no docs at all.
We pulled dozens of AGENTS.md files from across our monorepo and measured their effect on code generation. The best ones gave our coding agent a quality jump equivalent to upgrading from Haiku to Opus. The worst ones made the output worse than having no AGENTS.md at all.
That gap was surprising enough that we built a systematic study around it.
What we found: most of what people put in AGENTS.md either doesn’t help or actively hurts, and the patterns that work are specific and learnable. — Read More
Building the 11 Layers of a Production-Grade MCP Server + Agentic System
MCP servers are becoming the core focus of production agentic systems because they are where all the hard problems actually live: multi-tenant isolation, auth, rate limits, audit trails, and approval gates for destructive operations. Without them, agents leak data across tenants, burn budgets in runaway loops, and commit to refunds no human approved. An MCP server solves this by sitting between the agents and the data layer as a single secure tool surface, turning every agent call into an authenticated, policy-checked, rate-limited, audited operation before it touches a single row …
In this blog, we are going to build Atlas-MCP, a production-grade MCP server organized around eleven components that keep showing up on the 3 AM pager when teams skip them. On top of the server, we are also going to build a four-agent support copilot (Planner, Retriever, Synthesizer, Critic) that uses the server’s tools to answer real customer support tickets end to end. — Read More
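The gating described above can be sketched as a single gateway sitting in front of every tool handler. The token scheme, limits, and names below are simplified placeholders for illustration, not the Atlas-MCP design:

```python
import time

class PolicyError(Exception):
    pass

class ToolGateway:
    """Turn every agent tool call into an authenticated, rate-limited,
    approval-gated, audited operation before the handler runs."""

    def __init__(self, rate_limit_per_min=60):
        self.rate_limit = rate_limit_per_min
        self.calls = {}          # tenant -> timestamps of recent calls
        self.audit_log = []      # (tenant, tool, destructive) tuples

    def call(self, tenant, token, tool, handler, *, destructive=False, approved=False):
        # 1. Authenticate the tenant (a real server would verify a signed token).
        if token != f"secret-{tenant}":
            raise PolicyError("authentication failed")
        # 2. Enforce a per-tenant rate limit to stop runaway agent loops.
        now = time.time()
        recent = [t for t in self.calls.get(tenant, []) if now - t < 60]
        if len(recent) >= self.rate_limit:
            raise PolicyError("rate limit exceeded")
        # 3. Destructive operations (refunds, deletes) need a human approval.
        if destructive and not approved:
            raise PolicyError(f"{tool}: destructive call needs human approval")
        # 4. Record, execute, and audit.
        self.calls[tenant] = recent + [now]
        result = handler()
        self.audit_log.append((tenant, tool, destructive))
        return result
```

Because every check runs before the handler, a policy failure never touches the data layer, and every call that does is on the audit trail.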
The AI engineering stack we built internally — on the platform we ship
In the last 30 days, 93% of Cloudflare’s R&D organization used AI coding tools powered by infrastructure we built on our own platform.
Eleven months ago, we undertook a major project: to truly integrate AI into our engineering stack. We needed to build the internal MCP servers, access layer, and AI tooling necessary for agents to be useful at Cloudflare. We pulled together engineers from across the company to form a tiger team called iMARS (Internal MCP Agent/Server Rollout Squad). The sustained work landed with the Dev Productivity team, who also own much of our internal tooling, including CI/CD, build systems, and automation.
… MCP servers were the starting point, but the team quickly realized we needed to go further: rethink how standards are codified, how code gets reviewed, how engineers onboard, and how changes propagate across thousands of repos.
This post dives deep into what that looked like over the past eleven months and where we ended up. — Read More