Google’s hot streak in AI-related releases continues unabated. Just a few days ago, it released a new tool for Gemini called URL context grounding.
URL context grounding can be used stand-alone or combined with Google search grounding to conduct deep dives into internet content.
In a nutshell, it’s a way to programmatically have Gemini read, understand and answer questions about content and data contained in individual web URLs (including those pointing to PDFs) without the need to perform what we know as traditional RAG processing. — Read More
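To make the idea concrete, here is a minimal sketch of what a generateContent request enabling URL context might look like. The camelCase tool name (`urlContext`) follows Gemini REST API conventions, but the exact field spelling and request shape here are assumptions, not confirmed API details:

```python
import json

# Hypothetical sketch: build a Gemini generateContent request body that
# enables URL context grounding. The "urlContext" tool name is an assumption
# based on the Gemini REST API's camelCase conventions.
def build_url_context_request(question: str, urls: list[str]) -> dict:
    # The URLs are simply mentioned in the prompt; the URL context tool
    # lets the model fetch and read them before answering.
    prompt = question + "\n\nSources:\n" + "\n".join(urls)
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "tools": [{"urlContext": {}}],  # assumed tool field name
    }

payload = build_url_context_request(
    "Summarize the key findings in this report.",
    ["https://example.com/report.pdf"],
)
print(json.dumps(payload, indent=2))
```

The key point is that grounding is just another entry in the `tools` array; no chunking, embedding, or vector store is involved on the caller's side.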
The Claude Code SDK and the Birth of HaaS (Harness as a Service)
As tasks require more autonomous behavior from agents, the core primitive for working with AI is shifting from the LLM API (chat-style endpoints) to the Harness API (customizable runtimes). I call this Harness as a Service (HaaS): quickly build, customize, and share agents via a rich ecosystem of agent harnesses. Today we’ll cover how to customize harnesses to build usable agents quickly, plus the future of agent development in a world of open harnesses. — Read More
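To illustrate the shift from chat endpoint to runtime, here is a toy sketch of what a "harness" primitive might look like: a loop between a model and a set of tools, rather than a single request/response call. All names here are illustrative, not the Claude Code SDK's actual API:

```python
from typing import Callable

# Hypothetical sketch of a harness: a customizable runtime that loops
# between a model and its tools until the model stops requesting tool calls.
class Harness:
    def __init__(self, model: Callable[[list], dict],
                 tools: dict[str, Callable[[str], str]]):
        self.model = model  # returns {"tool": ..., "input": ...} or {"answer": ...}
        self.tools = tools

    def run(self, task: str, max_steps: int = 5) -> str:
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            step = self.model(messages)
            if "answer" in step:  # model is done
                return step["answer"]
            result = self.tools[step["tool"]](step["input"])
            messages.append({"role": "tool", "content": result})
        return "max steps reached"

# A stub model that calls one tool, then answers with the tool's output.
def stub_model(messages):
    if messages[-1]["role"] == "tool":
        return {"answer": messages[-1]["content"]}
    return {"tool": "echo", "input": "hello"}

agent = Harness(stub_model, {"echo": lambda s: s.upper()})
print(agent.run("say hello"))  # → HELLO
```

The harness, not the model, owns the loop, the tool registry, and the stopping condition; that is the part HaaS would let you customize and share.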
Building a Resilient Event Publisher with Dual Failure Capture
When we set out to rebuild Klaviyo’s event infrastructure, our goal wasn’t just to handle more scale; it was to make the system rock solid. In Part 1 of this series, we shared how we migrated from RabbitMQ to a Kafka-based architecture to process 170,000 events per second at peak without losing data. In Part 2, we dived into how we made event consumers resilient.
This post, Part 3, is all about the Event Publisher, the entry point into our event pipeline. The publisher has an important job: It needs to accept events from hundreds of thousands of concurrent clients, serialize them, keep up with unpredictable traffic spikes, and most importantly, ensure that no event is ever lost. If the publisher isn’t resilient, the rest of the pipeline can’t rely on a steady and complete flow of events. — Read More
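The "no event is ever lost" requirement usually comes down to capturing events somewhere durable when the primary path fails. Klaviyo's actual design is not shown here; the following is only a generic sketch of that failure-capture idea, with all names invented for illustration:

```python
import json, os, tempfile

# Hedged sketch of failure capture: try the primary broker, and if the send
# fails, durably append the event to a local fallback file for later replay.
class ResilientPublisher:
    def __init__(self, send, fallback_path):
        self.send = send                  # e.g. a Kafka produce call
        self.fallback_path = fallback_path

    def publish(self, event: dict) -> bool:
        payload = json.dumps(event)
        try:
            self.send(payload)
            return True                   # delivered to the broker
        except Exception:
            # Capture path: the event survives the outage on local disk
            # and can be replayed once the broker recovers.
            with open(self.fallback_path, "a") as f:
                f.write(payload + "\n")
            return False

def flaky_send(payload):
    raise ConnectionError("broker unavailable")

path = os.path.join(tempfile.gettempdir(), "captured_events.jsonl")
open(path, "w").close()                   # start with an empty capture file
pub = ResilientPublisher(flaky_send, path)
pub.publish({"id": 1, "type": "email_opened"})
print(open(path).read().strip())          # the captured event, as JSON
```

A real publisher would also need fsync discipline, replay ordering, and backpressure handling, which is presumably where the article's "dual" capture comes in.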
Scaling Engineering Teams: Lessons from Google, Facebook, and Netflix
After spending over a decade in engineering leadership roles at some of the world’s most chaotic innovation factories—Google, Facebook, and Netflix—I’ve learned one universal truth: scaling engineering teams is like raising teenagers. They grow fast, develop personalities of their own, and if you don’t set boundaries, suddenly they’re setting the house on fire at 3am.
The difference between teams that thrive at scale and those that collapse into Slack-thread anarchy typically comes down to three key factors:
— Structured goal-setting
— A ruthless focus on code quality
— Intentional culture building
Let me share some lessons I learned from scaling teams at Google, Facebook, and Netflix. — Read More
Agile is Out, Architecture is Back
Software development has always been defined by its extremes. In the early days, we planned everything. Specs were sacred. Architecture diagrams came before a single line of code. And every change felt like steering a cargo ship — slow, bureaucratic, and heavily documented.
Then came Agile, and the pendulum swung hard in the other direction. We embraced speed, iteration, and imperfection. “Working software over comprehensive documentation” became the battle cry of a new generation. Shipping fast was more important than getting it right the first time. And to be fair, that shift unlocked enormous productivity. It changed the culture of software for good.
Now, we’re entering a new era — one driven by AI tools that can generate code from a sentence. Tools like GitHub Copilot and Claude Code are reshaping what it means to be a developer. It’s not just about writing code anymore — it’s about designing the environment in which code gets written.
And that pendulum? It’s swinging back again. — Read More
AI Focus: Interception
This is a very quick post. I had an idea as I was walking the dog this evening, and I wanted to build a functioning demo and write about it within a couple of hours.
While the post and idea started this evening, the genesis of the idea has been brewing for a while and goes back over a year to August 2024, when I wrote about being sucked into a virtual internet. WebSim has been on my mind for a while, because I loved the idea of being able to simulate my own version of the web using the browser directly and not via another web page. And a couple of weeks ago, I managed to work out how to get Puppeteer to intercept requests and respond with content generated via an LLM. — Read More
The Last Programmers
We’re witnessing the final generation of people who translate ideas into code by hand.
I quit my job at Amazon in May to join a startup called Icon.
… I felt like I was reaching the ceiling of what I could learn about AI and building good products within Amazon’s constraints. That’s why I joined Icon. At Icon, we move at a completely different speed. We ship features in days that would have taken Amazon months to approve.
… The interesting part is watching how my teammates work. One of them hasn’t looked at actual code in weeks. Instead, he writes design documents in plain English and trusts AI to handle the implementation. When something needs fixing, he edits the document, not the code.
It made me realize something profound: we’re living through the end of an era where humans translate ideas into code by hand. Within a few years, that skill will be as relevant as knowing how to shoe a horse. — Read More
A PM’s Guide to AI Agent Architecture: Why Capability Doesn’t Equal Adoption
Last week, I was talking to a PM who’d shipped their AI agent in recent months. The metrics looked great: 89% accuracy, sub-second response times, and positive user feedback in surveys. But users were abandoning the agent after their first real problem, like a user with both a billing dispute and a locked account.
“Our agent could handle routine requests perfectly, but when faced with complex issues, users would try once, get frustrated, and immediately ask for a human.”
This pattern shows up in every product team that focuses on making their agents “smarter,” when the real challenge is making the architectural decisions that shape how users experience, and come to trust, the agent. — Read More
Context Engineering Series: Building Better Agentic RAG Systems
We’ve moved far beyond prompt engineering. Now we’re designing portfolios of tools (directory listing, file editing, web search), slash commands like /pr-create that inject prompts, specialized sub-agents like @pr-creation-agent, and AGENT.md files for systems that work across IDEs, command lines, GitHub, and Slack.
Context engineering is designing tool responses and interaction patterns that give agents situational awareness to navigate complex information spaces effectively.
To understand what this means practically, let’s look at how systems have evolved:
Before: We precomputed what chunks needed to be put into context, injected them, and then asked the system to reason about the chunks.
… Now: Agents are incredibly easy to build because all you need is a messages array and a bunch of tools. — Read More
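The before/now contrast above can be sketched in a few lines. Everything here is a toy: the document store, the "retrieval" filter, and the tool names are all illustrative, and the interesting part is that the tool responses include orientation hints rather than just raw data:

```python
# Toy corpus standing in for a real document store.
DOCS = {"kafka.md": "Kafka handles 170k events per second.",
        "rabbit.md": "RabbitMQ was the old broker."}

# Before: precompute which chunks go into context, inject them, ask once.
def classic_rag_prompt(question: str) -> str:
    chunks = [t for t in DOCS.values() if "Kafka" in t]   # fake retrieval
    return "Context:\n" + "\n".join(chunks) + "\nQ: " + question

# Now: expose tools and let the agent decide what to read. A context-
# engineered tool response tells the agent what it can do next.
def list_docs() -> str:
    names = ", ".join(sorted(DOCS))
    return f"{len(DOCS)} files available: {names}. Call read_doc(name) to open one."

def read_doc(name: str) -> str:
    return DOCS.get(name, f"not found; available: {', '.join(sorted(DOCS))}")

print(classic_rag_prompt("What throughput does the new broker handle?"))
print(list_docs())
print(read_doc("kafka.md"))
```

Note how even the error response from `read_doc` gives the agent situational awareness (what names exist) instead of a dead end; that is the "context engineering" of tool responses in miniature.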
Exploring Foundation Models’ Tool-Use Efficacy
Model Context Protocol (MCP) is an open-source framework launched by Anthropic to standardize the way LLMs use external tools. AI agents use MCP to enable multi-turn workflows, where an LLM (via products like Claude Desktop or Cursor) can select and coordinate between tools in multiple MCP servers. Since its introduction, MCP has quickly become the de facto standard for tool integrations with LLMs.
There are now thousands of official and unofficial MCP servers, each with dozens of tools! While more MCP choices are great for the tool integration ecosystem, sometimes having too many options is a curse. Products like Cursor often limit how many tools you can provide to an LLM, so you are forced to select which tools you want to utilize the most. — Read More
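One way to live under such a tool cap is to rank candidate tools against the task and keep only the top few. The sketch below uses a deliberately naive word-overlap score, and the catalog entries are made up; it is only meant to show the shape of the selection problem, not any product's actual algorithm:

```python
# Hedged sketch: given many candidate tools and a cap on how many an LLM may
# see, keep the ones whose descriptions best match the task at hand.
def select_tools(task: str, tools: dict[str, str], limit: int) -> list[str]:
    task_words = set(task.lower().split())
    def score(item):
        name, desc = item
        # Naive relevance: count words shared between task and description.
        return len(task_words & set(desc.lower().split()))
    ranked = sorted(tools.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:limit]]

catalog = {
    "github_create_pr": "open a pull request on a github repository",
    "slack_post": "post a message to a slack channel",
    "db_query": "run a sql query against the database",
    "fs_read": "read a file from disk",
}
print(select_tools("open a pull request for this repository", catalog, 2))
```

Real systems would use embeddings or usage statistics rather than word overlap, but the constraint is the same: the model only sees the shortlist, so the selection step quietly decides what the agent can do.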