Last week, I was talking to a PM who’d shipped their AI agent in recent months. The metrics looked great: 89% accuracy, sub-second response times, positive user feedback in surveys. But users were abandoning the agent after their first real problem, like a user with both a billing dispute and a locked account.
“Our agent could handle routine requests perfectly, but when faced with complex issues, users would try once, get frustrated, and immediately ask for a human.”
I’ve seen this pattern across product teams that focus on making their agents “smarter” when the real challenge is making the architectural decisions that shape how users experience, and come to trust, the agent. — Read More
Context Engineering Series: Building Better Agentic RAG Systems
We’ve moved far beyond prompt engineering. Now we’re designing portfolios of tools (directory listing, file editing, web search), and weighing slash commands like /pr-create that inject prompts vs. specialized sub-agents like @pr-creation-agent vs. a shared AGENT.md, in systems that work across IDEs, command lines, GitHub, and Slack.
Context engineering is designing tool responses and interaction patterns that give agents situational awareness to navigate complex information spaces effectively.
To understand what this means practically, let’s look at how systems have evolved:
Before: We precomputed what chunks needed to be put into context, injected them, and then asked the system to reason about the chunks.
… Now: Agents are incredibly easy to build because all you need is a messages array and a bunch of tools. — Read More
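The shift described above can be made concrete with a minimal sketch of the “messages array plus tools” loop. This is an illustrative toy, not any particular framework’s API: `fake_llm`, `run_agent`, and the `web_search` tool are hypothetical stand-ins for a real model call and real tools.

```python
# Minimal agent loop: a messages array plus a dict of tools.
# fake_llm is a hypothetical stand-in for a real model call.

def fake_llm(messages, tools):
    """Return a tool request on the first turn, a final answer once tool results exist."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "web_search", "args": {"query": messages[-1]["content"]}}
    return {"answer": "Summary based on tool results."}

def run_agent(user_input, tools):
    messages = [{"role": "user", "content": user_input}]
    while True:
        reply = fake_llm(messages, tools)
        if "answer" in reply:  # model is done reasoning
            return reply["answer"]
        # Model asked for a tool: run it and append the result to the context.
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})

tools = {"web_search": lambda query: f"results for {query!r}"}
print(run_agent("latest MCP spec changes", tools))
```

Everything the agent “knows” at each step lives in the messages array; the loop itself is just call, act, append, repeat.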
Exploring Foundation Models’ Tool-Use Efficacy
Model Context Protocol (MCP) is an open-source framework launched by Anthropic to standardize the way LLMs use external tools. AI agents use MCP to enable multi-turn workflows, where an LLM (via products like Claude Desktop or Cursor) can select and coordinate between tools in multiple MCP servers. Since its introduction, MCP has quickly become the de facto standard for tool integrations with LLMs.
There are now thousands of official and unofficial MCP servers, each with dozens of tools! While more MCP choices are great for the tool-integration ecosystem, sometimes having too many options is a curse. Products like Cursor often limit how many tools you can provide to an LLM, so you are forced to select the tools you will use most. — Read More
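One way to cope with a tool budget is to pre-filter the catalog before handing tools to the model. The sketch below is illustrative only (it is not Cursor’s actual mechanism): it ranks tools by naive keyword overlap between the user’s request and each tool’s description, then keeps the top N.

```python
# Illustrative tool pre-filter: keep only the `limit` tools whose descriptions
# best overlap the user's request. Scoring by word overlap is a deliberate
# simplification; a real system might use embeddings.

def select_tools(query: str, tools: dict[str, str], limit: int = 40) -> list[str]:
    """tools maps tool name -> description; returns up to `limit` tool names."""
    words = set(query.lower().split())
    def score(item):
        _name, desc = item
        return len(words & set(desc.lower().split()))
    ranked = sorted(tools.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:limit]]

catalog = {
    "create_issue": "create a new issue in the tracker",
    "search_code": "search the repository code for a string",
    "deploy": "deploy the current branch to staging",
}
print(select_tools("search the code for TODO comments", catalog, limit=2))
```

With a real catalog of hundreds of tools, the same idea keeps the model’s tool list inside whatever cap the product imposes.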
It’s not 10x. It’s 36x – this is what it looks like to kill a $30k meeting with AI
I killed our weekly triage meeting last month. Three hours compressed to five minutes. But here’s the thing—it took me six failed attempts to get there.
The breakthrough wasn’t making the AI smarter. It was making the task more structured. This is what context engineering actually looks like—messy, iterative, and focused on constraints rather than capabilities.
Let me show you what it really takes to achieve a 36x productivity gain with AI. Spoiler: it’s not about the AI at all. — Read More
AI and Secure Code Generation
At the end of 2024, 25 percent of new code at Google was being written not by humans, but by generative large language models (LLMs)—a practice known as “vibe coding.” While the name may sound silly, vibe coding is a tectonic shift in the way software is built. Indeed, the quality of LLMs themselves is improving at a rapid pace in every dimension we can measure—and many we can’t. This rapid automation is transforming software engineering on two fronts simultaneously: Artificial intelligence (AI) is not only writing new code; it is also beginning to analyze, debug, and reason about existing human-written code.
As a result, traditional ways of evaluating security—counting bugs, reviewing code, and tracing human intent—are becoming obsolete. AI experts no longer know if AI-generated code is safer, riskier, or simply vulnerable in different ways than human-written code. We must ask: Do AIs write code with more bugs, fewer bugs, or entirely new categories of bugs? And can AIs reliably discover vulnerabilities in legacy code that human reviewers miss—or overlook flaws humans find obvious? Whatever the answer, AI will never again be as inexperienced at code security analysis as it is today. And as is typical with information security, we are leaping into the future without useful metrics to measure position or velocity. — Read More
No Code Is Dead
Once again, the software development landscape is experiencing a big shift. After years of drag-and-drop, no-code platforms democratizing app creation, generative AI (GenAI) is eliminating the need for no-code platforms in many cases.
Mind you, I said “no code” not “low code” — there are key differences. (More on this later.)
GenAI has introduced the ability for nontechnical users to use natural language to build apps just by telling the system what they want done. Call it “vibe coding” — the ability to describe what you want and watch AI generate working applications, or whatever. But will this new paradigm enhance existing no-code tools or render them obsolete?
I sought out insights from industry veterans to explore this pivotal question, revealing a broad spectrum of perspectives on where the intersection of AI and visual development is heading. — Read More
The hidden cost of AI reliance
I want to be clear: I’m a software engineer who uses LLMs ‘heavily’ in my daily work. They have undeniably been a good productivity tool, helping me solve problems and tackle projects faster. This post isn’t about how we should reject LLMs and progress but rather my reflection on what we might be losing in our haste to embrace them.
The rise of AI coding assistants has brought in what many call a new age of productivity. LLMs excel at several key areas that genuinely improve developer workflows: writing isolated functions; scaffolding boilerplate such as test cases and configuration files; explaining unfamiliar code or complex algorithms; generating documentation and comments; and helping with syntax in unfamiliar languages or frameworks. These capabilities allow us to work ‘faster’.
But beneath this image of enhanced efficiency, I find myself wondering if there’s a more troubling effect: Are we trading our hard-earned intelligence for short-term convenience? — Read More
Introducing warmwind OS: The AI Operating System That Works Smarter
What if your operating system didn’t just run your computer but actively worked alongside you, anticipating your needs, learning your habits, and automating your most tedious tasks? Enter warmwind OS, the world’s first AI-driven operating system, a bold leap into the future of human-computer interaction. Unlike traditional systems that passively wait for your commands, warmwind OS operates as a proactive partner, seamlessly blending into your workflows to eliminate inefficiencies and free up your time for what truly matters. Imagine an OS that not only understands your goals but actively helps you achieve them—this isn’t science fiction; it’s here.
In this introduction to warmwind OS, its development team explains how it redefines the relationship between humans and technology. From its new teaching mode that allows the AI to learn directly from your actions to its ability to integrate with even the most outdated legacy software, this operating system is designed to adapt to your unique needs. Whether you’re looking to streamline customer support, enhance HR processes, or simply reclaim hours lost to repetitive tasks, warmwind OS offers a glimpse into a smarter, more intuitive future. As you read on, consider this: what could you achieve if your technology worked as hard as you do? — Read More
LiteLLM
LiteLLM is an LLM gateway that lets you “call all LLM APIs using the OpenAI format” (Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq, etc.). Support for a missing provider or LLM platform can be requested via a feature request. — Read More
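The gateway’s value is that every provider takes the same OpenAI-style request shape. A minimal sketch, assuming the litellm package is installed: the model strings below are illustrative, and the actual network call is shown only as a comment so the example stays self-contained.

```python
# One request shape for every provider. With litellm installed, each dict
# could be passed as litellm.completion(**request); model names here are
# illustrative, and the provider is encoded as a prefix in the model string.

def openai_format_request(model: str, prompt: str) -> dict:
    """Build the same OpenAI-style payload regardless of provider."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

requests = [
    openai_format_request("gpt-4o-mini", "ping"),                       # OpenAI
    openai_format_request("bedrock/anthropic.claude-3-haiku", "ping"),  # AWS Bedrock
    openai_format_request("groq/llama3-8b-8192", "ping"),               # Groq
]
# with litellm: response = litellm.completion(**requests[0])
```

Because the shape never changes, switching providers is a one-string change rather than a rewrite against a new SDK.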
What can agents actually do?
There’s a lot of excitement about what AI (specifically the latest wave of LLM-anchored AI) can do, and how AI-first companies are different from the prior generations of companies. There are a lot of important and real opportunities at hand, but I find that many of these conversations occur at such an abstract altitude that they border on meaningless. Sort of like saying that your company could be much better if you merely adopted more software. That’s certainly true, but it’s not a particularly helpful claim.
This post is an attempt to concisely summarize how AI agents work, apply that summary to a handful of real-world use cases for AI, and to generally make the case that agents are a multiplier on the quality of your software and system design. If your software or systems are poorly designed, agents will only cause harm. If there’s any meaningful definition of an AI-first company, it must be a company whose software and systems are designed with immaculate attention to detail.
By the end of this writeup, my hope is that you’ll be well-armed to have a concrete discussion about how LLMs and agents could change the shape of your company, and to avoid getting caught up in the needlessly abstract discussions that are often taking place today. — Read More