Building Software Is Learning

An internal note to the Amp team on feedback and shipping faster

A few weeks ago I shared the following as an internal message with the Amp team. I showed it to a friend while talking about feedback loops and he told me to post this publicly. So here we go. Unedited, straight up copy & pasted from our Slack. Read More

#devops

Rethinking Search as Code Generation

Search is a core primitive for AI systems. Frontier models grow more capable by the month, but they still need access to fresh, accurate, and well-curated knowledge from the wider world. Search is the primary way that AI systems tap into that knowledge, and thus a foundational component of any product that needs to draw conclusions, take actions, and perform real-world work.

We believe that traditional search pipelines are increasingly outdated in the era of agents. Traditional search answers queries, but today’s agents complete tasks that can take on countless shapes. These tasks require agents to define task-specific retrieval strategies directly within their harnesses. Within Perplexity Computer, we’ve seen single tasks invoke hundreds or even thousands of retrieval operations within a few minutes: a workflow that is impossible for humans but absolutely natural for agents.

In this world, search itself must become agentic, with its building blocks accessible directly as SDKs within the agent harness. We are introducing Search as Code (SaC) as Perplexity’s new reference search architecture. — Read More

#devops

Code Is Not Cheap: How to Multiply Your AI’s Output With Software Fundamentals

In February 2025, Andrej Karpathy coined “vibe coding”: describe what you want, let AI write the code, forget the code exists. It caught fire. Everyone wanted to believe coding had become as easy as talking.

One year later, Karpathy renamed it. The new term: “agentic engineering.” His explanation was pointed. “‘Engineering’ to emphasize that there is an art and science and expertise to it.” He’d gone from 80% manual coding to 80% agent coding in weeks, and discovered the hard way that models are “jagged” — brilliant at hard problems, then tripping over the obvious.

The data backs him up. GitClear’s 2025 code quality study found that AI-coauthored pull requests have 1.7x more issues than human-only PRs. Copy-pasted code lines rose from 8.3% to 12.3% between 2021 and 2024. Meanwhile, AI now writes 41% of all code on GitHub, with 4.7 million paid Copilot subscribers. — Read More

#devops

What Is the Best Local LLM for Coding in 2026?

We’ve all gone through the process of trying to run a multi-billion parameter model on our local machines. You spend the time downloading the weights and loading them into memory, only to have your machine freeze up completely when you actually try to prompt it. It usually ends with some broken output, and the realization that it’s just easier to stick to API keys.

I think the best local coding model is not the one with the highest math score. It is the one your machine can actually run without freezing. It is the tool that fits your specific daily workflow and respects your exact tolerance for latency. — Read More

#devops

Macro Evals for Agentic Systems

When an agentic system fails, the problem is often larger than a single bad response. A handoff may happen too late, a specialist agent may miss the same signal across many runs, or a review process may trigger for the wrong class of cases. To improve the system, teams need to see recurring behavior across the whole population of traces.

This cookbook walks through a macro-eval workflow for a multi-agent system. We use a synthetic EV order workflow where specialist agents handle pricing, compliance, supply, factory routing, scheduling, and release decisions while market and operational conditions change.
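As a rough illustration of the macro-eval idea, the sketch below rolls per-trace eval labels up into failure rates per agent, so that a failure recurring across many runs stands out from a one-off. The trace records, the field names (`agent`, `label`), the label values, and the 0.5 threshold are all hypothetical and are not taken from the cookbook.

```python
from collections import Counter, defaultdict

# Hypothetical per-trace eval labels; field names and values are illustrative only.
TRACES = [
    {"trace_id": "t1", "agent": "pricing", "label": "ok"},
    {"trace_id": "t2", "agent": "compliance", "label": "missed_signal"},
    {"trace_id": "t3", "agent": "compliance", "label": "missed_signal"},
    {"trace_id": "t4", "agent": "compliance", "label": "ok"},
    {"trace_id": "t5", "agent": "scheduling", "label": "late_handoff"},
    {"trace_id": "t6", "agent": "pricing", "label": "ok"},
]


def failure_rates(traces):
    """Return {agent: {label: share of that agent's traces}} for non-"ok" labels."""
    counts = defaultdict(Counter)
    for t in traces:
        counts[t["agent"]][t["label"]] += 1
    rates = {}
    for agent, labels in counts.items():
        total = sum(labels.values())
        rates[agent] = {
            label: n / total for label, n in labels.items() if label != "ok"
        }
    return rates


def recurring(rates, threshold=0.5):
    """List (agent, label, rate) triples at or above the threshold, highest rate first."""
    hits = [
        (agent, label, rate)
        for agent, by_label in rates.items()
        for label, rate in by_label.items()
        if rate >= threshold
    ]
    return sorted(hits, key=lambda h: (-h[2], h[0]))


if __name__ == "__main__":
    for agent, label, rate in recurring(failure_rates(TRACES)):
        print(f"{agent}: {label} in {rate:.0%} of traces")
```

Grouping by agent before computing rates is one possible design choice; a real workflow might also slice by market condition or handoff stage.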

The notebook uses precomputed synthetic traces and saved lower-level eval labels, so you can run the full workflow without an OpenAI API key. — Read More

#devops

What’s Easy Now? What’s Hard Now?

This is the fourth in a series about how AI is changing software development, after “It’s time to be right”, “What about juniors?”, and “My heuristics are wrong. What now?”. It stands alone, but if you found this interesting you may also find those interesting.

I’ve been spending a lot of time thinking about the shape of the capabilities of coding agents. What they’re good at now, what they’re going to be good at. What they’re bad at now, how much of that is inherent and how much is transient. This is worth thinking about, because it’s the most important question shaping the future of software, and of software engineering. I don’t pretend to have an answer, but am coming to a conclusion that may be deeply counter-intuitive.

Coding agents are becoming very good indeed, and can build meaningful and correct software very quickly and at transformatively low cost. They have super-human abilities on some coding tasks. Of course, computer systems have had super-human abilities for at least 85 years. I think we’re going to find, as we have over those nine decades, that this new technology we’re building is vastly super-human in some areas, and not nearly as capable as humans in others. — Read More

#devops

Terraform Enterprise 2.0: Evolving infrastructure operations for scale

At the core of Terraform Enterprise 2.0 is support for Stacks, a new infrastructure orchestration capability that allows teams to manage collections of infrastructure as a single unit. Terraform Stacks are available on all plans based on resources under management.

As organizations scale, infrastructure evolves from isolated configurations into systems of interconnected components. Stacks reflect this shift by introducing a configuration layer that enables teams to define and manage infrastructure across environments, regions, and accounts in a consistent, repeatable way. — Read More

#devops

Spec-Driven Development Isn’t Broken. It will collapse.

“prompting has split into four skills” — Context, Intent, Specification, Prompt. Every one matched a tension one of us had brought into the room. And once they had names, something else clicked: the four crafts mapped cleanly onto P-CAM — Perception, Cognition, Agency, Manifestation.

…For the last eight months, the argument has been spec versus vibe. Structure versus flow. Waterfall versus emergence.

…Every standard critique of SDD, and every standard critique of vibe, traces back to the same thing. Not two sets of failures. One failure, surfacing on both sides of the debate. The three-layer collapse.

…Vibe coding collapsed because it had no contract. Spec-driven development is collapsing because it has three contracts pretending to be one. What rises from the fusion isn’t a new brand. It isn’t a better tool. It’s a separation of concerns — the oldest principle in software engineering — applied one layer up, to the documents we use to instruct the machines that write the documents. — Read More

#devops

Beyond the Coding Assistant: A Series on AI-Assisted Software Engineering

This is the first article of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available here.

The last few years of AI-assisted development have been remarkable. Coding assistants have crossed real quality bars. Engineers can now produce working code, in unfamiliar languages, against unfamiliar systems, at speeds that would have looked like science fiction in 2022. There are real productivity gains, real new affordances, and a real shift in what an individual developer can do in an afternoon.

And yet — when the conversation turns to the team and the organization — the picture is more complicated. The dramatic gains many leaders were promised haven’t shown up on every team. Some teams ship more. Some teams ship the same. Some teams have actually gotten slower, with the AI helping at the keystroke while the wider delivery metrics regress.

That gap, between what’s possible at the keystroke and what’s actually showing up in delivery, is what this series is about. The question I want to ask, and try to answer over the next several articles, is simple: what has changed, and what changes could take us so much farther than where current AI coding assistants have brought us? — Read More

#devops

How Claude Code works in large codebases: Best practices and where to start

Claude Code is running in production across multi-million-line monorepos, decades-old legacy systems, distributed architectures spanning dozens of repositories, and at organizations with thousands of developers. These environments present challenges that smaller, simpler codebases don’t, whether that’s build commands that differ across every subdirectory or legacy code spread across folders with no shared root.

This article covers the patterns we’ve observed that have led to successful adoption of Claude Code at scale. We use “large codebase” to refer to a wide range of deployments: monorepos with millions of lines, legacy systems built over decades, dozens of microservices across separate repositories, or any combination of the above. That also includes codebases running on languages that teams don’t always associate with AI coding tools, such as C, C++, C#, Java, and PHP. (Claude Code performs better than most teams expect it to in those cases, particularly as of recent model releases.) While every large codebase deployment is shaped by its specific version control, team structure, and accumulated conventions, the patterns here generalize across them and are a good starting point for teams considering adopting Claude Code. — Read More

#devops