The Capability Maturity Model for AI in Design

Matt Davey, who is Chief Experience Officer at 1Password, created a useful capability maturity model for AI in design. His original model has 5 levels (Limited, Reactive, Developing, Embedded, and Leading), each of which differs along 6 characteristics (Leadership on AI, Strategy & Budgeting, AI Culture & Talent, AI Learning & Enablement, AI Agents & Automation, and AI Product Design). Thus, the model covers both the use of AI within the design process and the use of AI in the resulting product. I recommend you read the full thing, but here is a summary of Davey’s 5 capability maturity levels for AI in design.

As discussed below, I added Maturity Level 6, Symbiotic, for a more complete capability maturity ladder.

For a summary of this article, watch my short overview video (YouTube, 6 min.). — Read More

#devops, #vfx

You Need to Rewrite Your CLI for AI Agents

I built a CLI for Google Workspace — agents first. Not “built a CLI, then noticed agents were using it.” From Day One, the design assumptions were shaped by the fact that AI agents would be the primary consumers of every command, every flag, and every byte of output.

CLIs are increasingly the lowest-friction interface for AI agents to reach external systems. Agents don’t need GUIs. They need deterministic, machine-readable output, self-describing schemas they can introspect at runtime, and safety rails against their own hallucinations. — Read More
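The article's own CLI is not shown here, but the three properties it names (deterministic output, runtime-introspectable schemas, hallucination guards) can be illustrated with a minimal sketch. Everything below is hypothetical: the `gws` name, the `schema` subcommand, and the `list-events` command are stand-ins, not the real tool's interface.

```python
import argparse
import json

# Hypothetical agent-first CLI sketch. A "schema" subcommand lets an agent
# introspect commands and flags at runtime instead of guessing at them,
# and JSON output is emitted deterministically (sorted keys, fixed separators).

COMMANDS = {
    "list-events": {
        "description": "List calendar events",
        "flags": {"--max": "int, maximum number of events to return"},
    },
}

def cmd_schema(_args):
    # Self-describing output: one guard against hallucinated flags is letting
    # the agent ask the tool what exists before invoking it.
    print(json.dumps(COMMANDS, indent=2, sort_keys=True))

def cmd_list_events(args):
    events = [{"id": "evt_1", "title": "Standup"}]  # stand-in for a real API call
    # sort_keys + fixed separators -> byte-stable output for identical inputs
    print(json.dumps(events[: args.max], sort_keys=True, separators=(",", ":")))

def main(argv=None):
    parser = argparse.ArgumentParser(prog="gws")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("schema").set_defaults(func=cmd_schema)
    p = sub.add_parser("list-events")
    p.add_argument("--max", type=int, default=10)
    p.set_defaults(func=cmd_list_events)
    args = parser.parse_args(argv)
    args.func(args)

if __name__ == "__main__":
    main()
```

An agent would typically call `gws schema` once, cache the result, then invoke commands with machine-parseable output it can feed straight back into its context.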

#devops

Not Prompts, Blueprints

I hate to micromanage & I’ve been micromanaging AI.

A few months ago, I’d use Claude for a familiar workflow: capturing notes from a meeting, drafting a follow-up email, updating the CRM, writing the investment memo. Micromanagement at 10x speed. The agent would finish a step, then wait. I’d scan the output, type the next instruction, wait again. Prompt, response, prompt, response. I was the bottleneck in my own system.

A year ago, this was necessary. The models couldn’t hold a complex task in their heads. Now they can.

But this leverage requires planning. Now I sketch the workflow before I touch the machine. — Read More
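The meeting-notes example above can be sketched as a declared-up-front pipeline. This is a minimal illustration of the idea, not the author's actual setup: `run_agent` is a hypothetical stand-in for whatever model call you use, and the point is that the human reviews the blueprint once instead of approving every step interactively.

```python
# "Blueprint first" orchestration sketch. run_agent is a hypothetical stub;
# a real implementation would call an LLM or agent framework here.

def run_agent(task: str, context: str) -> str:
    return f"[{task} done using {len(context)} chars of context]"

BLUEPRINT = [
    "capture meeting notes",
    "draft follow-up email",
    "update the CRM",
    "write the investment memo",
]

def run_blueprint(steps, initial_context=""):
    context = initial_context
    outputs = []
    for step in steps:
        result = run_agent(step, context)
        outputs.append(result)
        context += "\n" + result  # each step sees the accumulated work so far
    return outputs

results = run_blueprint(BLUEPRINT, "raw meeting transcript...")
```

The human's job moves to the edges: write and approve `BLUEPRINT`, then review `results`, rather than sitting inside the prompt-response loop.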

#devops

MCP is dead. Long live the CLI

I’m going to make a bold claim: MCP is already dying. We may not fully realize it yet, but the signs are there. OpenClaw doesn’t support it. Pi doesn’t support it. And for good reason.

When Anthropic announced the Model Context Protocol, the industry collectively lost its mind. Every company scrambled to ship MCP servers as proof they were “AI first.” Massive resources poured into new endpoints, new wire formats, new authorization schemes, all so LLMs could talk to services they could already talk to.

I’ll admit, I never fully understood the need for it. You know what LLMs are really good at? Figuring things out on their own. Give them a CLI and some docs and they’re off to the races.

I tried to avoid writing this for a long time, but I’m convinced MCP provides no real-world benefit, and that we’d be better off without it. Let me explain. — Read More

#devops

The third era of AI software development

When we started building Cursor a few years ago, most code was written one keystroke at a time. Tab autocomplete changed that and opened the first era of AI-assisted coding.

Then agents arrived, and developers shifted to directing agents through synchronous prompt-and-response loops. That was the second era. Now a third era is arriving. It is defined by agents that can tackle larger tasks independently, over longer timescales, with less human direction.

As a result, Cursor is no longer primarily about writing code. It is about helping developers build the factory that creates their software. This factory is made up of fleets of agents that they interact with as teammates: providing initial direction, equipping them with the tools to work independently, and reviewing their work.

Many of us at Cursor are already working this way. More than one-third of the PRs we merge are now created by agents that run on their own computers in the cloud. A year from now, we think the vast majority of development work will be done by these kinds of agents. — Read More

#devops

OpenAI: Realtime Prompting Guide

Today, we’re releasing gpt-realtime, our most capable speech-to-speech model yet in the API, and announcing the general availability of the Realtime API.

Speech-to-speech systems are essential for enabling voice as a core AI interface. The new release enhances robustness and usability, giving enterprises the confidence to deploy mission-critical voice agents at scale.

The new gpt-realtime model delivers stronger instruction following, more reliable tool calling, noticeably better voice quality, and an overall smoother feel. These gains make it practical to move from chained approaches to true realtime experiences, cutting latency and producing responses that sound more natural and expressive. — Read More
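For flavor, here is roughly what the first client message over a Realtime API WebSocket looks like: a `session.update` event configuring instructions and output voice. The field names follow OpenAI's published Realtime docs at the time of writing, but treat them as assumptions and verify against the current API reference before relying on them.

```python
import json

# Sketch of a Realtime API session.update event. The exact session schema
# is an assumption taken from OpenAI's docs; check the current reference.
session_update = {
    "type": "session.update",
    "session": {
        "type": "realtime",
        "instructions": "You are a concise, friendly support agent.",
        "audio": {"output": {"voice": "marin"}},
    },
}

payload = json.dumps(session_update)
# A client would send this over the WebSocket connection, e.g.:
# ws.send(payload)  # wss://api.openai.com/v1/realtime?model=gpt-realtime
```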

#devops

The End of CI/CD Pipelines: The Dawn of Agentic DevOps

I’ve been staring at Jenkins configs for the better part of a decade. YAML indentation errors at 2 AM. Flaky integration tests that pass locally, fail in CI, pass again when you rerun them. The entire apparatus of modern continuous integration—the build servers, the artifact registries, the deployment scripts marching in lockstep—it works, mostly, until it doesn’t. And when it fails, you’re the one who has to figure out which of seventeen microservices decided to timeout during health checks this time.

So when someone tells me we’re entering the era of “agentic DevOps,” where AI agents will automate, optimize, and self-heal our delivery pipelines, my first instinct isn’t excitement. It’s pattern recognition. I’ve heard this song before—infrastructure-as-code would solve everything, GitOps would eliminate configuration drift, service mesh would make networking trivial. Each wave delivered genuine value. Each also brought new failure modes we hadn’t anticipated.

But this one feels different. Not because the marketing promises are more extravagant—they always are—but because the underlying mechanism has actually changed. We’re not just automating what humans already scripted. We’re delegating judgment. — Read More

#devops

Tests Are The New Moat

Open source projects grow over time. They are a product of incremental development. A project starts lean, gains adoption, pivots to accommodate that adoption, and maintains backwards compatibility throughout this process.

These lean projects become large ships. Historically, this has been the great power of open source. But what inevitably happens is the infrastructure that you build on becomes outdated. You try to Theseus your way out of it, rebuilding layers of your project on more modern foundations, but it can be hard to reorient your ship in the wake of its own velocity.

This has resulted in two forms of change: forks and total rewrites. You take the foundation that someone else built and you diverge paths. Or you take their contracts (like an API surface) and rewrite them on more modern, stable ground. Examples of this are the S3-compatible APIs that are now commonplace, or something like Redpanda, a Kafka-compatible total rewrite. — Read More

#devops

Agents are not thinking, they are searching

More than ten years ago, we could barely recognize cats with deep learning; today we have bots forming religions. I don’t like anthropomorphizing models; I’d rather see them as a utility that can be used in interesting ways. But we live in a strange timeline:

— The Dow is over 50,000. The number has only been going up since the launch of ChatGPT.

— An open-source agent framework called OpenClaw goes viral. One of its agents — “crabby-rathbun” — opens PR #31132 to matplotlib, gets rejected by maintainer Scott Shambaugh, and autonomously publishes a hit piece on him that goes viral.

— All of this is happening at the same time as Anthropic is releasing case studies about running agents that build compilers. They did use the GCC torture test suite as a ready-made verifier, but it is an extremely impressive achievement nonetheless.

This rapid progress has also created a lot of mysticism around AI. For this reason, I felt it would be an interesting exercise to de-anthropomorphize AI agents and see them for the tools that they are. If we want to use these technologies for longer-horizon tasks, we need a frame of thinking that lets an engineering mindset flourish instead of an alchemical one. — Read More

#devops

How I Use Claude Code

I’ve been using Claude Code as my primary development tool for approximately nine months, and the workflow I’ve settled into is radically different from what most people do with AI coding tools. Most developers type a prompt, sometimes use plan mode, fix the errors, repeat. The more terminally online are stitching together Ralph loops, MCPs, gas towns (remember those?), etc. The results in both cases are a mess that completely falls apart for anything non-trivial.

The workflow I’m going to describe has one core principle: never let Claude write code until you’ve reviewed and approved a written plan. This separation of planning and execution is the single most important thing I do. It prevents wasted effort, keeps me in control of architecture decisions, and produces significantly better results, with far lower token usage, than jumping straight to code. — Read More
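To make the plan-then-execute principle concrete, here is one way such a written plan might look before any code is generated. This is a hypothetical template, not the author's actual format; the task, names, and numbers are invented for illustration.

```markdown
## Plan: add rate limiting to the API gateway (hypothetical example)

### Scope
- Add a token-bucket limiter in middleware; no changes to request handlers.

### Steps
1. Introduce a `RateLimiter` with per-client buckets (size 100, refill 10/s).
2. Wire it into the middleware chain behind a config flag.
3. Add unit tests for burst, steady-state, and disabled-flag paths.

### Out of scope
- Distributed/shared limits; persistence across restarts.

Approved? (yes / edit / discard)
```

The review happens on this artifact: architecture objections get raised and resolved here, in a few hundred tokens, rather than after a few thousand lines of generated code.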

#devops