Claude Managed Agents: get to production 10x faster

Today, we’re launching Claude Managed Agents, a suite of composable APIs for building and deploying cloud-hosted agents at scale.

Until now, building agents meant spending development cycles on secure infrastructure, state management, permissioning, and reworking your agent loops for every model upgrade. Managed Agents pairs an agent harness tuned for performance with production infrastructure to go from prototype to launch in days rather than months.

Whether you’re building single-task runners or complex multi-agent pipelines, you can focus on the user experience, not the operational overhead. — Read More

#devops

Spec-Driven Development Is Waterfall in Markdown

SpecKit has 77,000 GitHub stars. AWS built an entire IDE around spec-driven development. Tessl raised $125 million on the promise that specs, not code, should be the source of truth.

The pitch was clean: stop vibe coding, write a proper specification, let the agent execute against it. Engineers loved it. It felt like rigor. It felt like the adults had finally entered the room.

Then someone actually tested it on a real project. Ten times slower. More ceremony. Same bugs.

The industry built an entire ecosystem around one idea: if we give AI agents a detailed enough spec, they’ll produce working software. It’s the same bet the industry made with outsourcing, with offshoring, with every model that tries to replace understanding with documentation. Write it down clearly enough and someone (or something) on the other side will execute it perfectly. — Read More

#devops

Closing the knowledge gap with agent skills

Large language models (LLMs) have fixed knowledge because they are trained at a specific point in time. Software engineering practices are fast-paced and change often: new libraries launch every day and best practices evolve quickly.

This leaves a knowledge gap that language models can’t solve on their own. At Google DeepMind we see this in a few ways: our models don’t know about themselves when they’re trained, and they aren’t necessarily aware of subtle changes in best practices (like thought circulation) or SDK changes.

Many solutions exist, from web search tools to dedicated MCP services, but more recently, agent skills have surfaced as an extremely lightweight but potentially effective way to close this gap.

While there are strategies that we, as model builders, can implement, we wanted to explore what is possible for any SDK maintainer. Read on for what we did to build the Gemini API developer skill and the results it had on performance. — Read More

#devops

SAFe Was Bad for Agility. For AI, It’s Catastrophic.

Last year, during an engagement with an insurance company, I worked with the product leadership team to understand why their 8-month AI initiative had stalled. They’d assembled a dedicated AI working group, ran three PI planning cycles where AI use cases were formally assigned to Release Trains, and produced a 21-slide deck explaining their AI strategy.

They had not shipped a single AI-powered feature.

The working group was waiting on the Q3 plan to be ratified before beginning experimentation. The Release Trains were waiting on the working group’s recommendations. The 21-slide deck was in review with the PMO.

This wasn’t negligence or laziness. This also wasn’t a technology problem. This was SAFe working exactly as designed. — Read More

#devops

AI replaced 80% of Coding, Only these 7 skills are left.

Something strange is happening in software engineering right now.

Companies adopted AI to speed up code generation, and on the surface, it worked. AI can write syntax faster than any human ever could. It can generate boilerplate, suggest implementations, create tests, and even imitate design patterns in seconds.

That sounds like the beginning of the end for software engineering.

But that is not what is actually happening.

The real story is more interesting. — Read More

#devops

4 Agentic AI Design Patterns & Real-World Examples

Agentic AI design patterns enhance the autonomy of large language models (LLMs) like Llama, Claude, or GPT by leveraging tool use, decision-making, and problem-solving. This brings a structured approach to creating and managing autonomous agents across several use cases. — Read More

#devops

I Still Prefer MCP Over Skills

The AI space is pushing hard for “Skills” as the new standard for giving LLMs capabilities, but I’m not a fan. Skills are great for pure knowledge and teaching an LLM how to use an existing tool. But for giving an LLM actual access to services, the Model Context Protocol (MCP) is the far superior, more pragmatic architectural choice. We should be building connectors, not just more CLIs. — Read More

#devops

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you. — Read More

#devops

A Taxonomy of RL Environments for LLM Agents

Model architecture gets all the attention. Post-training recipes follow close behind. The reinforcement learning (RL) environment — what the model actually practices on, how its work gets judged, what tools it can use — barely enters the conversation. That’s the part that actually determines what the agent can learn to do.

A model trained only on single-turn Q&A will struggle the moment you ask it to maintain state across a 50-step enterprise workflow. A model trained with a poorly designed reward function will learn to game the metric rather than solve the problem. Reinforcement learning environments are half the system. — Read More
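
To make the reward-design point concrete, here is a minimal sketch in Python. Everything in it is hypothetical and not taken from the linked article: `TicketEnv`, the action strings, and both reward functions are invented for illustration. It contrasts a proxy reward that scores the transcript with an outcome reward that checks the final state of the environment.

```python
from dataclasses import dataclass, field


@dataclass
class TicketEnv:
    """Hypothetical toy environment: the agent must close every open ticket."""
    open_tickets: set = field(default_factory=lambda: {"T1", "T2", "T3"})
    log: list = field(default_factory=list)

    def step(self, action: str) -> None:
        # Actions are plain strings: "close <id>" or "say <text>".
        verb, _, arg = action.partition(" ")
        if verb == "close":
            self.open_tickets.discard(arg)
        self.log.append(action)


def proxy_reward(env: TicketEnv) -> int:
    # Poorly designed: rewards claiming success, not achieving it.
    return sum("done" in entry for entry in env.log)


def outcome_reward(env: TicketEnv) -> int:
    # Judges the final environment state instead of the transcript.
    return 1 if not env.open_tickets else 0


def run(actions):
    # Play a fixed action sequence and return (proxy, outcome) rewards.
    env = TicketEnv()
    for action in actions:
        env.step(action)
    return proxy_reward(env), outcome_reward(env)


gamer = run(["say done", "say done", "say done"])
solver = run(["close T1", "close T2", "close T3"])
print(gamer, solver)  # (3, 0) (0, 1)
```

Under the proxy reward the policy that only announces "done" scores highest while leaving every ticket open, which is the metric-gaming failure the excerpt describes. The outcome reward ranks the two policies the other way round.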

#devops

#architecture

The 2nd Phase of Agentic Development

Yesterday we talked about how cheap code is fueling an era of idiosyncratic tooling, and previously we’ve talked about the rise of spec-driven development. In that second piece, we ran through some of the initial examples of spec-driven development with agents.

.. The first wave of agentic development brought us clones and ports. When code is incredibly cheap, and you want the code to flow, you can either rely on your own fast feedback or leverage existing test suites. These early projects opted for the latter, as did many tokenmaxxers who are rebuilding their dependencies in Rust or Go.

Two releases this week, however, suggest we’re starting to enter a second phase of open source agentic coding projects. The first brought us clones; this next phase brings us reimaginings. — Read More

#devops