Closing the knowledge gap with agent skills

Large language models (LLMs) have fixed knowledge: they are trained at a specific point in time. Software engineering, by contrast, moves fast; new libraries launch every day and best practices evolve quickly.

This leaves a knowledge gap that language models can’t solve on their own. At Google DeepMind we see this in a few ways: our models don’t know about themselves when they’re trained, and they aren’t necessarily aware of subtle shifts in best practices (like thought circulation) or of SDK changes.

Many solutions exist, from web search tools to dedicated MCP services, but more recently, agent skills have surfaced as an extremely lightweight but potentially effective way to close this gap.
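For readers new to the format: a skill is typically just a folder with a SKILL.md file that the agent loads on demand. As a minimal sketch of what such a file can look like (the name, description, and guidance below are illustrative, not the actual Gemini API skill):

```markdown
---
name: gemini-api
description: Up-to-date guidance for code that calls the Gemini API.
  Load when writing or reviewing code that uses the Gemini SDK.
---

# Gemini API guidance

- Prefer the current SDK package; older packages may be deprecated.
- Check the bundled reference files before generating migration code.
```

Because the whole thing is plain text loaded into context on demand, a maintainer can ship and version it like documentation.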

While there are strategies that we, as model builders, can implement, we wanted to explore what is possible for any SDK maintainer. Read on for how we built the Gemini API developer skill and the effect it had on performance. — Read More

#devops

SAFe Was Bad for Agility. For AI, It’s Catastrophic.

Last year, during an engagement with an insurance company, I worked with the product leadership team to understand why their 8-month AI initiative had stalled. They’d assembled a dedicated AI working group, ran three PI planning cycles where AI use cases were formally assigned to Release Trains, and produced a 21-slide deck explaining their AI strategy.

They had not shipped a single AI-powered feature.

The working group was waiting on the Q3 plan to be ratified before beginning experimentation. The Release Trains were waiting on the working group’s recommendations. The 21-slide deck was in review with the PMO.

This wasn’t negligence or laziness. This also wasn’t a technology problem. This was SAFe working exactly as designed. — Read More

#devops

AI Replaced 80% of Coding; Only These 7 Skills Are Left.

Something strange is happening in software engineering right now.

Companies adopted AI to speed up code generation, and on the surface, it worked. AI can write syntax faster than any human ever could. It can generate boilerplate, suggest implementations, create tests, and even imitate design patterns in seconds.

That sounds like the beginning of the end for software engineering.

But that is not what is actually happening.

The real story is more interesting. — Read More

#devops

4 Agentic AI Design Patterns & Real-World Examples

Agentic AI design patterns enhance the autonomy of large language models (LLMs) like Llama, Claude, or GPT by leveraging tool use, decision-making, and problem-solving. This provides a structured approach to creating and managing autonomous agents across a range of use cases. — Read More
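To make one of these patterns concrete, here is a minimal sketch of the tool-use pattern; `call_model` is a stand-in for any real LLM API, and the tool registry is invented for illustration:

```python
import json

# Illustrative tool registry: the model requests a tool by name, the loop
# executes it and feeds the result back as a new message.
TOOLS = {
    "search_docs": lambda query: f"(top doc snippets for {query!r})",
}

def call_model(messages):
    """Stand-in for a real LLM call: it requests one tool, then answers.
    A real model would emit either a tool request or a final reply."""
    if messages[-1]["role"] == "tool":
        return "Final answer composed from the tool results."
    return json.dumps({"tool": "search_docs",
                       "args": {"query": messages[-1]["content"]}})

def agent_loop(task, max_steps=5):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        try:
            request = json.loads(reply)   # JSON reply means: run a tool
        except json.JSONDecodeError:
            return reply                  # plain text means: final answer
        result = TOOLS[request["tool"]](**request["args"])
        messages.append({"role": "tool", "content": result})
    return "(step budget exhausted)"

print(agent_loop("How do I paginate results?"))
```

Other common patterns, such as reflection and planning, wrap similar loops around the same kind of model call.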

#devops

I Still Prefer MCP Over Skills

The AI space is pushing hard for “Skills” as the new standard for giving LLMs capabilities, but I’m not a fan. Skills are great for pure knowledge and teaching an LLM how to use an existing tool. But for giving an LLM actual access to services, the Model Context Protocol (MCP) is the far superior, more pragmatic architectural choice. We should be building connectors, not just more CLIs. — Read More
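For a sense of what "building connectors" means in practice, here is a minimal MCP server sketch, assuming the official `mcp` Python SDK; the status tool and the service behind it are hypothetical:

```python
# Minimal MCP connector sketch (assumes the official `mcp` Python SDK,
# installable via `pip install mcp`). The wrapped service is hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("status-connector")

@mcp.tool()
def get_service_status(service: str) -> str:
    """Report the current status of a named internal service."""
    # A real connector would call the service's API here instead of stubbing.
    return f"{service}: ok"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```

The contrast with a skill is the point: a skill is documentation the model reads, while a connector like this is an endpoint the model can actually call.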

#devops

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file: it is designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you. — Read More

#devops

A Taxonomy of RL Environments for LLM Agents

Model architecture gets all the attention. Post-training recipes follow close behind. The reinforcement learning (RL) environment — what the model actually practices on, how its work gets judged, what tools it can use — barely enters the conversation. That’s the part that actually determines what the agent can learn to do.

A model trained only on single-turn Q&A will struggle the moment you ask it to maintain state across a 50-step enterprise workflow. A model trained with a poorly designed reward function will learn to game the metric rather than solve the problem. The reinforcement learning environment is half the system. — Read More
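As a concrete instance of what "the environment" covers (task, tools, and judge), here is a minimal Gym-style sketch; the ticket-triage scenario and its reward logic are invented for illustration, not taken from the article:

```python
from dataclasses import dataclass, field

@dataclass
class TicketTriageEnv:
    """Illustrative multi-turn environment: the agent routes a support
    ticket using tools and is judged only on its final routing decision."""
    ticket: str = "Checkout page returns HTTP 500 after deploy"
    max_steps: int = 10
    history: list = field(default_factory=list)

    def reset(self) -> str:
        self.history = []
        return f"Ticket: {self.ticket}\nTools: lookup_runbook, route(<team>)"

    def step(self, action: str):
        """Returns (observation, reward, done). The sparse end-of-episode
        reward is exactly the kind of design choice the taxonomy covers."""
        self.history.append(action)
        if action.startswith("route("):
            reward = 1.0 if "backend" in action else 0.0  # the judge
            return "ticket routed", reward, True
        if action == "lookup_runbook":
            return "runbook: 500s after deploy -> escalate to backend", 0.0, False
        return "unknown tool", 0.0, len(self.history) >= self.max_steps
```

Everything the agent can learn here is bounded by this class: the tools it exposes, the state it tracks, and the behavior its reward actually pays for.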

#devops

#architecture

Synchronizing the Senses: Powering Multimodal Intelligence for Video Search

Today’s filmmakers capture more footage than ever to maximize their creative options, often generating hundreds, if not thousands, of hours of raw material per season or franchise. Extracting the vital moments needed to craft compelling storylines from this sheer volume of media is a notoriously slow and punishing process. When editorial teams cannot surface these key moments quickly, creative momentum stalls and severe fatigue sets in.

Meanwhile, the broader search landscape is undergoing a profound transformation. We are moving beyond simple keyword matching toward AI-driven systems capable of understanding deep context and intent. Yet, while these advances have revolutionized text and image retrieval, searching through video, the richest medium for storytelling, remains a daunting “needle in a haystack” challenge.

The solution to this bottleneck cannot rely on a single algorithm. Instead, it demands orchestrating an expansive ensemble of specialized models: tools that identify specific characters, map visual environments, and parse nuanced dialogue. — Read More
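As a rough sketch of what orchestrating such an ensemble can look like at query time (the scorers and weights below are invented stand-ins, not the system the article describes): each specialized model scores a clip independently, and the search layer fuses the scores.

```python
# Illustrative late fusion of per-modality scores for one video-search query.
def fused_score(clip, query, scorers, weights):
    """scorers maps modality -> callable(clip, query) returning a [0, 1] score."""
    return sum(weights[m] * score(clip, query) for m, score in scorers.items())

scorers = {
    "character": lambda clip, q: 0.9,  # stand-in for a character-ID model
    "setting":   lambda clip, q: 0.4,  # stand-in for a visual-environment model
    "dialogue":  lambda clip, q: 0.7,  # stand-in for a dialogue/speech model
}
weights = {"character": 0.4, "setting": 0.2, "dialogue": 0.4}

print(fused_score("clip_001", "argument in the kitchen", scorers, weights))  # 0.72
```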

#vfx

VOID: Video Object and Interaction Deletion

Existing video object removal methods excel at inpainting content “behind” the object and correcting appearance-level artifacts such as shadows and reflections. However, when the removed object has more significant interactions — such as collisions with other objects — current models fail to correct them and produce implausible results.

We present VOID, a video object removal framework designed to perform physically plausible inpainting in these complex scenarios. To train the model, we generate a new paired dataset of counterfactual object removals using Kubric and HUMOTO, where removing an object requires altering downstream physical interactions. During inference, a vision-language model identifies regions of the scene affected by the removed object. These regions are then used to guide a video diffusion model that generates physically consistent counterfactual outcomes. — Read More
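A schematic of the two-stage inference the abstract describes; every function below is a hypothetical stand-in, since the paper's actual interfaces aren't given here:

```python
from dataclasses import dataclass

@dataclass
class Region:
    frame: int
    bbox: tuple  # (x, y, w, h)

def find_affected_regions(video, target):
    """Stand-in for the VLM stage: flag regions the removal affects,
    e.g. another object the removed one was about to collide with."""
    return [Region(frame=12, bbox=(40, 60, 32, 32))]

def guided_inpaint(video, target_mask, guidance):
    """Stand-in for the diffusion stage: remove the object and regenerate
    the guided regions so downstream interactions stay physically plausible."""
    return video  # a real model would return edited frames

def remove_object(video, target, target_mask):
    affected = find_affected_regions(video, target)      # stage 1: VLM
    return guided_inpaint(video, target_mask, affected)  # stage 2: diffusion
```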

#vision

#image-recognition

The 2nd Phase of Agentic Development

Yesterday we talked about how cheap code is fueling an era of idiosyncratic tooling, and previously we’ve talked about the rise of spec-driven development. In that second piece, we ran through some of the initial examples of spec-driven development with agents.

… The first wave of agentic development brought us clones and ports. When code is incredibly cheap, and you want the code to flow, you can either rely on your own fast feedback or leverage existing test suites. These early projects opted for the latter, as did many tokenmaxxers who are rebuilding their dependencies in Rust or Go.

Two releases this week, however, suggest we’re starting to enter a second phase of open source agentic coding projects. The first phase brought us clones; this next one brings us reimaginings. — Read More

#devops