Model-Harness-Fit

Is it best to use an LLM with its native harness (like Claude Code or Codex), or a generic harness that swaps models on demand?

… [I] decided to dig deeper by looking at the harness implementations of Codex, Claude Code, and the GitHub SDK. Does the harness really matter that much?

… The hand-wave answer is that “models behave differently because they are different models,” but here I tested the same models across different harnesses. — Read More

#devops

A Mental Model for Agentic Work

Something shifted in the first quarter of 2026. Not a feature launch, not a new product – a structural change in how work happens.

For the first time, I found myself genuinely operating with agents across every dimension of my work: personal tasks, software engineering, company operations. Not as a novelty. As the default mode.

This post is the abstraction I arrived at after weeks of doing this. A mental model that applies everywhere – because the architecture underneath is always the same. — Read More

#devops

Designing, Refining, and Maintaining Agent Skills at Perplexity

Perplexity’s frontier agent products rest on a foundation of know-how and domain expertise packaged in modular Agent Skills. We maintain a carefully curated library of Skills across our technical environments. These Skills include many of the general-purpose utilities powering Perplexity Computer; vertical-specific capabilities in areas such as finance, law, and health; and a very long tail of modules for addressing user needs. Some Skills are infrequently invoked but critical when invoked. To ensure a consistently excellent user experience, Perplexity’s Agents team prioritizes Skill quality just as much as code quality.

The intuitions and best practices required to develop a high-quality Skill differ significantly from those required to build traditional software. The Agents team reviews many pull requests from excellent engineers who develop Skills in the course of their work. The result is almost always numerous comments and suggestions for revision. This is because many useful patterns for writing code become antipatterns in Skill creation. — Read More

#devops

Agent Skills

The default behaviour of any AI coding agent is to take the shortest path to “done.” Ask for a feature and it writes the feature. It does not ask whether you have a spec, write a test before the implementation, consider whether the change crosses a trust boundary, or check what the PR will look like to a reviewer. It produces code, declares victory, and moves on.

This is the same failure mode every senior engineer has spent their career learning to avoid. The senior version of any task includes work that doesn’t show up in the diff: surfacing assumptions, writing the spec, breaking the work into reviewable chunks, choosing the boring design, leaving evidence that the result is correct, sizing the change so a human can actually review it. Those steps are most of what separates engineers who ship reliable software at scale from people who push code that breaks.

Agents skip those steps for the same reason any junior would: the steps are invisible. The reward signal points at “task complete,” not “task complete and the design doc exists.” So we have to bolt the senior-engineer scaffolding back on.

Agent Skills is my attempt at that scaffolding. It just crossed 26K stars, so apparently I’m not alone in wanting it. This post is the part the README doesn’t quite cover: why each design choice exists, how it maps onto standard SDLC and Google’s published engineering practices, and what you should steal from the project even if you never install a single skill. — Read More
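As an illustration of the kind of scaffolding the post describes, here is a hypothetical skill file in the SKILL.md convention (YAML frontmatter plus instructions) popularized by Anthropic's Agent Skills format. The name, description, and checklist below are invented for this sketch, not taken from the repo:

```markdown
---
name: spec-before-code
description: Use when asked to implement a feature. Forces a written
  spec and a test plan before any implementation code is produced.
---

Before writing any implementation:

1. Restate the request as a short spec and list your assumptions.
2. Note whether the change crosses a trust boundary; if unsure, say so.
3. Write or update a failing test that captures the spec.
4. Only then implement, keeping the diff small enough to review.
5. End with evidence the result is correct (test output, not claims).
```

The point of a file like this is exactly the post's thesis: it makes the invisible senior-engineer steps part of the agent's explicit instructions rather than hoping the model infers them.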

#devops

AI-Assisted Coding: A Practical Guide for Software Engineers

…This is Part 1 of a two-part series. This guide covers everything you need as an individual developer: how AI code generation actually works under the hood, how to manage its limitations, how to write prompts that produce usable code, where AI genuinely helps, and where it will burn you if you’re not careful.

In Part 2 we’ll zoom out to the team and organizational level: how to measure whether AI-assisted velocity is sustainable, the specific categories of technical debt AI introduces, how to actually implement this at team scale, and the structural challenges the industry hasn’t solved yet. — Read More

#devops

Review AI-generated code

Reviewing code generated by AI tools like GitHub Copilot, ChatGPT, or other agents is becoming an essential part of the modern developer workflow. This guide provides practical techniques, emphasizes the importance of human oversight and testing, and includes example prompts to showcase how AI can assist in the review process.

For legacy codebases and larger pull requests in particular, a thorough review process is critical. Combining human expertise with automated tools helps ensure that AI-generated code meets quality standards, aligns with project goals, and adheres to best practices.

With Copilot, you can streamline your review process and enhance your ability to identify potential issues in AI-generated code. — Read More

#devops

Terraform Audit Guide: Monitoring, Logging & Compliance

Running an audit on your Terraform code enables you to systematically review your IaC code and determine whether your infrastructure respects your organization’s compliance and governance standards.

In this article, we walk through what a Terraform audit is, what can and can’t be learned from Terraform’s state file, how to run a Terraform audit step by step, the most popular Terraform audit tools, and best practices for Terraform audits. — Read More
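One concrete audit step the article alludes to is inspecting the state file for resources that violate governance rules. A minimal sketch of that idea, assuming a version-4 state layout and an inlined sample (a real audit would read `terraform.tfstate`, or better, the output of `terraform show -json`, from the configured backend):

```python
import json

# Sample Terraform state, inlined for illustration only.
SAMPLE_STATE = """
{
  "version": 4,
  "resources": [
    {"type": "aws_s3_bucket", "name": "logs",
     "instances": [{"attributes": {"tags": {"Owner": "platform"}}}]},
    {"type": "aws_s3_bucket", "name": "scratch",
     "instances": [{"attributes": {"tags": {}}}]}
  ]
}
"""

def untagged_resources(state_json: str, required_tag: str = "Owner"):
    """Return addresses of resources missing a required tag."""
    state = json.loads(state_json)
    flagged = []
    for res in state.get("resources", []):
        for inst in res.get("instances", []):
            tags = inst.get("attributes", {}).get("tags") or {}
            if required_tag not in tags:
                flagged.append(f'{res["type"]}.{res["name"]}')
    return flagged

print(untagged_resources(SAMPLE_STATE))  # → ['aws_s3_bucket.scratch']
```

Dedicated tools cover far more ground than this, but the shape is the same: parse the state or plan JSON, apply policy checks, and report violations.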

#devops

Salesforce Headless 360: Wrapping My Head Around It — Part 1

So, Salesforce did a thing at TDX last week: they announced Salesforce Headless 360, and I’ve been wrapping my ‘head’ around it ever since.

So, what exactly is Salesforce Headless?

Salesforce defines it as ‘Everything on Salesforce is now an API, MCP tool, or CLI command, and agents can use all of it.’

There’s also a fairly bold punch line to go with it — ‘No Browser Required’ — Read More

#devops

Structured-Prompt-Driven Development (SPDD)

LLM programming assistants have demonstrated considerable value, but mostly for individual developers. The internal IT organization at Thoughtworks has been using them across its teams and has developed a method and workflow called Structured Prompt-Driven Development (SPDD). The article describes a simple example of this workflow, with details on GitHub. This workflow treats the prompts as a first-class artifact, kept with the code in version control and used to align development with business needs. We have found that developers need three key skills to be effective: alignment, abstraction-first, and iterative review. — Read More
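As a sketch of what “prompts as a first-class artifact, kept with the code in version control” might look like in practice (this layout and the file names are hypothetical, not Thoughtworks’ actual structure):

```
repo/
├── prompts/
│   ├── 001-user-story-to-spec.md
│   ├── 002-spec-to-tests.md
│   └── 003-tests-to-implementation.md
├── src/
└── tests/
```

Versioning the prompts alongside the code means a reviewer can trace not just what changed, but what the team asked for and why, which is what keeps the workflow aligned with business needs.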

#devops

Flow generation through natural language: An agentic modeling approach

If you’re building AI products on top of closed models, anyone with an API key can get similar capabilities. Lasting differentiation comes from proprietary data, the training recipe, the infrastructure, and the speed of iteration.

Shopify has something most companies don’t: a product surface where millions of merchant interactions directly signal whether the model’s output is any good. That feedback loop is the foundation, but only if you keep learning from it.

We fine-tuned a tool-calling agent to turn natural language into a Shopify Flow for Sidekick, our AI commerce assistant. It’s 2.2x faster, 68% cheaper, and outperforms closed models. — Read More

#devops