The default behaviour of any AI coding agent is to take the shortest path to “done.” Ask for a feature and it writes the feature. It does not ask whether you have a spec, write a test before the implementation, consider whether the change crosses a trust boundary, or check what the PR will look like to a reviewer. It produces code, declares victory, and moves on.
This is the same failure mode every senior engineer has spent their career learning to avoid. The senior version of any task includes work that doesn’t show up in the diff: surfacing assumptions, writing the spec, breaking the work into reviewable chunks, choosing the boring design, leaving evidence that the result is correct, sizing the change so a human can actually review it. Those steps are most of what separates engineers who ship reliable software at scale from people who push code that breaks.
Agents skip those steps for the same reason any junior would. They’re invisible. The reward signal points at “task complete,” not “task complete and the design doc exists.” So we have to bolt the senior-engineer scaffolding back on.
Agent Skills is my attempt at that scaffolding. It just crossed 26K stars, so apparently I’m not alone in wanting it. This post is the part the README doesn’t quite cover: why each design choice exists, how it maps onto standard SDLC and Google’s published engineering practices, and what you should steal from the project even if you never install a single skill. — Read More
How We Built an AI Second Brain for 60K Knowledge Workers
Knowledge workers at Meta routinely contend with workflow fragmentation, where critical information — including meeting notes, tasks, key decisions, and code context — is siloed across disparate platforms. Each new AI conversation starts cold: the same explanations, the same links, the same ten minutes of context-setting before any real work begins.
So we tested a simple hypothesis: what if an AI agent had persistent, structured access to everything a person is working on, and carried that context across every interaction? Not a chatbot that answers questions, but a working partner that tracks projects, reads meeting notes, surfaces connections, and builds on prior conversations.
That AI second brain experiment, born in the analytics org, has since been adopted by over 60,000 people across Meta: engineers, PMs, designers, legal, finance, communications, and sales. This post covers how it was built, how it grew, and what we learned. – Read More
AI Outperforms Doctors in Emergency Room Tasks, New Harvard Study Shows
An advanced AI agent has outperformed human physicians on a series of demanding tests that assess the ability to correctly diagnose patient illnesses in clinical settings, a Harvard-led study found. OpenAI’s “o1 preview,” the company’s first model capable of step-by-step reasoning, proved that it could conduct real-world triage in emergency rooms, recommend appropriate diagnostic tests, and perform case management tasks at a level that matched or surpassed the ability of even well-trained human doctors.
The study, led by Harvard researchers with collaborators at Stanford and published today in Science, suggests an urgent need for controlled trials of the technology, the authors say, to determine how it can be most effectively deployed. — Read More
Flipbook is an infinite visual browser generated entirely on demand in real time.
Every “page” you land on is an image. Click on anything in the image and you will get a new image exploring that thing in more depth. What you see contains no HTML, no code, no specific links or fields. The entire web is just generated pixels on your screen.
Flipbook turns search, browsing, learning, and visual thinking into one continuous AI-generated canvas. — Read More
Training language models to be warm can reduce accuracy and increase sycophancy
Artificial intelligence developers are increasingly building language models with warm and friendly personas that millions of people now use for advice, therapy and companionship. Here we show how this can create a significant trade-off: optimizing language models for warmth can undermine their performance, especially when users express vulnerability. We conducted controlled experiments on five different language models, training them to produce warmer responses, then evaluating them on consequential tasks. Warm models showed substantially higher error rates (+10 to +30 percentage points) than their original counterparts, promoting conspiracy theories, providing inaccurate factual information and offering incorrect medical advice. They were also significantly more likely to validate incorrect user beliefs, particularly when user messages expressed feelings of sadness. Importantly, these effects were consistent across different model architectures, and occurred despite preserved performance on standard tests, revealing systematic risks that standard testing practices may fail to detect. Our findings suggest that training artificial intelligence systems to be warm may come at a cost to accuracy, and that warmth and accuracy may not be independent by default. As these systems are deployed at an unprecedented scale and take on intimate roles in people’s lives, this trade-off warrants attention from developers, policymakers and users alike. — Read More
‘It took nine seconds’: Claude AI agent deletes company’s entire database
An AI agent powered by Anthropic’s leading Claude model has deleted a company’s entire production database, leaving customers unable to access key data.
PocketOS, which provides software for car rental businesses, suffered a massive outage over the weekend after the autonomous artificial intelligence tool wiped the database and all backups in a matter of seconds. — Read More
AI-Assisted Coding: A Practical Guide for Software Engineers
…This is Part 1 of a two-part series. This guide covers everything you need as an individual developer: how AI code generation actually works under the hood, how to manage its limitations, how to write prompts that produce usable code, where AI genuinely helps, and where it will burn you if you’re not careful. — Read More
In Part 2 we’ll zoom out to the team and organizational level: how to measure whether AI-assisted velocity is sustainable, the specific categories of technical debt AI introduces, how to actually implement this at team scale, and the structural challenges the industry hasn’t solved yet. — Read More
Review AI-generated code
Reviewing code generated by AI tools like GitHub Copilot, ChatGPT, or other agents is becoming an essential part of the modern developer workflow. This guide provides practical techniques, emphasizes the importance of human oversight and testing, and includes example prompts to showcase how AI can assist in the review process.
For both legacy codebases and larger pull requests in particular, a thorough review process is critical. Combining human expertise with automated tools can ensure that AI-generated code meets quality standards, aligns with project goals, and adheres to best practices.
With Copilot, you can streamline your review process and enhance your ability to identify potential issues in AI-generated code. — Read More
The Last Software Engineer
For more than a decade, I have taught software engineers how to implement testing, React, Remix, MCP, and more.
I built courses around practice. I would simulate a real work environment: a product manager gives you a task, you read the docs, you work in the codebase, you build the feature, and then you compare your solution with mine.
That was valuable because implementation was valuable.
It still is. But it is becoming less scarce.
AI coding agents are slowly eating away at the tasks software engineers have done for decades. — Read More
Terraform Audit Guide: Monitoring, Logging & Compliance
Running an audit on your Terraform code enables you to systematically review your IaC code and determine whether your infrastructure respects your organization’s compliance and governance standards.
In this article, we walk through a Terraform audit, what can and can’t be learned from Terraform’s state file, how to run a Terraform audit step by step, which Terraform audit tools are most popular, and the best practices around Terraform audits. — Read More