By now, I’m sure you’ve heard that the Department of War has declared Anthropic a supply chain risk, because Anthropic refused to remove redlines around the use of their models for mass surveillance and for autonomous weapons.
Honestly, I think this situation is a warning shot. Right now, LLMs are probably not being used in mission-critical ways. But within 20 years, 99% of the workforce in the military, the government, and the private sector will be AIs. This includes the soldiers (by which I mean the robot armies), the superhumanly intelligent advisors and engineers, the police, you name it.
Our future civilization will run on AI labor. And as much as the government’s actions here piss me off, in a way I’m glad this episode happened – because it gives us the opportunity to think through some extremely important questions about who this future workforce will be accountable and aligned to, and who gets to determine that. — Read More
Recent Updates
How A Regular Person Can Utilize AI Agents
Let’s do this again, redux! I’ll explain how to use AI agents for easy language learning, to create an easier version of my morning briefing, and finally, a far easier version of my briefing transcription -> summary -> action pipeline. In the process, my goal is to help readers remix the general principles for their own (mostly safe) agents.
My last piece about AI agents was my most popular and widely shared article to date. Usually, one writes a “Part 1” that’s easier and a “Part 2” that’s more complex. This is the exact opposite.
… So, in this revisit, I have these goals:
— Explain the general principles of creating agents (more slowly)
— Use methods that are more accessible to non-technical users.
— Give a framework for remixing these methods for readers’ own ideas/agents.
Ironically, this piece took longer than my last one. Instead of just sharing my workflows, this piece is designed to let you use these agents with step-by-step instructions, from scratch, and have them adapted to you (not me). — Read More
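The transcription → summary → action workflow above is just three stages composed in sequence, with each stage's output feeding the next. A minimal sketch in Python, with toy stand-in stages (every function here is illustrative, not from the article; a real agent would call a speech-to-text service, an LLM, and a task manager):

```python
# A generic three-stage pipeline: transcription -> summary -> action.
# The stage bodies are toy stand-ins for real service calls.

def transcribe(audio: str) -> str:
    # Stand-in: pretend the "audio" is already text.
    return audio.strip()

def summarize(transcript: str) -> str:
    # Stand-in: keep only the first sentence as the "summary".
    return transcript.split(". ")[0].rstrip(".") + "."

def extract_actions(summary: str) -> list[str]:
    # Stand-in: treat any clause starting with "todo:" as an action item.
    return [part.strip() for part in summary.lower().split(";")
            if part.strip().startswith("todo:")]

def briefing_pipeline(audio: str) -> list[str]:
    # Compose the stages; swapping any one out leaves the others untouched.
    return extract_actions(summarize(transcribe(audio)))
```

The point of the shape is the remixability the article describes: each stage is an independent function, so upgrading the summarizer or changing the action format is a one-function change.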
The SaaSpocalypse: AI Agents, Vibe Coding, and the Changing Economics of SaaS
Over the past few months, a new phrase has been circulating across tech, venture capital, and public markets:
“The SaaSpocalypse.”
The narrative is straightforward, and a bit alarming for SaaS operators. So what’s real and what’s clickbait?
The ingredients are familiar: AI agents are improving rapidly. Coding tools can generate entire applications. AI can automate workflows once performed inside SaaS products.
If software can now be generated on demand, the logic goes: why pay recurring subscriptions for SaaS at all? — Read More
Andrej Karpathy’s new open source ‘autoresearch’ lets you run hundreds of AI experiments a night — with revolutionary implications
Over the weekend, Andrej Karpathy—the influential former Tesla AI lead and founding member of OpenAI who coined the term “vibe coding”—posted on X about his new open source project, autoresearch.
It wasn’t a finished model or a massive corporate product: it was, by his own admission, a simple, 630-line script made available on GitHub under a permissive, enterprise-friendly MIT License. But the ambition was massive: automating the scientific method with AI agents while we humans sleep. — Read More
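Stripped of the ambition, unattended overnight experimentation reduces to a simple loop: propose a configuration, run it, score it, keep the best. A minimal sketch of that loop, with a toy objective standing in for a real training run (this is a generic illustration of the pattern, not Karpathy’s actual script):

```python
import random

def run_experiment(cfg: dict) -> float:
    # Toy stand-in for a real training run: the score peaks
    # at lr=0.1 and width=64, so the loop has something to find.
    return -abs(cfg["lr"] - 0.1) - abs(cfg["width"] - 64) / 64

def propose_config(rng: random.Random) -> dict:
    # Randomly sample a hyperparameter configuration.
    return {"lr": rng.choice([0.001, 0.01, 0.1, 1.0]),
            "width": rng.choice([16, 64, 256])}

def overnight_loop(n_runs: int, seed: int = 0) -> tuple[dict, float]:
    # Run n_runs experiments unattended; keep the best (config, score).
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_runs):
        cfg = propose_config(rng)
        score = run_experiment(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Replace `run_experiment` with an actual training-and-evaluation call and `propose_config` with an LLM agent that reads the result log, and you have the shape of the idea: the expensive part runs while you sleep.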
The Anthropic Shockwave: Why Claude Code Security Just Nuked Cybersecurity Stocks
The Dirty Secret of the SOC
Here is the nuclear option nobody in Silicon Valley wanted to talk about. For years, the cybersecurity industry has been a high-stakes gambling ring built on a house of cards. You pay millions for “endpoint protection” and “zero trust” wrappers that essentially act as expensive digital duct tape. But what happens when the tape is no longer needed because the hole in the wall simply ceases to exist?
Anthropic just pressed the button.
On February 20, 2026, the AI industry stopped playing nice. With the launch of Claude Code Security, Anthropic didn’t just release another “assistant.” They released a predator. This isn’t the usual incremental update. This is a paradigm shift where the LLM moves from “writing buggy code” to “fixing bugs that have existed since the Clinton administration.” — Read More
Perplexity turns your Mac mini into a 24/7 AI agent
Two weeks after launching Perplexity Computer, a cloud-based AI agent that can orchestrate 20 frontier models to execute multi-step workflows autonomously, the company used its inaugural Ask 2026 developer conference in San Francisco on Wednesday to dramatically widen the platform’s reach.
The centrepiece of the announcement is Personal Computer: software that runs continuously on a user-supplied Mac mini, merging local files, apps, and sessions with Perplexity’s cloud-based Computer system. — Read More
The 8 Levels of Agentic Engineering
AI’s coding ability is outpacing our ability to wield it effectively. That’s why all the SWE-bench score maxxing isn’t showing up in the productivity metrics engineering leadership actually cares about. When Anthropic’s team ships a product like Cowork in 10 days and another team can’t move past a broken POC using the same models, the difference is that one team has closed the gap between capability and practice and the other hasn’t.
That gap doesn’t close overnight. It closes in levels. 8 of them. Most of you reading this are likely past the first few, and you should be eager to reach the next one because each subsequent level is a huge leap in output, and every improvement in model capability amplifies those gains further.
Level 1: Tab Complete
Level 2: Agent IDE
Level 3: Context Engineering
Level 4: Compounding Engineering
Level 5: MCP and Skills
Level 6: Harness Engineering
Level 7: Background Agents
Level 8: Autonomous Agent Teams
— Read More
Tilly Norwood | Take The Lead (Official Music Video)
Open Weights isn’t Open Training
When I was in college, my data structures professor told a story. It went something like this:
“When I was your age, I received an assignment, and encountered an inexplicable bug. I debugged and debugged and found that adding a print statement resolved the bug. I was young like all of you, and I was certain I’d found a bug in the C compiler. Turns out the problem was me.”
The takeaway was clear: if you have a bug, it’s your fault.
This is a good heuristic for most cases, but with open source ML infrastructure, you need to throw this advice out the window. There might be features that appear to be supported but are not. If you’re suspicious about an operation or stage that’s taking a long time, it may be implemented in a way that’s efficient enough…for an 8B model, not a 1T+ one. HuggingFace is good, but it’s not always correct. Libraries have dependencies, and problems can hide several layers down the stack. Even PyTorch isn’t ground truth.
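One cheap sanity check for the “efficient enough for 8B, not 1T” trap: time the suspicious operation at two input sizes and see how the cost grows. A minimal sketch, using a deliberately quadratic stand-in operation (the helper names are mine, for illustration):

```python
import time

def suspicious_op(items: list) -> int:
    # Deliberately quadratic stand-in: count equal pairs by brute force.
    count = 0
    for i in range(len(items)):
        for j in range(len(items)):
            if items[i] == items[j]:
                count += 1
    return count

def scaling_ratio(op, small: int, factor: int = 4) -> float:
    # Time op at size `small` and at `small * factor`, and return the
    # cost ratio. A ratio near `factor` suggests linear scaling; a
    # ratio near `factor ** 2` suggests quadratic.
    def timed(n: int) -> float:
        data = ["x"] * n
        best = float("inf")
        for _ in range(2):  # best-of-2 to damp timing noise
            t0 = time.perf_counter()
            op(data)
            best = min(best, time.perf_counter() - t0)
        return best
    return timed(small * factor) / timed(small)
```

An operation that looks fine at toy scale and returns a ratio near 16 when `factor=4` is the kind of hidden O(n²) that only bites once the model stops being small.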
Over the past couple months, I worked on developing infrastructure to post-train and serve models cheaply. Ultimately, my team decided to develop a custom training codebase, but only after I spent a few days attempting to use existing open-source options. The following is an account of my successes and failures and what it means for open-weights models. — Read More
How to scan for vulnerabilities with GitHub Security Lab’s open source AI-powered framework
For the last few months, we’ve been using the GitHub Security Lab Taskflow Agent along with a new set of auditing taskflows that specialize in finding web security vulnerabilities. They also turn out to be very successful at finding high-impact vulnerabilities in open source projects.
As security researchers, we’re used to losing time on possible vulnerabilities that turn out to be unexploitable, but with these new taskflows, we can now spend more of our time on manually verifying the results and sending out reports. Furthermore, the severity of the vulnerabilities that we’re reporting is uniformly high. Many of them are authorization bypasses or information disclosure vulnerabilities that allow one user to log in as somebody else or to access the private data of another user.
Using these taskflows, we’ve reported more than 80 vulnerabilities so far. — Read More