The offensive capabilities of large language models (LLMs) have until recently existed as theoretical risks – frequently discussed at security conferences and in conceptual industry reports, but rarely observed in real-world exploits. However, in November 2025, Anthropic published a pivotal report documenting a state-sponsored espionage campaign. In this operation, AI didn’t just assist human operators – it became the operator, performing 80-90% of the campaign autonomously, at speeds that no human team could match.
This disclosure shifted the conversation from “could this happen?” to “this is happening.” But it also raised practical questions: Can AI actually operate autonomously end-to-end, or does it still require human guidance at each decision point? Where do current LLM capabilities excel, and where do they fall short compared to skilled human operators?
To answer these questions, we built a multi-agent penetration testing proof of concept (PoC), designed to empirically test autonomous AI offensive capabilities against cloud environments. — Read More
Daily Archives: April 24, 2026
A Hundred Robots Are Running A Bio Lab
The small robot has brushed past me five times in the last hour.
It runs loops around the perimeter of the third floor of this bio lab, serving as a courier. The machine’s job is to visit workstations and keep other robots – arms bolted to lab benches – fed with whatever they need, be it pipette holders, sealed plates or something in a labeled bag. The little bot is relentless and unconcerned about me or much else beyond its job. Out of the corner of my eye, I spot chairs still rotating slowly on their bases from where it clipped them on the last pass.
About a hundred robotic arms fill this room, each one positioned beside a different scientific tool. The arms must deal with centrifuges, incubators, chambers and tubes. They run simultaneously and continuously. The small robot links them together, ferrying consumables between stations the way a junior scientist carries things between benches. Except the benches are robots. And so is the assistant. — Read More
From Vibe Coder to Product Builder
The lines between product management and software engineering are becoming increasingly blurred. As product managers, we can now show rather than tell; build rather than write. There’s a spectrum here.
… A lot of product managers stop at Bolt or Lovable – and that’s fine for visualising an idea. But I believe there’s a meaningful difference between visualising a product and actually building one. My take is that there are different degrees of product building, and if you want to move from prototyping ideas to shipping real products, you need to start using coding agents and get comfortable with some engineering basics. Not to become an engineer, but to get the most out of the tools. — Read More
The AI Chasm
Every week I see another LinkedIn post about how AI is going to transform everything. Another “X is dead” announcement, another person shipping their latest vibe-coded project.
And don’t get me wrong. I get it. The hype is real.
My head isn’t buried in the sand. This doesn’t have the same vibe as Amazon Alexa or NFTs. It’s more like the internet or mobile phones, for sure.
But think about how long both of those took to take off.
… I want to give you a different perspective on the AI rhetoric.
A more balanced view that you might disagree with – claiming “but this time it’s different” – or agree with.
Either way, I hope to give you a different take on everything that’s happening right now, one grounded in research and in what has happened historically. — Read More
A good AGENTS.md is a model upgrade. A bad one is worse than no docs at all.
We pulled dozens of AGENTS.md files from across our monorepo and measured their effect on code generation. The best ones gave our coding agent a quality jump equivalent to upgrading from Haiku to Opus. The worst ones made the output worse than having no AGENTS.md at all.
That gap was surprising enough that we built a systematic study around it.
What we found: most of what people put in AGENTS.md either doesn’t help or actively hurts, and the patterns that work are specific and learnable. — Read More