Can AI Attack the Cloud? Lessons From Building an Autonomous Cloud Offensive Multi-Agent System

The offensive capabilities of large language models (LLMs) have until recently existed as theoretical risks – frequently discussed at security conferences and in conceptual industry reports, but rarely observed in practical exploits. However, in November 2025, Anthropic published a pivotal report documenting a state-sponsored espionage campaign. In this operation, AI didn’t just assist human operators – it became the operator, performing 80-90% of the campaign autonomously, at speeds that no human team could match.

This disclosure shifted the conversation from “could this happen?” to “this is happening.” But it also raised practical questions: Can AI actually operate autonomously end-to-end, or does it still require human guidance at each decision point? Where do current LLM capabilities excel, and where do they fall short compared to skilled human operators?

To answer these questions, we built a multi-agent penetration testing proof of concept (PoC), designed to empirically test autonomous AI offensive capabilities against cloud environments. — Read More

#cyber

A Hundred Robots Are Running A Bio Lab

The small robot has brushed past me five times in the last hour.

It runs loops around the perimeter of the third floor of this bio lab, serving as a courier. The machine’s job is to visit workstations and keep other robots – arms bolted to lab benches – fed with whatever they need, be it pipette holders, sealed plates, or something in a labeled bag. The little bot is relentless and unconcerned about me or much else beyond its job. Out of the corner of my eye, I spot chairs still rotating slowly on their bases from where it clipped them on the last pass.

About a hundred robotic arms fill this room, each one positioned beside a different scientific tool. The arms must deal with centrifuges, incubators, chambers and tubes. They run simultaneously and continuously. The small robot links them together, ferrying consumables between stations the way a junior scientist carries things between benches. Except the benches are robots. And so is the assistant. — Read More

#robotics

From Vibe Coder to Product Builder

The lines between product management and software engineering are becoming increasingly blurred. As product managers, we can now show rather than tell; build rather than write. There’s a spectrum here.

… A lot of product managers stop at Bolt or Lovable – and that’s fine for visualising an idea. But I believe there’s a meaningful difference between visualising a product and actually building one. My take is that there are different degrees of product building, and if you want to move from prototyping ideas to shipping real products, you need to start using coding agents and get comfortable with some engineering basics. Not to become an engineer, but to get the most out of the tools. — Read More

#strategy

The AI Chasm

Every week I see another LinkedIn post about how AI is going to transform everything, another “X is dead” announcement, and someone shipping their latest vibe-coded project.

And don’t get me wrong. I get it. The hype is real.

My head isn’t buried in the sand. It doesn’t have the same Amazon Alexa and NFT vibes. This is more like the internet or mobile phones for sure.

But think about how long both of those took to take off.

… I want to give you a different perspective to the AI rhetoric.

A more balanced view that you might disagree with – claiming “but this time it’s different” – or agree with.

Either way, I hope to give you a different perspective on everything that is happening right now – one that’s based in research and what has historically happened before. — Read More

#strategy

A good AGENTS.md is a model upgrade. A bad one is worse than no docs at all.

We pulled dozens of AGENTS.md files from across our monorepo and measured their effect on code generation. The best ones gave our coding agent a quality jump equivalent to upgrading from Haiku to Opus. The worst ones made the output worse than having no AGENTS.md at all.

That gap was surprising enough that we built a systematic study around it.

What we found: most of what people put in AGENTS.md either doesn’t help or actively hurts, and the patterns that work are specific and learnable. — Read More

#devops

Building the 11 Layers of a Production-Grade MCP Server + Agentic System

MCP servers are becoming the core focus of production agentic systems because they are where all the hard problems actually live: multi-tenant isolation, auth, rate limits, audit trails, and approval gates for destructive operations. Without them, agents leak data across tenants, burn budgets in runaway loops, and commit to refunds no human approved. An MCP server solves this by sitting between the agents and the data layer as a single secure tool surface, turning every agent call into an authenticated, policy-checked, rate-limited, audited operation before it touches a single row …

In this blog, we are going to build Atlas-MCP, a production-grade MCP server organized around twelve components that keep showing up on the 3 AM pager when teams skip them. On top of the server, we are also going to build a four-agent support copilot (Planner, Retriever, Synthesizer, Critic) that uses the server’s tools to answer real customer support tickets end to end. — Read More
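The "single secure tool surface" idea above can be made concrete with a small sketch. This is not the Atlas-MCP implementation from the article – it is a minimal, hypothetical gateway showing how one chokepoint can enforce tenant isolation, rate limits, audit logging, and an approval gate for destructive operations before any tool runs (all names here, like `ToolGateway` and `issue_refund`, are illustrative assumptions):

```python
import time
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    tenant: str          # tenant the call claims to act on
    tool: str            # tool name, e.g. "lookup_ticket"
    args: dict = field(default_factory=dict)
    destructive: bool = False  # e.g. "issue_refund"


class ToolGateway:
    """Hypothetical single tool surface: every agent call is
    tenant-checked, rate-limited, audited, and approval-gated."""

    def __init__(self, rate_limit_per_min: int = 60):
        self.rate_limit = rate_limit_per_min
        self.audit_log = []          # append-only trail of decisions
        self._calls = {}             # tenant -> recent call timestamps

    def execute(self, token_tenant: str, call: ToolCall, approved: bool = False):
        # Tenant isolation: the caller's credential must match the
        # tenant the call targets, or agents leak data across tenants.
        if token_tenant != call.tenant:
            return self._deny(call, "cross-tenant access")

        # Rate limiting: stops runaway agent loops from burning budget.
        now = time.time()
        window = [t for t in self._calls.get(call.tenant, []) if now - t < 60]
        if len(window) >= self.rate_limit:
            return self._deny(call, "rate limit exceeded")

        # Approval gate: destructive operations need a human sign-off.
        if call.destructive and not approved:
            return self._deny(call, "needs human approval")

        self._calls[call.tenant] = window + [now]
        self.audit_log.append((call.tenant, call.tool, "allowed"))
        return {"status": "ok", "tool": call.tool}

    def _deny(self, call: ToolCall, reason: str):
        self.audit_log.append((call.tenant, call.tool, f"denied: {reason}"))
        return {"status": "denied", "reason": reason}
```

The key design choice is that agents never touch the data layer directly: every call flows through `execute`, so policy lives in exactly one place and every decision – allow or deny – lands in the audit trail.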

#devops

Challenges and Research Directions for Large Language Model Inference Hardware

Large Language Model (LLM) inference is hard. The autoregressive Decode phase of the underlying Transformer model makes LLM inference fundamentally different from training. Exacerbated by recent AI trends, the primary challenges are memory and interconnect rather than compute. To address these challenges, we highlight four architecture research opportunities: High Bandwidth Flash for 10X memory capacity with HBM-like bandwidth; Processing-Near-Memory and 3D memory-logic stacking for high memory bandwidth; and low-latency interconnect to speed up communication. While our focus is datacenter AI, we also review their applicability for mobile devices. — Read More
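A back-of-the-envelope calculation shows why decode is memory-bound rather than compute-bound: each generated token must stream the model's weights from memory once, so single-stream throughput is capped by bandwidth divided by model size, regardless of available FLOPs. The numbers below are illustrative assumptions, not figures from the paper:

```python
def decode_tokens_per_sec(params_billion: float,
                          bytes_per_param: float,
                          mem_bw_gb_per_s: float) -> float:
    """Upper bound on single-stream decode throughput: every token
    requires reading all weights once, so tokens/s <= bandwidth / size."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    return mem_bw_gb_per_s * 1e9 / model_bytes


# Illustrative: a 70B-parameter model in FP16 (2 bytes/param) on an
# accelerator with ~3350 GB/s of HBM bandwidth is capped at roughly
# 3350 / 140 ≈ 24 tokens/s per stream, however much compute sits idle.
bound = decode_tokens_per_sec(70, 2, 3350)
```

This is exactly why the research directions above target memory capacity and bandwidth (High Bandwidth Flash, Processing-Near-Memory, 3D stacking) rather than raw compute.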

#performance

Mythos on Discord

Anthropic said Mythos was too dangerous to release. Then four random guys in a Discord gained access on day one by guessing the URL… — Read More

#cyber

YouTube expands its AI likeness detection technology to celebrities

YouTube is expanding its new “likeness detection” technology – which identifies AI-generated content, such as deepfakes – to people within the entertainment industry, the company announced on Tuesday.

The technology works similarly to YouTube’s existing Content ID system, which detects copyright-protected material in users’ uploaded videos, allowing rights owners to request removal or share in the video’s revenue.

Likeness detection does the same, but for simulated faces. — Read More

#fake

GPT Image Generation Models Prompting Guide

OpenAI’s gpt-image generation models are designed for production-quality visuals and highly controllable creative workflows. They are well-suited for both professional design tasks and iterative content creation, and support both high-quality rendering and lower-latency use cases depending on the workflow.

… This guide highlights prompting patterns, best practices, and example prompts drawn from real production use cases for gpt-image-2. It is our most capable image model, with stronger image quality, improved editing performance, and broader support for production workflows. The low quality setting is especially strong for latency-sensitive use cases, while medium and high remain good fits when maximum fidelity matters. — Read More

#image-recognition