The AI engineering stack we built internally — on the platform we ship

In the last 30 days, 93% of Cloudflare’s R&D organization used AI coding tools powered by infrastructure we built on our own platform.

Eleven months ago, we undertook a major project: to truly integrate AI into our engineering stack. We needed to build the internal MCP servers, access layer, and AI tooling necessary for agents to be useful at Cloudflare. We pulled together engineers from across the company to form a tiger team called iMARS (Internal MCP Agent/Server Rollout Squad). The sustained work landed with the Dev Productivity team, who also own much of our internal tooling, including CI/CD, build systems, and automation.

… MCP servers were the starting point, but the team quickly realized we needed to go further: rethink how standards are codified, how code gets reviewed, how engineers onboard, and how changes propagate across thousands of repos.

This post dives deep into what that looked like over the past eleven months and where we ended up. — Read More

#devops

The Boy That Cried Mythos: Verification is Collapsing Trust in Anthropic

I’ve been getting more and more curious about the risk from Anthropic’s Claude Mythos Preview. So I pulled the system card, a whoppingly inefficient 244-page document that devotes just seven pages to the claim that the model is too dangerous to release. In fact, the 23MB of PDF I had to download was 20MB of wasted time and space. Compressing the PDF to 3MB meant I lost exactly nothing.

Foreshadowing, I guess.

Spoiler alert: the crucial seven pages out of 244 do not contain the word “fuzzer” once. That’s like a seven-page vacation brochure for Hawaii that leaves out the word “beaches.”

Also, the crucial seven pages out of 244 do not contain the expected acronyms CVSS, CWE, or CVE. They have no comparison baseline, no independent reproduction, and not even the word “thousands.” I’ll get back to all of that in a minute. — Read More

#cyber

Benchmarking Self-Hosted LLMs for Offensive Security

LLM Agents can Autonomously Exploit One-day Vulnerabilities demonstrated that frontier models can exploit known vulnerabilities when given appropriate tooling. And if you have used Claude Code, you have no doubt either used it for reverse engineering or seen how well it does the job.

However, Benchmarking Practices in LLM-driven Offensive Security surveyed multiple papers in this space and found that only around 25% evaluated local or small models. The majority relied on GPT-4 or similar cloud-hosted frontier models, often with CTF-style challenges where hints were embedded in the prompt.

In this work, I defined a set of simple challenges in which a locally hosted model is given a single HTTP request tool pointed at Juice Shop. The amount of guidance varies by challenge: some provide only an endpoint and a goal, while others include step-by-step instructions. In all cases, though, the model must craft and execute the actual payloads. Caveats and anecdotal notes are added as the write-up goes on. — Read More
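The post's actual harness isn't reproduced here. As a hypothetical sketch of what "a single HTTP request tool" might look like, the code below assumes Juice Shop running on its default `http://localhost:3000` and a generic JSON-schema-style tool description; the names `HTTP_TOOL_SPEC`, `resolve_url`, and `http_request` are illustrative, not taken from the write-up. The model supplies a method, path, and optional headers and body, and the tool refuses any URL that leaves the lab target.

```python
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urlsplit

# Assumption: a local Juice Shop instance on its default port.
TARGET_BASE = "http://localhost:3000"

# Hypothetical tool description handed to the model; the exact schema format
# depends on the model-serving stack.
HTTP_TOOL_SPEC = {
    "name": "http_request",
    "description": "Send one HTTP request to the Juice Shop target and return the response.",
    "parameters": {
        "type": "object",
        "properties": {
            "method": {"type": "string", "enum": ["GET", "POST", "PUT", "DELETE"]},
            "path": {"type": "string"},
            "headers": {"type": "object"},
            "body": {"type": "string"},
        },
        "required": ["method", "path"],
    },
}


def resolve_url(path: str) -> str:
    """Join a model-supplied path onto the target and refuse anything off-target."""
    url = TARGET_BASE + path if path.startswith("/") else path
    target, parsed = urlsplit(TARGET_BASE), urlsplit(url)
    if (parsed.scheme, parsed.netloc) != (target.scheme, target.netloc):
        raise ValueError(f"refusing off-target URL: {url}")
    return url


def http_request(method: str, path: str,
                 headers: Optional[dict] = None, body: Optional[str] = None) -> dict:
    """Execute the model's request; the model writes the payload, the tool only sends it."""
    req = urllib.request.Request(
        resolve_url(path),
        data=body.encode() if body is not None else None,
        headers=headers or {},
        method=method.upper(),
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return {"status": resp.status, "body": resp.read().decode(errors="replace")}
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are still useful feedback for the model.
        return {"status": err.code, "body": err.read().decode(errors="replace")}
```

Keeping the tool this thin means any exploit the model achieves comes from the payloads it writes, which matches the benchmark's premise.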

#cyber

Best practices for building agentic systems

Agentic AI has emerged as the software industry’s latest shiny thing. Beyond smarter chatbots, AI agents operate with increasing autonomy, positioning them to drive efficiency gains across enterprises.

“Agentic refers to AI systems that can take actions on behalf of users, not just generate text or answer questions,” says Andrew McNamara, director of applied machine learning at Shopify. Agentic systems run continuously until a task is complete, he adds, citing Shopify’s Sidekick, a proactive agent for merchants.

Development of agentic AI now spans many business domains. According to Anthropic, a provider of large language models (LLMs), AI agents are most commonly deployed in software engineering, accounting for roughly half of use cases, followed by back-office automation, marketing, sales, finance, and data analysis. — Read More

#devops

Quantum Computers Are Not a Threat to 128-bit Symmetric Keys

The advancing threat of cryptographically-relevant quantum computers has made it urgent to replace currently-deployed asymmetric cryptography primitives (key exchange (ECDH) and digital signatures (RSA, ECDSA, EdDSA)), which are vulnerable to Shor’s quantum algorithm. Shor’s algorithm does not, however, affect existing symmetric cryptography algorithms (AES, SHA-2, SHA-3) or their key sizes.

There’s a common misconception that quantum computers will “halve” the security of symmetric keys, requiring 256-bit keys for 128 bits of security. That is not an accurate interpretation of the speedup offered by quantum algorithms; it’s not reflected in any compliance mandate, and it risks diverting energy and attention from actually necessary post-quantum transition work. The misconception usually stems from misunderstanding how a different quantum algorithm, Grover’s, applies.

AES-128 is safe against quantum computers. SHA-256 is safe against quantum computers. No symmetric key sizes have to change as part of the post-quantum transition. This is a near-consensus opinion amongst experts and standardization bodies, and it needs to propagate to the rest of the IT community. — Read More
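A back-of-the-envelope sketch of why the "halving" reading is misleading: Grover's algorithm needs about 2^(k/2) *sequential* iterations to search a k-bit key space, and splitting the search across 2^m machines only shrinks each machine's share to 2^(k-m) keys, searched in 2^((k-m)/2) steps. If a quantum computer can only run 2^d steps in sequence (a depth limit NIST calls MAXDEPTH; the values 2^40, 2^64, and 2^96 used below are assumptions spanning the range NIST discussed in its post-quantum call for proposals), then m = k - 2d machines are needed and total work is 2^(k-d). The function name is hypothetical, and the model ignores constant factors and the cost of each AES oracle circuit, both of which only make the attack harder.

```python
def grover_key_search_cost(key_bits: int, max_depth_log2: int) -> tuple[float, float]:
    """Return (log2 of machines, log2 of total Grover iterations) for a depth-limited attack.

    Simplified model: constant factors (e.g. the pi/4 in Grover's iteration count)
    and the size of each AES oracle circuit are ignored.
    """
    serial_log2 = key_bits / 2  # one machine needs ~2^(k/2) sequential iterations
    if serial_log2 <= max_depth_log2:
        return 0.0, serial_log2  # fits within the depth limit on a single machine
    # 2^m machines each search 2^(k-m) keys in 2^((k-m)/2) steps; set that equal to 2^d.
    machines_log2 = key_bits - 2 * max_depth_log2
    return float(machines_log2), float(machines_log2 + max_depth_log2)


for depth in (40, 64, 96):
    m, total = grover_key_search_cost(128, depth)
    print(f"MAXDEPTH 2^{depth}: 2^{m:g} machines, 2^{total:g} total iterations")
```

Under the tightest assumed limit (2^40 sequential steps), attacking AES-128 takes 2^48 quantum machines and 2^88 total iterations, each iteration being a full AES evaluation on quantum hardware. The "2^64" figure only holds for a single machine running 2^64 steps in sequence.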

#quantum

Wardgate – AI Agent Security Gateway

Wardgate is a security gateway that sits between AI agents and the outside world — isolating credentials for API calls, isolating SSH keys for remote command execution, and gating command execution in remote environments (conclaves).

Give your AI agents access to APIs, SSH keys, and shell tools – without giving them your credentials or trusting them with direct execution. — Read More
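Wardgate's actual configuration and API are not described here. As a hypothetical illustration of credential isolation in general, the sketch below has the agent name only a service, method, and path; the gateway alone holds the secret, enforces a per-service method allowlist, and attaches the `Authorization` header. `Endpoint`, `CredentialGateway`, and `build_request` are illustrative names, and the bearer-token scheme is an assumption.

```python
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    base_url: str
    secret: str  # held only by the gateway, never returned to the agent
    allowed_methods: frozenset[str]


class CredentialGateway:
    def __init__(self, endpoints: dict[str, Endpoint]):
        self._endpoints = endpoints

    def build_request(self, service: str, method: str, path: str) -> urllib.request.Request:
        """Turn an agent's credential-free request into an authenticated one, or refuse it."""
        ep = self._endpoints.get(service)
        if ep is None:
            raise PermissionError(f"unknown service: {service}")
        method = method.upper()
        if method not in ep.allowed_methods:
            raise PermissionError(f"{method} not allowed for {service}")
        if not path.startswith("/"):
            raise PermissionError("path must be relative to the service base URL")
        # The secret is injected here, after all policy checks have passed.
        return urllib.request.Request(
            ep.base_url.rstrip("/") + path,
            method=method,
            headers={"Authorization": f"Bearer {ep.secret}"},
        )


# Usage: the agent asks for "tickets" by name and never sees the token.
gateway = CredentialGateway(
    {"tickets": Endpoint("https://api.example.com", "s3cret-token", frozenset({"GET"}))}
)
req = gateway.build_request("tickets", "GET", "/v1/issues")
```

The request object is built but not sent, so the same gateway could log, rate-limit, or require human approval before forwarding it.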

#devops

The Agent Stack Bet

Peek under the hood of most “production agents” shipping today and you won’t find intelligence. You’ll find custom plumbing, fragile session logic, shared service accounts, and a security model held together by hope. This can be so much better.

If you’ve spent the last 18 months putting agents into production, you already know the models and tools have gotten dramatically better. You also know the problems that are still burning your on-call rotation are not problems you can prompt your way out of. We are running into a stack ceiling, and it is quietly creating a governance and reliability gap that the next generation of agentic systems cannot grow through.

Right now the industry is living with what I’d call excessive agency: autonomous systems given broad permissions to get things done, then left to discover – at runtime, in production – that a schema drifted, an API changed, or a downstream service started returning PII it wasn’t supposed to. Agents mark tasks “complete” while leaving a trail of corrupted state behind them. The humans find out on Monday.

This is not a failure of the people building agents. It is a failure of the stack they’re building on. — Read More

#architecture

Mythos, Memory Loss, and the Part InfoSec Keeps Missing

InfoSec has a bad habit of acting like history started this morning. Something new lands, the industry loses its mind for a week, vendors start talking like the old rules no longer apply, and half the industry suddenly forgets how organizations actually get compromised.

We are doing that again with Mythos.

Mythos is legitimately impressive. It is very good at finding bugs, useful for exploit development, and materially improves the speed and quality of vulnerability research work. Anyone pretending otherwise is coping. But the conversation around it is already drifting into the same bad pattern this industry falls into every time a new offensive capability shows up: people fixate on the most technically dramatic part of the story and lose sight of what actually matters operationally.

That is the problem. The question is not whether Mythos is good at bug hunting and helping write exploits; it clearly is. The question is what that means for most defenders right now, and the answer is not “drop everything, autonomous zero-day machines are now the main thing compromising your environment.”

For most organizations, the bigger problem is still much more boring and damaging: ransomware crews, extortion operations, stolen credentials, phishing, exposed edge services, weak identity controls, stale appliances, known vulnerabilities, bad segmentation, and environments where, once somebody gets in, they can move far too easily. Mythos does not replace that reality; it lands on top of it. If you miss that, you end up having the wrong conversation, spending your time talking about AI-generated zero-day storms while attackers keep getting paid through the same doors defenders left open last quarter. — Read More

#cyber

The State of AI Adoption in the Enterprise [Q1 2026 Review]

You’ve seen the headline: “95% of enterprise AI pilots fail.”

… The 95% figure measures one thing: whether an AI pilot produced rapid P&L impact within six months. Not productivity. Not cost savings. Not efficiency gains. And it mostly measured pilots in sales and marketing — the lowest-ROI area in the study.

Measured that way, most projects will “fail.” A new hire doesn’t move the P&L in six months either… they often take six months or more to ramp up!

The study’s most important finding got buried: vendor-led deployments succeed 67% of the time. Internal builds succeed one-third of the time. This was always a story about strategy, not technology. This is a better takeaway for enterprises to focus on. — Read More

#strategy

Salesforce launches Headless 360 to support agent-first enterprise workflows

Salesforce is packaging its developer and AI tooling, including its vibe coding environment Agentforce Vibes, into a new platform named Headless 360, designed to help enterprise teams build agent-first workflows.

The CRM software provider defines agent-first workflows as enterprise processes in which software agents, rather than human users, carry out tasks by directly invoking APIs, tools, and predefined business logic.

To support this approach, Headless 360 exposes Salesforce’s underlying data, workflows, and governance controls as APIs, MCP tools, and CLI commands, via its existing offerings, such as Data 360, Customer 360, and Agentforce, Joe Inzerillo, president of AI technology at Salesforce, said during a press briefing. — Read More

#devops