NO SECURITY METER FOR AI

Let’s say you wanted to make sure that your AI is secure. Can you just maximize the security and privacy benchmark and call it a day? Nope, because benchmarks don’t actually work for measuring AI capabilities (even when those capabilities are NOT emergent systemic properties like security). So let’s take a step back: how do you measure security in the first place? Good question. Over the last 30 years, software security engineering evolved from black-box penetration testing, through white-box code analysis and architectural risk analysis, to de facto process-driven standards like the Building Security In Maturity Model (BSIMM). Software has had a very deep impact on business operations, and AI appears poised to have an even deeper one. Will a measurement approach like the one software security took work for AI? Probably. In the meantime, we can make real progress in AI security by cleaning up our WHAT piles and managing risk by identifying and applying good assurance processes. (Spoiler alert: no matter what we do, we still don’t get a security meter for AI, so we need to be extra vigilant about security.) — Read More

#cyber

Cheap AI could derail OpenAI and Anthropic’s IPOs

This earnings season, the cost of AI started showing up in the numbers. Meta, Shopify, Spotify, and Pinterest all flagged rising AI and inference costs as a drag on margins. Shopify said economies of scale were “partially offset by increased LLM costs.”

This is the bill coming due for the pricing model that underpins OpenAI’s and Anthropic’s expected IPO valuations, both projected north of $800 billion. Those numbers assume OpenAI and Anthropic will hold their market share and pricing power — that competitors can’t easily catch up, and that enterprise customers will keep paying a premium because there’s no real alternative.

But increasingly the data is pointing the other way. Cutting-edge AI is becoming abundant and cheap. Chinese labs are charging a fraction of what American labs do for comparable work, while a wave of Western challengers — Nvidia, Cohere, Reflection, Mistral — are building cheaper, smaller, more efficient alternatives for enterprises that won’t touch a Chinese model. By the time OpenAI and Anthropic file their prospectuses, with OpenAI’s confidential filing coming as soon as this week, the central premise of their valuations may already be gone. — Read More

#china-vs-us

Stanford’s 2026 AI Index Report

At Stanford HAI, we believe AI is poised to be the most transformative technology of the 21st century. But its benefits won’t be evenly distributed unless we guide its development thoughtfully. The AI Index offers one of the most comprehensive, data-driven views of artificial intelligence. Recognized as a trusted resource by global media, governments, and leading companies, the AI Index equips policymakers, business leaders, and the public with rigorous, objective insights into AI’s technical progress, economic influence, and societal impact. — Read More

#strategy

What’s Easy Now? What’s Hard Now?

This is the fourth in a series about how AI is changing software development, after “It’s time to be right,” “What about juniors?,” and “My heuristics are wrong. What now?” It stands alone, but if you found this interesting you may also find those interesting.

I’ve been spending a lot of time thinking about the shape of the capabilities of coding agents. What they’re good at now, what they’re going to be good at. What they’re bad at now, how much of that is inherent and how much is transient. This is worth thinking about, because it’s the most important question shaping the future of software, and of software engineering. I don’t pretend to have an answer, but I am coming to a conclusion that may be deeply counterintuitive.

Coding agents are becoming very good indeed, and can build meaningful and correct software very quickly and at transformatively low cost. They have superhuman abilities on some coding tasks. Of course, computer systems have had superhuman abilities for at least 85 years. I think we’re going to find, as we have over those decades, that this new technology we’re building is vastly superhuman in some areas, and not nearly as capable as humans in others. — Read More

#devops

Accelerating scientific discovery with Co-Scientist

Scientific discovery is driven by scientists generating novel hypotheses for complex problems, which then undergo rigorous experimental validation. To augment this process, we introduce Co-Scientist, a multi-agent AI system built on Gemini for structured scientific thinking and hypothesis generation. Co-Scientist aims to help scientists discover new, original knowledge. Conditioned on their research objectives and prior scientific evidence, it formulates demonstrably novel research hypotheses for experimental verification. The system’s design involves agents continuously generating, critiquing, and refining hypotheses, accelerated by scaling test-time compute. Key contributions include: (1) a multi-agent architecture with an asynchronous task execution framework for flexible compute scaling; (2) a tournament evolution process for self-improving hypothesis generation. Automated evaluations show continued benefits of test-time compute scaling, improving hypothesis quality over time. While the system is general purpose, we focus the validation on three biomedical applications: drug repurposing, novel target discovery [1], and explaining mechanisms of antimicrobial resistance [2]. Specifically, Co-Scientist helped identify new drug repurposing candidates and synergistic combination therapies for acute myeloid leukemia, which were validated through in vitro experiments. These real-world validations demonstrate the potential of Co-Scientist to accelerate scientific discovery and usher in an era of AI-empowered scientists. — Read More
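The “tournament evolution” idea in the excerpt can be illustrated with a small Python sketch. It assumes an Elo-style rating scheme for pairwise comparisons plus a simple keep-the-top-and-refine loop. Everything here is a hypothetical stand-in, not Co-Scientist’s actual implementation or API: the `Hypothesis` class, its hidden `quality` field, `judge`, `play_round`, `evolve`, the 1200 starting rating, and K=32. In a real system, LLM agents would generate, critique, and compare the hypotheses.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    quality: float          # hypothetical hidden quality; a real system has no such field
    rating: float = 1200.0  # assumed starting Elo rating


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def judge(a: Hypothesis, b: Hypothesis) -> bool:
    """Hypothetical pairwise judge (True if A wins); stands in for an LLM comparison."""
    return a.quality >= b.quality


def play_round(pool: list[Hypothesis], k: float = 32.0) -> None:
    """Round-robin tournament: each pair meets once and ratings update in place."""
    for a, b in itertools.combinations(pool, 2):
        s_a = 1.0 if judge(a, b) else 0.0
        delta = k * (s_a - expected_score(a.rating, b.rating))
        a.rating += delta  # Elo is zero-sum: A's gain is exactly B's loss
        b.rating -= delta


def evolve(pool: list[Hypothesis], rng: random.Random, keep: int = 2) -> list[Hypothesis]:
    """Keep the top-rated hypotheses and add one hypothetical 'refined' variant of each."""
    ranked = sorted(pool, key=lambda h: h.rating, reverse=True)
    survivors = ranked[:keep]
    children = [
        Hypothesis(f"{h.text} (refined)", h.quality + rng.uniform(-0.05, 0.1))
        for h in survivors
    ]
    return survivors + children


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [Hypothesis(f"H{i}", rng.random()) for i in range(4)]
    for generation in range(3):
        play_round(pool)
        pool = evolve(pool, rng)
    for h in sorted(pool, key=lambda h: h.rating, reverse=True):
        print(f"{h.rating:7.1f}  {h.quality:.3f}  {h.text}")
```

Each match moves the same number of points from loser to winner, so the pool’s total rating is conserved; only the ordering changes as more matches are played. New variants enter at the default rating and must earn their rank in later rounds.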

#big7

Geometric AI does not need attention

I got the idea for this post over a virtual coffee with an engineer who builds AI models for one of the big airplane manufacturers. He hasn’t built a model that writes your emails or hallucinates your legal documents; his model does something different. It looks at, say, a winglet (the little upturned fin at the tip of many commercial aircraft wings) and predicts the turbulence it will generate with 98% accuracy.

Let that sit for a moment.

… I walked away from that coffee thinking about wave interference. Because turbulence is, at its core, a wave problem. Pressure waves, superimposed, creating chaotic but geometrically structured patterns. And if a model can learn those patterns in aerodynamics, the obvious question is, where else do superimposed wave systems produce instability that we desperately need to control? — Read More

#performance

Terraform Enterprise 2.0: Evolving infrastructure operations for scale

At the core of Terraform Enterprise 2.0 is support for Stacks, a new infrastructure orchestration capability that allows teams to manage collections of infrastructure as a single unit. Terraform Stacks are available on all plans based on resources under management.

As organizations scale, infrastructure evolves from isolated configurations into systems of interconnected components. Stacks reflect this shift by introducing a configuration layer that enables teams to define and manage infrastructure across environments, regions, and accounts in a consistent, repeatable way. — Read More

#devops