At the core of Terraform Enterprise 2.0 is support for Stacks, a new infrastructure orchestration capability that allows teams to manage collections of infrastructure as a single unit. Terraform Stacks are available on all plans based on resources under management.
As organizations scale, infrastructure evolves from isolated configurations into systems of interconnected components. Stacks reflect this shift by introducing a configuration layer that enables teams to define and manage infrastructure across environments, regions, and accounts in a consistent, repeatable way. — Read More
The Race to Own the Agentic Future
I haven’t been writing a lot for reasons I’ll share below. So when I was invited by Stripe to speak on the SaaSpocalypse as part of their SaaS Platform Leaders Summit, it turns out I had a lot to say. Simple questions were met with word gush as thoughts that had been built up inside my head over the last weeks and months tumbled out.
Writing is synthesis for me, so here’s my attempt to crystallize my view of the SaaSpocalypse.
The crowd was mainly vertical SaaS CEOs, so this essay is written for them. But the LLMs are moving up the stack, so much of this applies to Native AI startups as well. — Read More
The Token Economy pt2: The Intelligence Company Gets Built
Some companies are rebuilding themselves around AI. Everyone else is waiting for a lab, vendor, owner, or competitor to do it for them.
Token Economy Part 1 said tokens don’t create productivity. The operating model does.
This week shows what happens next: if you can’t build that operating model yourself, someone will install it for you. — Read More
How Non-Technical PMs Are Building Products Without Engineers
Polymarket launches private company trading so investors can speculate on Anthropic, OpenAI
Polymarket is moving deeper into private markets — and this time, the contracts are tied to companies most investors can talk about, but still cannot actually buy.
The company is launching prediction markets tied to private company milestones, including valuations, IPO timing and secondary-market activity for names like OpenAI and Anthropic.
Nasdaq Private Market will serve as the exclusive resolution data provider, supplying the information that determines whether these contracts pay out. — Read More
OpenAI co-founder Andrej Karpathy joins Anthropic’s pre-training team
Andrej Karpathy, the AI researcher who co-founded and formerly worked at OpenAI and previously led AI at Tesla, has joined Anthropic.
“I’ve joined Anthropic,” Karpathy posted on X Tuesday. “I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D.” — Read More
Generalization Dynamics of LM Pre-training
People typically assume that LMs stably mature from pattern-matching parrots to generalizable intelligence during pre-training. We build a toy eval suite and show this mental model is wrong: throughout pre-training, LMs frequently and suddenly hop between parrot-like and intelligence-like modes, i.e. distinct algorithms implemented by distinct circuits. We call this mode-hopping. Across our suite, LMs can suddenly latch onto memorized or in-context patterns instead of in-context learning, use System 1 instead of System 2 thinking, pick up what sounds true instead of what is true, fail at multi-hop persona QA, out-of-context reasoning, and emergent misalignment — then just as suddenly revert and generalize. Mode-hopping is not explained by standard optimization dynamics: it is locally stable and cannot be fixed by checkpoint averaging. We instead think of it as a capacity allocation problem: in a capacity-bounded model, generalizable circuits must compete with the shallow ones learned early in training, and the data in each pre-training window decides which circuits win. Our suite provides a cheap set of pre-training monitors and a new lens on generalization. Building upon our insights, we demonstrate three applications: (i) select intermediate pre-training checkpoints that strongly generalize reasoning and alignment, better than the final pre- or mid-training checkpoints, (ii) select pre-training data that controls and stabilizes generalization dynamics, and (iii) test prior generalization predictors, falsifying the monolithic belief that “simpler solutions generalize better”. — Read More
Agent Evaluation: A Detailed Guide
Evaluation is one of the most important research areas for large language models (LLMs). Recently, patterns in LLM usage and evaluation have drastically changed. Whereas we previously evaluated LLMs using benchmarks composed of static questions or short conversations, we now have agent systems that operate over long time horizons and interact with the environment. Agents are difficult to properly evaluate due to their complexity and autonomy. To accurately measure the capabilities of an agent system, we must build harnesses that are realistic and capable of testing agents similarly to how they are used in practice. Building such evaluation capabilities is now more important than ever due to the growing adoption of agents in high-stakes applications like coding and medicine.
This overview will provide a detailed guide of how current agent systems are evaluated. We will begin by developing an understanding of agents in general, covering everything from basic concepts to multi-agent systems. We will then provide a clear framework for the agent evaluation process based upon common patterns observed in practice. Building upon this knowledge, we will end with several case studies of recent agent benchmarks and provide a roadmap that outlines how to build our own agent evaluation by applying similar concepts. Although evaluation is time-consuming and difficult, learning how to properly evaluate agents is incredibly valuable. By rigorously measuring performance and not relying on anecdotal checks, we can rapidly improve agent capabilities. — Read More
Claude Code as a Data Analyst: From Zero to First Report
As data analysts, we’ve all been there: the dreaded request for the monthly/yearly [insert topic] report, an essential task that’s also a massive time sink.
My thoughts for the last week? “Can’t AI just… do this?” Surely, it can whip up a simple data analysis report. Right? — Read More
Spec-Driven Development Isn’t Broken. It will collapse.
… “prompting has split into four skills” — Context, Intent, Specification, Prompt. Everyone matched a tension one of us had brought into the room. And once they had names, something else clicked: the four crafts mapped cleanly onto P-CAM — Perception, Cognition, Agency, Manifestation.
…For the last eight months, the argument has been spec versus vibe. Structure versus flow. Waterfall versus emergence.
…Every standard critique of SDD, and every standard critique of vibe, traces back to the same thing. Not two sets of failures. One failure, surfacing on both sides of the debate. The three-layer collapse.
…Vibe coding collapsed because it had no contract. Spec-driven development is collapsing because it has three contracts pretending to be one. What rises from the fusion isn’t a new brand. It isn’t a better tool. It’s a separation of concerns — the oldest principle in software engineering — applied one layer up, to the documents we use to instruct the machines that write the documents. — Read More