Scaling Karpathy’s Autoresearch: What Happens When the Agent Gets a GPU Cluster

We pointed Claude Code at autoresearch and gave it access to 16 GPUs on a Kubernetes cluster. Over 8 hours it submitted ~910 experiments, found that scaling model width mattered more than any single hyperparameter, taught itself to use H200s for validation while screening ideas on H100s, and drove val_bpb from 1.003 down to 0.974 – a 2.87% improvement over baseline.

Beyond raw speedup, parallelism changed how the agent searched. With one GPU, it’s stuck doing greedy hill-climbing – try one thing, check, repeat. With 16 GPUs, it ran factorial grids of 10-13 experiments per wave, catching interaction effects between parameters that sequential search would miss. For example, the agent tested six model widths in a single wave, saw the trend immediately, and zeroed in on the best one – one round instead of six. — Read More
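The factorial-grid idea is easy to sketch in plain Python. Everything below is illustrative (the parameter names, values, and GPU cap are assumptions, not the agent's actual configuration): each wave expands a small grid into concurrent runs instead of testing one value at a time.

```python
import itertools

def factorial_wave(param_grid, num_gpus=16):
    """Expand a parameter grid into one wave of parallel experiment
    configs, capped at the number of available GPUs."""
    keys = list(param_grid)
    combos = [dict(zip(keys, values))
              for values in itertools.product(*param_grid.values())]
    return combos[:num_gpus]  # one experiment per GPU in this wave

# Hypothetical wave: six model widths crossed with two learning
# rates yields 12 configs, submitted in parallel in a single round.
wave = factorial_wave({
    "model_width": [256, 384, 512, 640, 768, 1024],
    "learning_rate": [3e-4, 6e-4],
})
print(len(wave))
```

A sequential agent would need twelve rounds to cover the same grid; the crossed design also exposes width-by-learning-rate interactions that one-at-a-time search cannot see.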

#performance

The Future of SaaS Is Agentic

The future of SaaS is agentic, but agentic SaaS is not just a chatbot layered on top of APIs and a dashboard. Traditional SaaS was built for users to operate software manually; agentic SaaS shifts that burden to software that can act on behalf of users. That changes both the interface and the architecture: the UI remains, but becomes a layer for intent, supervision, and review, while the product itself evolves into a system of stateful processes that can plan, execute, and adapt over time. The winners will not be the products with the most AI features, but the ones that remove the most friction and make software feel less like a tool to operate and more like a system that works for you. — Read More

#strategy

OpenAI is throwing everything into building a fully automated researcher

OpenAI is refocusing its research efforts and throwing its resources into a new grand challenge. The San Francisco firm has set its sights on building what it calls an AI researcher, a fully automated agent-based system that will be able to go off and tackle large, complex problems by itself. OpenAI says that this new research goal will be its “North Star” for the next few years, pulling together multiple research strands, including work on reasoning models, agents, and interpretability.

There’s even a timeline. OpenAI plans to build “an autonomous AI research intern”—a system that can take on a small number of specific research problems by itself—by September. The AI intern will be the precursor to a fully automated multi-agent research system that the company plans to debut in 2028. — Read More

#strategy

Stop Writing Prompts. Start Programming LLMs.

I’ve written more prompts than I care to admit. 🙂

During my PhD at the University of Copenhagen, I spent embarrassing amounts of time tweaking system prompts, adjusting few-shot examples, and praying that my carefully crafted instructions would survive the next model update. Spoiler: they rarely did. Then recently I discovered DSPy, and I realized I’d been doing it all wrong.

… DSPy (Declarative Self-improving Python) from Stanford NLP flips the entire paradigm. Instead of writing brittle prompt strings, you write structured Python code. Instead of manually optimizing prompts, you let the framework compile them for you. — Read More

#devops

Lossy self-improvement

Fast takeoff, the singularity, and recursive self-improvement (RSI) are all top of mind in AI circles these days. Each contains an element of truth given what’s happening in the AI industry. Two, maybe three, labs are consolidating into an oligopoly with access to the best AI models (and the resources to build the next ones). The AI tools of today are abruptly transforming engineering and research jobs.

AI research is becoming much easier in many ways. The technical problems that must be solved to scale large language model training even further are formidable, but superhuman coding assistants are making them approachable, upending many earlier claims about what building these systems entails. Together, this sets us up for a year (or more) of rapid progress at the cutting edge of AI.

We’re also at a time when language models are already extremely good. They’re in fact good enough for plenty of extremely valuable knowledge-work tasks. It’s hard to imagine language models taking another big step: it’s unclear which tasks they’re going to master this year outside of code and CLI-based computer use. There will be some new ones! These capabilities unlock new styles of working that will send more ripples through the economy.

These dramatic changes almost make it seem like a foregone conclusion that language models can then just keep accelerating progress on their own. The popular language for this is a recursive self-improvement loop. — Read More

#training