OpenAI is throwing everything into building a fully automated researcher

OpenAI is refocusing its research efforts and throwing its resources into a new grand challenge. The San Francisco firm has set its sights on building what it calls an AI researcher, a fully automated agent-based system that will be able to go off and tackle large, complex problems by itself. OpenAI says that this new research goal will be its “North Star” for the next few years, pulling together multiple research strands, including work on reasoning models, agents, and interpretability.

There’s even a timeline. OpenAI plans to build “an autonomous AI research intern”—a system that can take on a small number of specific research problems by itself—by September. The AI intern will be the precursor to a fully automated multi-agent research system that the company plans to debut in 2028. — Read More

#strategy

Stop Writing Prompts. Start Programming LLMs.

I’ve written more prompts than I care to admit. 🙂

During my PhD at the University of Copenhagen, I spent embarrassing amounts of time tweaking system prompts, adjusting few-shot examples, and praying that my carefully crafted instructions would survive the next model update. Spoiler: they rarely did. Then recently I discovered DSPy, and I realized I’d been doing it all wrong.

… DSPy (Declarative Self-improving Python) from Stanford NLP flips the entire paradigm. Instead of writing brittle prompt strings, you write structured Python code. Instead of manually optimizing prompts, you let the framework compile them for you. — Read More
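The "compile, don't hand-write" idea can be sketched in a few lines of plain Python. This is a toy illustration of the paradigm, not DSPy's actual API: the `Signature` class and `compile_prompt` function here are hypothetical stand-ins showing how a declared task specification can be turned into a prompt string mechanically, so the prompt becomes an artifact the framework can regenerate or optimize rather than a fragile hand-tuned string.

```python
from dataclasses import dataclass

# Toy stand-in for the declarative idea behind DSPy (not DSPy's real API):
# declare *what* the task is, and let a compiler produce the prompt.

@dataclass
class Signature:
    instructions: str   # what the model should do
    inputs: list        # named input fields
    outputs: list       # named output fields

def compile_prompt(sig: Signature, **values) -> str:
    # Mechanically render the declaration into a prompt string.
    lines = [sig.instructions]
    for name in sig.inputs:
        lines.append(f"{name}: {values[name]}")
    lines.append("Respond with: " + ", ".join(sig.outputs))
    return "\n".join(lines)

summarize = Signature(
    instructions="Summarize the ticket in one sentence.",
    inputs=["ticket"],
    outputs=["summary"],
)

prompt = compile_prompt(summarize, ticket="App crashes on login.")
print(prompt)
```

In real DSPy the analogous pieces are signatures and modules, and an optimizer searches over instructions and few-shot examples for you; the point of the sketch is only that the prompt is derived from code, so a model update means recompiling rather than re-tweaking strings by hand.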

#devops

Lossy self-improvement

Fast takeoff, the singularity, and recursive self-improvement (RSI) are all top of mind in AI circles these days. There are elements of truth to them in what’s happening in the AI industry. Two, maybe three, labs are consolidating as an oligopoly with access to the best AI models (and the resources to build the next ones). The AI tools of today are abruptly transforming engineering and research jobs.

AI research is becoming much easier in many ways. The technical problems that must be solved to scale large language model training even further are formidable, but superhuman coding assistants are making them approachable, overturning many earlier assumptions about what building these systems would entail. Together, this sets us up for a year (or more) of rapid progress at the cutting edge of AI.

We’re also at a time when language models are already extremely good. In fact, they’re good enough for plenty of extremely valuable knowledge-work tasks. It’s hard to imagine language models taking another big step; it’s unclear which tasks they’ll master this year outside of code and CLI-based computer use. There will be some new ones! These capabilities unlock new styles of working that will send more ripples through the economy.

These dramatic changes almost make it seem like a foregone conclusion that language models can then just keep accelerating progress on their own. The popular language for this is a recursive self-improvement loop. — Read More

#training

Val Kilmer Resurrected by AI to Star in ‘As Deep as the Grave’ Movie

Five years prior to his death in 2025, Val Kilmer was cast as Father Fintan, a Catholic priest and Native American spiritualist, in “As Deep as the Grave.” But Kilmer, who was battling throat cancer, was too sick to ever make it to set.

… Even though he didn’t shoot a single scene, Voorhees has been able to realize his vision of having Kilmer in the ensemble by using state-of-the-art generative AI. And he’s done it with the cooperation of the late actor’s estate and his daughter Mercedes (Voorhees says Kilmer’s son Jack is also supportive). — Read More

#vfx

OpenClaw is the WordPress moment for agents

“openclaw is the wordpress moment for agents. the shopifys and substacks are coming!”

this might be the typical take of “if you’re in crypto, pivot to ai”, but in reality there’s a lot a real software developer will find useful in openclaw.

what really changes? when normal people realise they can use it usefully. — Read More

#devops

Federal cyber experts called Microsoft’s cloud a “pile of shit,” approved it anyway

In late 2024, the federal government’s cybersecurity evaluators rendered a troubling verdict on one of Microsoft’s biggest cloud computing offerings.

The tech giant’s “lack of proper detailed security documentation” left reviewers with a “lack of confidence in assessing the system’s overall security posture,” according to an internal government report reviewed by ProPublica.

Or, as one member of the team put it: “The package is a pile of shit.”

… Yet, in a highly unusual move that still reverberates across Washington, the Federal Risk and Authorization Management Program, or FedRAMP, authorized the product anyway, bestowing what amounts to the federal government’s cybersecurity seal of approval. FedRAMP’s ruling—which included a kind of “buyer beware” notice to any federal agency considering GCC High—helped Microsoft expand a government business empire worth billions of dollars. — Read More

#cyber

World Models: Computing the Uncomputable

… I am on the record as being skeptical that LLMs will take us to superintelligence, but I think there is a real shot that World Models will drive superhuman, complementary machines that do things that we can’t, or don’t want to, do.

The world is a place where unexpected futures unfold, but in somewhat predictable ways. As humans, we can envision almost all of them with roughly the same amount of effort, giving each possibility a similar amount of thought. Computers can’t.

It’s no wonder traditional computing struggles with this complexity. Imagine anticipating and coding each and every action, as well as the interactions between all of those actions. Mathematically, in a traditional engine, simulating N fans is at least an O(N) problem, and O(N²) once pairwise interactions are counted. Each person, flag, chair, and ball must be explicitly calculated, and the interactions between them must be calculated, too.
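The arithmetic behind that scaling claim is easy to check. A minimal sketch (the function name is just an illustration): each of N entities needs one update per step, while the unordered pairs that can interact number N(N−1)/2, which grows quadratically.

```python
# Cost of one simulation step for N entities in a traditional engine:
# O(N) per-entity updates, plus O(N^2) pairwise interaction checks.
def interaction_counts(n: int) -> tuple[int, int]:
    per_entity = n               # one update per entity
    pairwise = n * (n - 1) // 2  # unordered pairs that may interact
    return per_entity, pairwise

for n in (10, 100, 1000):
    print(n, interaction_counts(n))
```

Going from 10 entities to 1,000 multiplies the per-entity work by 100 but the pairwise work by roughly 10,000, which is why explicit simulation of crowds blows up so quickly.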

In robotics, machines must respond to situations in the real world in the same amount of time, regardless of their complexity, even though, in traditional computing, different situations can take wildly different amounts of time to simulate. This has been a major bottleneck for robotics and embodied AI progress.

World Models are a solution to that problem. — Read More

#strategy

Enterprise AI Has a Checkbox Problem

… Today, AI sits adjacent to the work. It assists. It suggests. It drafts. But it doesn’t run the operating room, underwrite the loan, or manage the supply chain. Not in production. Not yet.

“You can’t just slot [AI] in to a critical workflow in health care and all of a sudden show up where if you make a misdiagnosis or if you make a mischaracterization of a procedure, you can get fined and go to jail. If you’re in financial services and you make a mistake about somebody’s portfolio, or you make a misallocation and you point to a model, you will get sued and you will be in trouble.”

So what does every responsible enterprise do? They experiment at the edge. They run pilots. They check the box. They wait. — Read More

#strategy

Five strategies for deeper AI adoption at work

Why do some people become enthusiastic, consistent adopters of AI, while others give it a try and shrug? We collaborated with Stanford University researchers to find out.

Over the last 18 months, we took the researchers behind the curtain at Google to observe how Googlers were learning and using AI in their day-to-day work. The timing of the study allowed us to observe firsthand how the rapid pace of AI was fundamentally challenging and changing how we build, collaborate and lead.

The published study found that while most people were eager to find value in AI tools, many were stuck in what the researchers called “simple substitution”: swapping existing tasks for AI alternatives. But many found the effort it took to learn the AI tool and get to a good result was often greater than the payoff. Crucially, the researchers found that successful adopters didn’t just focus on prompt engineering or its more recent sibling, context engineering. Instead, deep AI adopters completely changed how they approached AI — taking inspiration from product management. — Read More

#strategy

Bill Gates Tried to Predict the Internet in 1999. I Tried the Same Exercise for AI.

Bill Gates published Business @ the Speed of Thought in 1999. I read it for the first time this summer, which is a bit like watching a prophet’s sermon after most of the prophecies have already come true.

It’s a strange reading experience. You keep nodding along, thinking “yes, obviously,” and then you remember that when he wrote this, most people were still using dial-up and the idea of checking your bank balance on a phone would have sounded like science fiction.

… I thought this book was interesting right now, as Gates was trying to answer a question in 1999 that we’re trying to answer again in 2026, just with different technology. Everyone is wrestling with, well, what happens to business when information becomes fast, cheap, and ubiquitous? — Read More

#strategy