World’s first ‘biomimetic AI robot’ debuts in Shanghai

Read More
#robotics

Reasoning models struggle to control their chains of thought, and that’s good

As AI agents become capable of carrying out increasingly complex and autonomous tasks, maintaining reliable oversight of their behavior becomes more important. Consistent with our principle of iterative deployment, we study how systems behave in real-world settings and continuously refine safeguards as capabilities advance. To support this, our safety approach uses defense-in-depth, with multiple complementary layers of defense such as safety trainingbehavioral testingagentic code review⁠(opens in a new window), and  chain-of-thought (CoT) monitoring. CoT monitoring analyzes the reasoning steps agents generate while pursuing tasks. These reasoning traces can provide valuable signals during both training and deployment, helping monitoring systems identify when an agent’s behavior may be unsafe or inconsistent with the user’s intended goals.

Today, we find that models’ reasoning is generally interpretable and easy to monitor. However, in the future, monitorability may break down for a variety of reasons⁠(opens in a new window). Here, we focus on one such path: if agents become capable of deliberately reshaping or obscuring their reasoning when they know they are being monitored, evaluations could overestimate a system’s alignment or safety, and monitoring systems could become less reliable. 

In this work, we study whether current reasoning models are capable of controlling their chain of thought in ways that reduce monitorability. Understanding this capability is important for ensuring that CoT monitoring remains a robust safeguard as AI systems grow more capable. We find that current reasoning models struggle to control their CoTs, even when told they are being monitored. While controllability is higher for larger models, it decreases as models are asked to reason for longer and when they undergo additional post-training. Our results suggest that CoT controllability is unlikely to currently pose a major risk to CoT monitorability, though continued evaluation will be important as models advance. — Read More

#ethics

The First Multi-Behavior Brain Upload

The Singularity has belonged exclusively to artificial minds, until now. For decades, whole-brain emulation has been the tantalizing counterpart to artificial intelligence: copy a biological brain, neuron by neuron and synapse by synapse, and run it. Today, for the first time, I am releasing a video from a company I helped found, Eon Systems PBC, demonstrating what we believe is the world’s first embodiment of a whole-brain emulation that produces multiple behaviors.

In 2024, Eon senior scientist Philip Shiu and collaborators published in Nature a computational model of the entire adult Drosophila melanogaster brain, containing more than 125,000 neurons and 50 million synaptic connections, built from the FlyWire connectome and machine learning predictions of neurotransmitter identity. That model predicted motor behavior at 95% accuracy. But it was disembodied: a brain without a body, activation without physics, motor outputs with nowhere to go.

Now the brain has somewhere to go. Building on previous work, including Shiu et al.’s whole-brain computational model, the NeuroMechFly v2 embodied simulation framework, and Özdil et al.’s research on centralized brain networks underlying body part coordination, this demonstration integrates Eon’s connectome-based brain emulation with a physics-simulated fly body in MuJoCo. — Read More

#human

How does AI understand my visual searches?

We’ve all been there: You see a photo of a perfectly styled living room or a well-curated street-style outfit, and you want to know where everything came from. Until recently, visual search was a one-item-at-a-time process. But a major update to Circle to Search and Lens now allows Google to break down and search for multiple objects within a single image simultaneously. This means if you use Circle to Search on Android to search for an entire outfit, you’ll see results for every component of a look, not just one piece at a time. In recent months, we’ve also launched several updates that enhance both visual search and image results in AI Mode, so you can better find inspiration as you search. — Read More

#image-recognition

Anthropic’s Compute Advantage: Why Silicon Strategy is Becoming an AI Moat

Compute is not a commodity for frontier AI labs. It is a structural cost input that determines margin, throughput, and model iteration velocity at scale. The divergence in how Anthropic, OpenAI, and Microsoft have approached silicon procurement over the last 18 months is not just a supply chain story — it is a compounding strategic gap.

Anthropic has built what is today the most diversified and cost-efficient compute architecture among frontier AI labs. OpenAI remains almost entirely dependent on Nvidia. Microsoft’s internal chip program is years behind schedule. The structural implications favor Anthropic on unit economics and negotiating leverage as inference workloads scale. While Anthropic has had so much demand, it has struggled with up-time — its long-term strategy is the most fundamentally resilient.

One important caveat up front: compute advantage amplifies model advantage; it does not replace it. If a competitor’s models are materially better, customers absorb the higher token cost. The argument here is not that Anthropic wins because of infrastructure. The argument is that equivalent model quality delivered at 30–60% lower cost per token is a compounding advantage — on margin, on training budget, and on the pace of iteration. — Read More

#performance

The Death of Spotify: Why Streaming is Minutes Away From Being Obsolete

I was walking down Queen Street in Toronto last week, completely zoned out, listening to Episode #391 of David Senra’s Founders podcast. If you don’t listen to Founders, you should. Senra obsessively analyzes the careers of history’s greatest entrepreneurs. This particular episode was a two-hour deep dive into the life and mind of one of my biggest heroes – Jimmy Iovine.

… About an hour into the podcast, Jimmy Iovine starts discussing the current state of the music business. I literally stopped walking. I had to pull out my phone and rewind it three times just to make sure I heard him correctly.

Speaking about Spotify and Apple Music, Iovine flatly stated: “The streaming services, to me, are minutes away from being obsolete.”Read More

#strategy

Labor market impacts of AI: A new measure and early evidence

The rapid diffusion of AI is generating a wave of research measuring and forecasting its impacts on labor markets. But the track record of past approaches gives reason for humility.

… In this paper, we present a new framework for understanding AI’s labor market impacts, and test it against early data, finding limited evidence that AI has affected employment to date. Our goal is to establish an approach for measuring how AI is affecting employment, and to revisit these analyses periodically. This approach won’t capture every channel through which AI could reshape the labor market, but by laying this groundwork now, before meaningful effects have emerged, we hope future findings will more reliably identify economic disruption than post-hoc analyses. — Read More

#strategy

Netflix Acquires AI Filmmaking Start-Up Founded by Ben Affleck

In a rare acquisition, Netflix has bought InterPositive, a start-up founded by Ben Affleck that makes AI-powered tools for filmmakers.

… While Netflix historically is more often a builder than a buyer, the company said it saw Affleck’s InterPositive as providing a unique set of AI tools that “keeps filmmakers at the center of the process.”  — Read More

#vfx

Moats in the Age of AI

We’re currently in the SaaSpocalypse. People believe software is dead and margins will compress to zero. Some are even saying that companies like Visa get bypassed and DoorDash gets aggregated away in the age of AI. Everything that looks like software becomes a commodity and no moats remain.

Before we declare the end of defensibility of all businesses, I think it’s worth grounding ourselves in the actual sources of defensibility that exist. My favourite book around defensibility and moats is Hamilton Helmer’s 7 Powers which outlines the common ways companies build defensibility.

The question is: In an AI world, which sources of power weaken, and which survive?Read More

#strategy

You Need to Rewrite Your CLI for AI Agents

I built a CLI for Google Workspace — agents first. Not “built a CLI, then noticed agents were using it.” From Day One, the design assumptions were shaped by the fact that AI agents would be the primary consumers of every command, every flag, and every byte of output.

CLIs are increasingly the lowest-friction interface for AI agents to reach external systems. Agents don’t need GUIs. They need deterministic, machine-readable output, self-describing schemas they can introspect at runtime, and safety rails against their own hallucinations. — Read More

#devops