Daily Archives: March 9, 2026
Reasoning models struggle to control their chains of thought, and that’s good
As AI agents become capable of carrying out increasingly complex and autonomous tasks, maintaining reliable oversight of their behavior becomes more important. Consistent with our principle of iterative deployment, we study how systems behave in real-world settings and continuously refine safeguards as capabilities advance. To support this, our safety approach uses defense-in-depth, with multiple complementary layers of defense such as safety training, behavioral testing, agentic code review, and chain-of-thought (CoT) monitoring. CoT monitoring analyzes the reasoning steps agents generate while pursuing tasks. These reasoning traces can provide valuable signals during both training and deployment, helping monitoring systems identify when an agent’s behavior may be unsafe or inconsistent with the user’s intended goals.
Today, we find that models’ reasoning is generally interpretable and easy to monitor. However, in the future, monitorability may break down for a variety of reasons. Here, we focus on one such path: if agents become capable of deliberately reshaping or obscuring their reasoning when they know they are being monitored, evaluations could overestimate a system’s alignment or safety, and monitoring systems could become less reliable.
In this work, we study whether current reasoning models are capable of controlling their chain of thought in ways that reduce monitorability. Understanding this capability is important for ensuring that CoT monitoring remains a robust safeguard as AI systems grow more capable. We find that current reasoning models struggle to control their CoTs, even when told they are being monitored. While controllability is higher for larger models, it decreases as models are asked to reason for longer and when they undergo additional post-training. Our results suggest that CoT controllability is unlikely to currently pose a major risk to CoT monitorability, though continued evaluation will be important as models advance. — Read More
The First Multi-Behavior Brain Upload
The Singularity has belonged exclusively to artificial minds, until now. For decades, whole-brain emulation has been the tantalizing counterpart to artificial intelligence: copy a biological brain, neuron by neuron and synapse by synapse, and run it. Today, for the first time, I am releasing a video from a company I helped found, Eon Systems PBC, demonstrating what we believe is the world’s first embodiment of a whole-brain emulation that produces multiple behaviors.
In 2024, Eon senior scientist Philip Shiu and collaborators published in Nature a computational model of the entire adult Drosophila melanogaster brain, containing more than 125,000 neurons and 50 million synaptic connections, built from the FlyWire connectome and machine learning predictions of neurotransmitter identity. That model predicted motor behavior at 95% accuracy. But it was disembodied: a brain without a body, activation without physics, motor outputs with nowhere to go.
Now the brain has somewhere to go. Building on previous work, including Shiu et al.’s whole-brain computational model, the NeuroMechFly v2 embodied simulation framework, and Özdil et al.’s research on centralized brain networks underlying body part coordination, this demonstration integrates Eon’s connectome-based brain emulation with a physics-simulated fly body in MuJoCo. — Read More