When the prize is “winner‑takes‑all” and everyone must pay their costs whether they win or lose, you don’t get measured competition—you get value (rent) dissipation [3]. That is what contest theory calls an all‑pay auction [0]. In expectation, participants spend roughly the entire value of the prize in aggregate trying to win it [1][2]. What happens when the perceived value of the prize is nearly infinite?
For AGI—where the imagined prize is monopoly‑like profits across software, science, society, the next industrial revolution, the whole fabric of human civilization—equilibrium spending is enormous by construction. In this worldview, the seemingly excessive capital allocation is rational: if you cut spending while rivals do not, you lose the race and everything you’ve already invested. Google co‑founder Larry Page has allegedly asserted (as relayed by investor Gavin Baker): “I am willing to go bankrupt rather than lose this race” [4]. — Read More
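For intuition, here is the standard complete-information benchmark behind that claim (a textbook sketch, not necessarily the exact model used in [0]–[2]): n identical risk-neutral bidders, a prize each values at V, and every bid paid whether it wins or not.

```latex
% Symmetric all-pay auction with complete information.
% Each of n bidders values the prize at V; all bids are sunk.
% The unique symmetric equilibrium is a mixed strategy with CDF
F(b) = \left(\tfrac{b}{V}\right)^{\frac{1}{n-1}}, \qquad b \in [0, V].

% Expected spend per bidder, and in aggregate:
\mathbb{E}[b] = \int_0^V b \, dF(b) = \frac{V}{n}, \qquad n \cdot \frac{V}{n} = V.
```

In this benchmark every bidder's expected payoff is zero: the full value of the prize is spent competing for it, which is exactly what "rent dissipation" means.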
The real problem with AI coding
The problem with AI coding isn’t technical debt. It’s comprehension debt.
And most teams don’t realize it until it’s too late.
…When you write code manually, you build up a clear mental model of the logic and trade-offs as you go. Every line you write, you understand why it exists. You see the edge cases. You know what alternatives you considered and rejected.
When AI writes code for you, that process inverts. You’re reverse-engineering someone else’s thinking after the fact. It’s like trying to learn calculus by reading a textbook instead of solving problems yourself.
… [V]olume amplifies the comprehension problem. You’re not just reverse-engineering one function. You’re reverse-engineering entire systems. — Read More
Signs of introspection in large language models
Have you ever asked an AI model what’s on its mind? Or to explain how it came up with its responses? Models will sometimes answer questions like these, but it’s hard to know what to make of their answers. Can AI systems really introspect—that is, can they consider their own thoughts? Or do they just make up plausible-sounding answers when they’re asked to do so?
Understanding whether AI systems can truly introspect has important implications for their transparency and reliability. If models can accurately report on their own internal mechanisms, this could help us understand their reasoning and debug behavioral issues. Beyond these immediate practical considerations, probing for high-level cognitive capabilities like introspection can shape our understanding of what these systems are and how they work. Using interpretability techniques, we’ve started to investigate this question scientifically, and found some surprising results.
Our new research provides evidence for some degree of introspective awareness in our current Claude models, as well as a degree of control over their own internal states. We stress that this introspective capability is still highly unreliable and limited in scope: we do not have evidence that current models can introspect in the same way, or to the same extent, that humans do. Nevertheless, these findings challenge some common intuitions about what language models are capable of—and since we found that the most capable models we tested (Claude Opus 4 and 4.1) performed the best on our tests of introspection, we think it’s likely that AI models’ introspective capabilities will continue to grow more sophisticated in the future. — Read More
On-Policy Distillation
LLMs are capable of expert performance in focused domains, a result of several capabilities stacked together: perception of input, knowledge retrieval, plan selection, and reliable execution. This requires a stack of training approaches[.]
… Smaller models with stronger training often outperform larger, generalist models in their trained domains of expertise. There are many benefits to using smaller models: they can be deployed locally for privacy or security considerations, can continuously train and get updated more easily, and save on inference costs. Taking advantage of these requires picking the right approach for the later stages of training.
Approaches to post-training a “student” model can be divided into two kinds:
Off-policy training relies on target outputs from some external source that the student learns to imitate.
On-policy training samples rollouts from the student model itself, and assigns them some reward.
We can do on-policy training via reinforcement learning, by grading each student rollout on whether it solves the question. This grading can be done by a human, or by a “teacher” model that reliably gets the correct answer. — Read More
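To make the on-policy idea concrete, here is a deliberately tiny sketch of the loop described above: sample a rollout from the student itself, let a grader (standing in for the human or teacher model) assign a reward, and push the student toward rewarded rollouts. The softmax "policy", the grader, and all numbers are made up for illustration; this is not the post's training code.

```python
# Toy illustration of on-policy training: the training data is sampled from
# the *student* itself, and a grader assigns each rollout a reward.
# Everything here (the softmax "policy", teacher_reward, learning rate) is a
# made-up stand-in, not the method from the post.
import numpy as np

rng = np.random.default_rng(0)
num_answers = 5
correct = 2                       # the answer the grader accepts

logits = np.zeros(num_answers)    # the student's only parameters in this toy

def student_probs(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def teacher_reward(answer):
    # Stand-in for a human grader or a reliable teacher model.
    return 1.0 if answer == correct else 0.0

lr = 0.5
for _ in range(300):
    probs = student_probs(logits)
    rollout = rng.choice(num_answers, p=probs)   # on-policy: sample the student
    reward = teacher_reward(rollout)             # grade the student's own output
    # REINFORCE-style update: gradient of log-prob of the sampled answer
    grad = -probs
    grad[rollout] += 1.0
    logits += lr * reward * grad

print(student_probs(logits).round(3))   # probability mass moves onto `correct`
```

Off-policy imitation would instead fit the student to a fixed set of outputs produced by someone else; the distinction is simply whose outputs generate the training signal.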
LEVERAGING MACHINE LEARNING TO ENHANCE ACOUSTIC EAVESDROPPING ATTACKS
This multi-part series explores how machine learning can enhance eavesdropping on cellular audio using gyroscopes and accelerometers — inertial sensors commonly built into mobile devices to measure motion through Micro-Electro-Mechanical Systems (MEMS) technology. The research was conducted over the summer by one of our interns, Alec K., and a newly hired full-time engineer, August H.
Introduction
Acoustic eavesdropping attacks are a potentially devastating threat to the confidentiality of user information, especially when they are carried out on smartphones, which are now ubiquitous. However, conventional microphone-based attacks are limited on smartphones by the fact that the user must consent to an application collecting microphone data. Recently, eavesdropping researchers have turned to side-channel attacks, which leverage information leaked by a piece of hardware to reconstruct some kind of secret (in this case, the audio we want to listen in on).
Unlike the microphone, which requires explicit user permission to access, sensors like the gyroscope and accelerometer can be read by an Android application without any explicit user consent. These sensors are sensitive to the vibrations caused by sound, and since some Android devices allow sampling them at rates up to 500 Hz, it is possible to reconstruct sound from their readings. — Read More
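One way to see why the 500 Hz figure matters (my gloss, not the authors'): sampling at 500 Hz can only faithfully capture vibration components below the 250 Hz Nyquist limit, which still overlaps the fundamental-frequency range of human speech. A toy sketch with made-up numbers:

```python
# Rough sketch of why a 500 Hz inertial-sensor stream still leaks speech:
# content below the Nyquist frequency (fs/2 = 250 Hz) survives sampling.
# The 140 Hz "voice" tone is an arbitrary stand-in for a speech fundamental.
import numpy as np

fs_sensor = 500.0                      # gyroscope/accelerometer sample rate (Hz)
duration = 1.0
t = np.arange(0, duration, 1.0 / fs_sensor)

voiced = np.sin(2 * np.pi * 140.0 * t)     # below Nyquist: recoverable
sibilant = np.sin(2 * np.pi * 4000.0 * t)  # far above Nyquist: not captured (these samples are ~0)
sensed = voiced + 0.3 * sibilant

spectrum = np.abs(np.fft.rfft(sensed))
freqs = np.fft.rfftfreq(len(sensed), d=1.0 / fs_sensor)
print(f"strongest sensed component: {freqs[spectrum.argmax()]:.0f} Hz")  # ~140 Hz
```

Higher-frequency detail is lost or aliased, which is presumably where the machine learning comes in: inferring speech content from the low-frequency residue that does reach the sensors.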
New physical attacks are quickly diluting secure enclave defenses from Nvidia, AMD, and Intel
Trusted execution environments, or TEEs, are everywhere—in blockchain architectures, virtually every cloud service, and computing involving AI, finance, and defense contractors. It’s hard to overstate the reliance that entire industries have on three TEEs in particular: Confidential Compute from Nvidia, SEV-SNP from AMD, and SGX and TDX from Intel. All three come with assurances that confidential data and sensitive computing can’t be viewed or altered, even if a server has suffered a complete compromise of the operating kernel.
A trio of novel physical attacks raises new questions about the true security offered by these TEEs and the exaggerated promises and misconceptions coming from the big and small players using them.
The most recent attack, released Tuesday, is known as TEE.fail. It defeats the latest TEE protections from all three chipmakers. The low-cost, low-complexity attack works by placing a small piece of hardware between a single physical memory chip and the motherboard slot it plugs into. It also requires the attacker to compromise the operating system kernel. Once this three-minute attack is completed, Confidential Compute, SEV-SNP, and TDX/SGX can no longer be trusted. Unlike the Battering RAM and Wiretap attacks from last month, which worked only against CPUs using DDR4 memory, TEE.fail works against DDR5, allowing it to defeat the latest TEEs. — Read More
THERMODYNAMIC COMPUTING: FROM ZERO TO ONE
Three years ago, Extropic made the bet that energy would become the limiting factor for AI scaling.
We were right.[1]
Scaling AI will require a major breakthrough in either energy production or the energy efficiency of AI hardware and algorithms.
We are proud to unveil our breakthrough AI algorithms and hardware, which can run generative AI workloads using radically less energy than deep learning algorithms running on GPUs. — Read More
‘There isn’t really another choice:’ Signal chief explains why the encrypted messenger relies on AWS
After last week’s major Amazon Web Services (AWS) outage took Signal along with it, Elon Musk was quick to criticize the encrypted messaging app’s reliance on big tech. But Signal president Meredith Whittaker argues that the company had no real choice but to use AWS or another major cloud provider.
“The problem here is not that Signal ‘chose’ to run on AWS,” Whittaker writes in a series of posts on Bluesky. “The problem is the concentration of power in the infrastructure space that means there isn’t really another choice: the entire stack, practically speaking, is owned by 3-4 players.” — Read More
The New Calculus of AI-based Coding
Over the past three months, a team of experienced, like-minded engineers and I have been building something really cool within Amazon Bedrock. While I’m pretty excited about what we are building, there is another unique thing about our team – most of our code is written by AI agents such as Amazon Q or Kiro. Before you roll your eyes: no, we’re not vibe coding. I don’t believe that’s the right way to build robust software.
Instead, we use an approach where a human and an AI agent collaborate to produce the code changes. For our team, every commit has an engineer’s name attached to it, and that engineer ultimately needs to review and stand behind the code. We use steering rules to set up constraints for how the AI agent should operate within our codebase, and writing in Rust has been a great benefit: the Rust compiler is famous for its focus on correctness and safety, catching many problems at compile time and producing clear error messages that help the agent iterate. In contrast to vibe coding, I prefer the term “agentic coding.” Much less exciting, but in our industry, boring is usually good. — Read More
Qwen Image Model — New Open Source Leader?
There has been some excitement over the last week or two around the new model in the Qwen series by Alibaba. Qwen Image is a 20B-parameter MMDiT (Multimodal Diffusion Transformer) model (3 billion parameters more than HiDream), open-sourced under the Apache 2.0 license.
Alongside the core model, it uses the Qwen2.5-VL LLM for text encoding and has a specialised VAE (Variational Autoencoder). It can supposedly render readable, multilingual text in much longer passages than previous models, and the VAE is trained to preserve small fonts, text edges and layout. Using Qwen2.5-VL as the text encoder should also mean better language, vision and context understanding.
… These improvements come at a cost: size. The full BF16 model is 40GB, with the FP16 version of the text encoder coming in at an additional 16GB. FP8 versions are more reasonable at 20GB for the model and 9GB for the text encoder. If those sizes are still too large for your setup, there are distilled versions available from links on the ComfyUI guide. City96 has also created various GGUF versions, available for download from Hugging Face. — Read More
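Those file sizes are consistent with simple back-of-the-envelope arithmetic (my estimate, assuming roughly 2 bytes per parameter for BF16/FP16 and 1 byte per parameter for FP8, ignoring per-tensor overhead):

```latex
\begin{align*}
20\times 10^{9}\ \text{params} \times 2\ \text{bytes/param} &\approx 40\ \text{GB} && \text{(BF16 model)}\\
20\times 10^{9}\ \text{params} \times 1\ \text{byte/param} &\approx 20\ \text{GB} && \text{(FP8 model)}\\
16\ \text{GB} \div 2\ \text{bytes/param} &\approx 8\times 10^{9}\ \text{params} && \text{(FP16 text encoder)}
\end{align*}
```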