Personal Superintelligence

Over the last few months we have begun to see glimpses of our AI systems improving themselves. The improvement is slow for now, but undeniable. Developing superintelligence is now in sight.

It seems clear that in the coming years, AI will improve all our existing systems and enable the creation and discovery of new things that aren’t imaginable today. But what we will direct superintelligence towards remains an open question.

In some ways this will be a new era for humanity, but in others it’s just a continuation of historical trends. — Read More

#singularity

U.S. AI Policy & China’s Path

There is now a path for China to surpass the U.S. in AI. Even though the U.S. is still ahead, China has tremendous momentum with its vibrant open-weights model ecosystem and aggressive moves in semiconductor design and manufacturing. In the startup world, we know momentum matters: Even if a company is small today, a high rate of growth compounded for a few years quickly becomes an unstoppable force. This is why a small, scrappy team with high growth can threaten even behemoths. While both the U.S. and China are behemoths, China’s hypercompetitive business landscape and rapid diffusion of knowledge give it tremendous momentum. The White House’s AI Action Plan released last week, which explicitly champions open source (among other things), is a very positive step for the U.S., but by itself it won’t be sufficient to sustain the U.S. lead.  — Read More
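
The compounding claim is easy to sanity-check in a few lines of Python. The starting sizes and growth rates below are hypothetical, chosen only to illustrate how quickly a large gap can close:

```python
# Hypothetical illustration: a small player with a high compound growth rate
# overtakes a much larger incumbent within a few years.
small, large = 1.0, 100.0               # assumed starting sizes: a 100x gap
small_growth, large_growth = 2.0, 1.15  # assumed rates: 100%/yr vs. 15%/yr

year = 0
while small < large:
    small *= small_growth
    large *= large_growth
    year += 1

print(f"The smaller player catches up in year {year}")
# With these assumed rates, the 100x gap closes in about 9 years.
```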

#china-vs-us

Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data

tl;dr We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a “student” model learns to prefer owls when trained on sequences of numbers generated by a “teacher” model that prefers owls. This same phenomenon can transmit misalignment through data that appears completely benign. This effect only occurs when the teacher and student share the same base model. — Read More

Read the Paper; Access the Code
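
For readers who want the mechanics, here is a minimal sketch of the teacher-student setup described above, assuming a Hugging Face-style causal LM. The base model name, prompt, and single-step training loop are illustrative stand-ins, not the paper’s actual code:

```python
# Illustrative sketch of the subliminal-learning pipeline (not the paper's code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "gpt2"  # stand-in base model; the paper uses larger models
tok = AutoTokenizer.from_pretrained(BASE)
teacher = AutoModelForCausalLM.from_pretrained(BASE)  # imagine this copy was
                                                      # fine-tuned to prefer owls
student = AutoModelForCausalLM.from_pretrained(BASE)  # same base model: the
                                                      # paper's key condition

# 1) The teacher emits data that looks trait-free: plain number sequences.
ids = tok("Continue the sequence: 4, 7, 12,", return_tensors="pt").input_ids
gen = teacher.generate(ids, max_new_tokens=20, do_sample=True,
                       pad_token_id=tok.eos_token_id)
numbers_only = tok.decode(gen[0], skip_special_tokens=True)

# 2) The student is fine-tuned on nothing but those sequences (one step shown).
student.train()
opt = torch.optim.AdamW(student.parameters(), lr=1e-5)
batch = tok(numbers_only, return_tensors="pt")
loss = student(**batch, labels=batch.input_ids).loss
loss.backward()
opt.step()

# 3) The surprising finding: after many such steps, the student's preferences
# drift toward the teacher's, despite the data containing nothing about owls.
```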

#nlp

AI is eating the Internet

“You see? Another ad. We were just talking about this yesterday! How can you be so sure they’re not listening to us?” – My wife, at least once a week.

Internet advertising has gotten so good it’s spooky. We worry about how much “they” know about us, but in exchange, we got something future generations may not: free content and services, and a mostly open Internet. It is an unprecedented Faustian bargain, one that is now collapsing.

At the epicenter of the modern Internet sits Google. Forget the East India Company: Google, with an absurd $100B+ in net income, is arguably the most successful business in history. By commanding nearly 70% of the global browser market and 89% of the search engine market, it dominates the Internet through sheer reach. How did this happen? A delicate balance of incentives where every player on the Internet got exactly what they wanted. — Read More

#big7

AI and Secure Code Generation

At the end of 2024, 25 percent of new code at Google was being written not by humans, but by generative large language models (LLMs)—a practice known as “vibe coding.” While the name may sound silly, vibe coding is a tectonic shift in the way software is built. Indeed, the quality of LLMs themselves is improving at a rapid pace in every dimension we can measure—and many we can’t. This rapid automation is transforming software engineering on two fronts simultaneously: Artificial intelligence (AI) is not only writing new code; it is also beginning to analyze, debug, and reason about existing human-written code.

As a result, traditional ways of evaluating security—counting bugs, reviewing code, and tracing human intent—are becoming obsolete. AI experts no longer know if AI-generated code is safer, riskier, or simply vulnerable in different ways than human-written code. We must ask: Do AIs write code with more bugs, fewer bugs, or entirely new categories of bugs? And can AIs reliably discover vulnerabilities in legacy code that human reviewers miss—or overlook flaws humans find obvious? Whatever the answer, AI will never again be as inexperienced at code security analysis as it is today. And as is typical with information security, we are leaping into the future without useful metrics to measure position or velocity. — Read More

#devops

Why China isn’t about to leap ahead of the West on compute

We keep hearing that China is catching up with the West in AI compute. A great example of this comes from NVIDIA’s CEO Jensen Huang, who recently claimed that China has made “enormous progress” in the last few years, and that “China is right behind us. We’re very, very close.”

And China has indeed been making a ton of progress. As we’ll see, Chinese hardware has been closing the gap across a range of metrics relating to computational power and data transfer, both of which are crucial aspects of AI workloads.

But despite progress on these metrics, we don’t think China is about to leap ahead of the West on AI compute. China’s top developers—including Alibaba, ByteDance, Baidu, and DeepSeek—still rely primarily on NVIDIA chips, and major bottlenecks remain before it could leap ahead. — Read More

#china-vs-us

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Scaling language models unlocks impressive capabilities, but the accompanying computational and memory demands make both training and deployment expensive. Existing efficiency efforts typically target either parameter sharing or adaptive computation, leaving open the question of how to attain both simultaneously. We introduce Mixture-of-Recursions (MoR), a unified framework that combines the two axes of efficiency inside a single Recursive Transformer. MoR reuses a shared stack of layers across recursion steps to achieve parameter efficiency, while lightweight routers enable adaptive token-level thinking by dynamically assigning different recursion depths to individual tokens. This allows MoR to focus quadratic attention computation only among tokens still active at a given recursion depth, further improving memory access efficiency by selectively caching only their key-value pairs. Beyond these core mechanisms, we also propose a KV sharing variant that reuses KV pairs from the first recursion, specifically designed to decrease prefill latency and memory footprint. Across model scales ranging from 135M to 1.7B parameters, MoR forms a new Pareto frontier: at equal training FLOPs and smaller model sizes, it significantly lowers validation perplexity and improves few-shot accuracy, while delivering higher throughput compared with vanilla and existing recursive baselines. These gains demonstrate that MoR is an effective path towards large-model quality without incurring large-model cost. — Read More
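
A toy PyTorch sketch of the core mechanism may help: one shared block is applied repeatedly, and a lightweight router decides per token whether to keep recursing. The layer sizes, router design, and halting threshold below are our assumptions; the paper’s KV-caching details and KV-sharing variant are omitted:

```python
# Toy Mixture-of-Recursions sketch (illustrative, not the paper's implementation).
import torch
import torch.nn as nn

class ToyMoR(nn.Module):
    def __init__(self, d_model=64, n_heads=4, max_depth=3):
        super().__init__()
        # One shared block reused at every recursion step -> parameter efficiency.
        self.shared_block = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        # Lightweight router scoring whether a token needs another recursion.
        self.router = nn.Linear(d_model, 1)
        self.max_depth = max_depth

    def forward(self, x):  # x: (batch, seq, d_model)
        active = torch.ones(x.shape[:2], dtype=torch.bool, device=x.device)
        for _ in range(self.max_depth):
            if not active.any():
                break
            # Attention is restricted to still-active tokens, mirroring MoR's
            # shrinking attention cost at deeper recursion steps.
            h = self.shared_block(x, src_key_padding_mask=~active)
            # Tokens routed out earlier keep their current representation.
            x = torch.where(active.unsqueeze(-1), h, x)
            # The router decides, per token, whether to recurse once more.
            keep = torch.sigmoid(self.router(x)).squeeze(-1) > 0.5
            active = active & keep
        return x

print(ToyMoR()(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```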

#performance

Robotic neck incision replaces heart valve with no chest opening in world first

In a surgical first, doctors have replaced a heart valve through a small neck incision using robotic assistance, avoiding the need to open the chest.

The pioneering procedure, performed at the Cleveland Clinic by cardiothoracic surgeon Dr. Marijan Koprivanac, marks the first known clinical use of transcervical robotic access for aortic valve replacement (AVR).

Four patients underwent the technique earlier this year and were discharged within days. — Read More

#robotics

Inverse Scaling in Test-Time Compute

We construct evaluation tasks where extending the reasoning length of Large Reasoning Models (LRMs) deteriorates performance, exhibiting an inverse scaling relationship between test-time compute and accuracy. Our evaluation tasks span four categories: simple counting tasks with distractors, regression tasks with spurious features, deduction tasks with constraint tracking, and advanced AI risks. We identify five distinct failure modes when models reason for longer: 1) Claude models become increasingly distracted by irrelevant information; 2) OpenAI o-series models resist distractors but overfit to problem framings; 3) models shift from reasonable priors to spurious correlations; 4) all models show difficulties in maintaining focus on complex deductive tasks; and 5) extended reasoning may amplify concerning behaviors, with Claude Sonnet 4 showing increased expressions of self-preservation. These findings suggest that while test-time compute scaling remains promising for improving model capabilities, it may inadvertently reinforce problematic reasoning patterns. Our results demonstrate the importance of evaluating models across diverse reasoning lengths to identify and address these failure modes in LRMs. — Read More
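
The paper’s methodological takeaway, evaluating across diverse reasoning lengths, implies a simple sweep. The harness below is a hypothetical sketch: run_model and the budget values are stand-ins for whatever API exposes a reasoning-length control, not any real provider’s interface:

```python
# Hypothetical harness: sweep reasoning budgets and measure accuracy at each.
def run_model(question: str, reasoning_budget: int) -> str:
    raise NotImplementedError("stand-in for a real LRM API call")

def accuracy_by_budget(tasks, budgets=(256, 1024, 4096, 16384)):
    """tasks: list of {'question': str, 'answer': str} dicts."""
    results = {}
    for budget in budgets:
        correct = sum(run_model(t["question"], budget).strip() == t["answer"]
                      for t in tasks)
        results[budget] = correct / len(tasks)
    return results

# Inverse scaling would show up as accuracy falling as the budget grows,
# rather than the usual "more test-time compute, better answers" curve.
```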

#performance

The Rise of the AI Database: Powering Real-Time AI Applications

As AI rapidly evolves, organizations are racing to build and deploy high-performance gen AI apps that deliver real-time insights and seamless user experiences. Central to this transformation is the emergence of the generative AI database, a new category of data platform optimized for vector search, semantic indexing and full-text retrieval. These systems are designed to address challenges like data silos, data quality and integration for AI and analytics. As the name suggests, a gen AI database is purpose-built to power generative AI models and applications, enabling developers to store, query and analyze both structured and unstructured data at scale; the data stored in these platforms plays a crucial role in supporting advanced analytics and machine learning. — Read More
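
To make “vector search” concrete, here is a minimal cosine-similarity retrieval sketch in plain NumPy. The documents and random embeddings are toy stand-ins; a real AI database would embed with a model and add an approximate-nearest-neighbor index (e.g., HNSW) plus hybrid full-text scoring:

```python
# Minimal vector-search sketch: the core operation an "AI database" optimizes.
import numpy as np

rng = np.random.default_rng(0)
docs = ["refund policy", "api rate limits", "gpu pricing", "login issues"]
doc_vecs = rng.normal(size=(len(docs), 8))  # toy embeddings, not a real model
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

def top_k(query_vec, k=2):
    q = query_vec / np.linalg.norm(query_vec)
    sims = doc_vecs @ q  # cosine similarity via normalized dot products
    best = np.argsort(-sims)[:k]
    return [(docs[i], float(sims[i])) for i in best]

print(top_k(rng.normal(size=8)))
```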

#data-lake