R-Zero: Self-Evolving Reasoning LLM from Zero Data

Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence by autonomously generating, refining, and learning from their own experiences. However, existing methods for training such models still rely heavily on vast amounts of human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward capabilities beyond human intelligence. To overcome this limitation, we introduce R-Zero, a fully autonomous framework that generates its own training data from scratch. Starting from a single base LLM, R-Zero initializes two independent models with distinct roles, a Challenger and a Solver. These models are optimized separately and co-evolve through interaction: the Challenger is rewarded for proposing tasks near the edge of the Solver's capabilities, and the Solver is rewarded for solving increasingly challenging tasks posed by the Challenger. This process yields a targeted, self-improving curriculum without any pre-existing tasks or labels. Empirically, R-Zero substantially improves reasoning capability across different backbone LLMs, e.g., boosting Qwen3-4B-Base by +6.49 on math-reasoning benchmarks and +7.54 on general-domain reasoning benchmarks. — Read More
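To make the co-evolution loop concrete, here is a minimal sketch of one Challenger/Solver iteration as the abstract describes it. Everything below is an illustrative assumption: the method names, the ~50%-solve-rate reward shape, and the majority-vote pseudo-labels are plausible placeholders, not the paper's actual implementation.

```python
# Hedged sketch of one R-Zero iteration; all names and reward shapes here
# are illustrative assumptions, not the paper's actual code.

def r_zero_iteration(challenger, solver, n_tasks=64, n_samples=8):
    # 1) The Challenger proposes a batch of candidate tasks.
    tasks = challenger.generate_tasks(n_tasks)

    # 2) Estimate how hard each task is for the current Solver by sampling
    #    several answers per task and measuring the empirical solve rate.
    solve_rates = [solver.empirical_solve_rate(t, n_samples) for t in tasks]

    # 3) The Challenger's reward peaks when a task sits at the edge of the
    #    Solver's ability (assumed here to mean a ~50% solve rate).
    challenger_rewards = [1.0 - 2.0 * abs(r - 0.5) for r in solve_rates]
    challenger.reinforce(tasks, challenger_rewards)

    # 4) With no ground-truth labels, majority-vote answers serve as
    #    pseudo-labels, and the Solver is rewarded for matching them.
    labels = [solver.majority_vote(t, n_samples) for t in tasks]
    solver.reinforce(tasks, labels)
```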

#singularity

Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence

This paper examines changes in the labor market for occupations exposed to generative artificial intelligence using high-frequency administrative data from the largest payroll software provider in the United States. We present six facts that characterize these shifts. We find that since the widespread adoption of generative AI, early-career workers (ages 22-25) in the most AI-exposed occupations have experienced a 13 percent relative decline in employment even after controlling for firm-level shocks. In contrast, employment for workers in less exposed fields and more experienced workers in the same occupations has remained stable or continued to grow. We also find that adjustments occur primarily through employment rather than compensation. Furthermore, employment declines are concentrated in occupations where AI is more likely to automate, rather than augment, human labor. Our results are robust to alternative explanations, such as excluding technology-related firms and excluding occupations amenable to remote work. These six facts provide early, large-scale evidence consistent with the hypothesis that the AI revolution is beginning to have a significant and disproportionate impact on entry-level workers in the American labor market. — Read More

#strategy

The Evidence That AI Is Destroying Jobs For Young People Just Got Stronger

In a moment with many important economic questions and fears, I continue to find this among the more interesting mysteries about the US economy in the long run: Is artificial intelligence already taking jobs from young people?

If you’ve been casually following the debate over AI and its effect on young graduates’ employment, you could be excused for thinking that the answer to that question is “possibly,” or “definitely yes,” or “almost certainly no.”

… To be honest with you, I considered this debate well and truly settled. No, I’d come to think, AI is probably not wrecking employment for young people. But now, I’m thinking about changing my mind again. — Read More

#strategy

Understanding LLMs: Insights from Mechanistic Interpretability

Since the release of ChatGPT in 2022, large language models (LLMs) based on the transformer architecture, such as ChatGPT, Gemini, and Claude, have transformed the world with their ability to produce high-quality, human-like text and, more recently, images and videos. Yet behind this incredible capability lies a profound mystery: we don’t understand how these models work.

The reason is that LLMs aren’t built like traditional software. A traditional program is designed by human programmers and written in explicit, human-readable code. But LLMs are different. Instead of being programmed, LLMs are automatically trained to predict the next word on vast amounts of internet text, growing a complex network of trillions of connections that enables them to understand language and perform tasks. This training process automatically creates emergent knowledge and abilities, but the resulting model is usually messy, complex, and incomprehensible, since training optimizes the model for performance, not for interpretability.
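As a concrete anchor for what "trained to predict the next word" means, here is a minimal sketch of the next-token objective; `model` stands in for any causal language model, and the code is a generic illustration, not any lab's actual training loop.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    """Standard next-token prediction: at each position, predict token t+1
    from tokens 0..t. `model` is assumed to map (batch, seq) token ids to
    (batch, seq, vocab) logits; this is an illustrative sketch."""
    logits = model(token_ids[:, :-1])   # predictions for positions 1..T
    targets = token_ids[:, 1:]          # the actual next tokens, shifted by one
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```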

The field of mechanistic interpretability aims to study LLMs and reverse engineer the knowledge and algorithms they use to perform tasks, a process that is more like biology or neuroscience than computer science.

The goal of this post is to provide insights into how LLMs work using findings from the field of mechanistic interpretability. — Read More

#nlp

Context Engineering Series: Building Better Agentic RAG Systems

We’ve moved far beyond prompt engineering. Now we’re designing portfolios of tools (directory listing, file editing, web search), slash commands like /pr-create that inject prompts, versus specialized sub-agents like @pr-creation-agent, versus having an AGENT.md, building systems that work across IDEs, command lines, GitHub, and Slack.

Context engineering means designing tool responses and interaction patterns that give agents the situational awareness to navigate complex information spaces effectively.

To understand what this means practically, let’s look at how systems have evolved:

Before: We precomputed what chunks needed to be put into context, injected them, and then asked the system to reason about the chunks.
Now: Agents are incredibly easy to build because all you need is a messages array and a bunch of tools. — Read More
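A minimal sketch of that "messages array plus a bunch of tools" loop, assuming a generic chat-completions-style interface; `llm_chat`, `list_directory`, and `web_search` are hypothetical stand-ins, not a specific SDK's API:

```python
# Hedged sketch of an agentic loop: a messages array, a tool registry, and
# a loop that feeds tool results back as messages. `llm_chat` and the tool
# functions are hypothetical placeholders, not a real SDK.

def run_agent(llm_chat, tools, user_goal):
    messages = [{"role": "user", "content": user_goal}]
    while True:
        reply = llm_chat(messages, tools=list(tools))  # model may request tools
        messages.append(reply)
        if not reply.get("tool_calls"):
            return reply["content"]  # final answer, no more tool use
        for call in reply["tool_calls"]:
            result = tools[call["name"]](**call["arguments"])
            # Context engineering lives here: the shape and content of the
            # tool response is what gives the agent situational awareness.
            messages.append(
                {"role": "tool", "name": call["name"], "content": str(result)}
            )
```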

#devops

The Parallelism Mesh Zoo

When training large-scale LLMs, there is a large assortment of parallelization strategies that you can employ to scale your training runs to work on more GPUs. There are already a number of good resources for understanding how to parallelize your models: I particularly recommend How To Scale Your Model and The Ultra-Scale Playbook. The purpose of this blog post is to discuss parallelization strategies in a more schematic way by focusing only on how they affect your device mesh. The device mesh is an abstraction used by both PyTorch and JAX that takes your GPUs (however many of them you’ve got in your cluster!) and organizes them into an N-D tensor that expresses how the devices communicate with each other. When we parallelize computation, we shard a tensor along one dimension of the mesh, and then do collectives along that dimension when there are nontrivial dependencies between shards. Being able to explain why a device mesh is set up the way it is for a collection of parallelization strategies is a good check for seeing if you understand how the parallelization strategies work in the first place! (Credit: This post was influenced by Visualizing 6D Mesh Parallelism.) — Read More
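For concreteness, here is a minimal device-mesh sketch in PyTorch, assuming a 16-GPU job launched with torchrun; the 4x4 split into data-parallel and tensor-parallel axes is an illustrative choice, not a recommendation from the post:

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor

# Assume 16 GPUs (e.g., launched via torchrun --nproc-per-node=16).
# Arrange them as a 2-D mesh: 4-way data parallel x 4-way tensor parallel.
mesh = init_device_mesh("cuda", (4, 4), mesh_dim_names=("dp", "tp"))

# Shard a weight matrix along the "tp" axis of the mesh. The collectives
# this strategy needs (e.g., all-gathers of the sharded dimension) run
# only along that mesh dimension, within each group of 4 devices.
weight = torch.randn(8192, 8192)
sharded_weight = distribute_tensor(weight, mesh["tp"], [Shard(0)])
```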

#training

Python: The Documentary | An origin story

Read More

#videos