We explore language models that recursively call themselves or other LLMs before providing a final answer. Our goal is to enable the processing of essentially unbounded input context length and output length and to mitigate degradation “context rot”.
We propose Recursive Language Models, or RLMs, a general inference strategy where language models can decompose and recursively interact with their input context as a variable. We design a specific instantiation of this where GPT-5 or GPT-5-mini is queried in a Python REPL environment that stores the user’s prompt in a variable. — Read More
Recent Updates Page 16
The Art of Scaling Reinforcement Learning Compute for LLMs
Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training. Despite rapidly rising compute budgets, there is no principled understanding of how to evaluate algorithmic improvements for scaling RL compute. We present the first large-scale systematic study, amounting to more than 400,000 GPU-hours, that defines a principled framework for analyzing and predicting RL scaling in LLMs. We fit sigmoidal compute-performance curves for RL training and ablate a wide range of common design choices to analyze their effects on asymptotic performance and compute efficiency. We observe: (1) Not all recipes yield similar asymptotic performance, (2) Details such as loss aggregation, normalization, curriculum, and off-policy algorithm primarily modulate compute efficiency without materially shifting the asymptote, and (3) Stable, scalable recipes follow predictable scaling trajectories, enabling extrapolation from smaller-scale runs. Combining these insights, we propose a best-practice recipe, ScaleRL, and demonstrate its effectiveness by successfully scaling and predicting validation performance on a single RL run scaled up to 100,000 GPU-hours. Our work provides both a scientific framework for analyzing scaling in RL and a practical recipe that brings RL training closer to the predictability long achieved in pre-training. — Read More
Nation-state hackers deliver malware from “bulletproof” blockchains
Hacking groups—at least one of which works on behalf of the North Korean government—have found a new and inexpensive way to distribute malware from “bulletproof” hosts: stashing them on public cryptocurrency blockchains.
In a Thursday post, members of the Google Threat Intelligence Group said the technique provides the hackers with their own “bulletproof” host, a term that describes cloud platforms that are largely immune from takedowns by law enforcement and pressure from security researchers. More traditionally, these hosts are located in countries without treaties agreeing to enforce criminal laws from the US and other nations. These services often charge hefty sums and cater to criminals spreading malware or peddling child sexual abuse material and wares sold in crime-based flea markets. — Read More
Google’s URL Context Grounding: Another Nail in RAG’s Coffin?
Google’s hot streak in AI-related releases continues unabated. Just a few days ago, it released a new tool for Gemini called URL context grounding.
URL context grounding can be used stand-alone or combined with Google search grounding to conduct deep dives into internet content.
In a nutshell, it’s a way to programmatically have Gemini read, understand and answer questions about content and data contained in individual web URLs (including those pointing to PDFs) without the need to perform what we know as traditional RAG processing. — Read More
Systems Thinking for Scaling Responsible Multi-Agent Architectures
Nimisha Asthagiri explains the critical need for responsible AI in complex multi-agent systems. She shares practical techniques for engineering leaders and architects, applying systems thinking and Causal Flow Diagrams. She shows how these methods help predict and mitigate the unintended consequences and structural risks inherent in autonomous, learning agents, using a scheduler agent example. — Read More
Diffusion Transformers with Representation Autoencoders
Latent generative modeling, where a pretrained autoencoder maps pixels into a latent space for the diffusion process, has become the standard strategy for Diffusion Transformers (DiT); however, the autoencoder component has barely evolved. Most DiTs continue to rely on the original VAE encoder, which introduces several limitations: outdated backbones that compromise architectural simplicity, low-dimensional latent spaces that restrict information capacity, and weak representations that result from purely reconstruction-based training and ultimately limit generative quality. In this work, we explore replacing the VAE with pretrained representation encoders (e.g., DINO, SigLIP, MAE) paired with trained decoders, forming what we term Representation Autoencoders (RAEs).
These models provide both high-quality reconstructions and semantically rich latent spaces, while allowing for a scalable transformer-based architecture. Since these latent spaces are typically high-dimensional, a key challenge is enabling diffusion transformers to operate effectively within them. We analyze the sources of this difficulty, propose theoretically motivated solutions, and validate them empirically. — Read More
Why Signal’s post-quantum makeover is an amazing engineering achievement
The encryption protecting communications against criminal and nation-state snooping is under threat. As private industry and governments get closer to building useful quantum computers, the algorithms protecting Bitcoin wallets, encrypted web visits, and other sensitive secrets will be useless. No one doubts the day will come, but as the now-common joke in cryptography circles observes, experts have been forecasting this cryptocalypse will arrive in the next 15 to 30 years for the past 30 years.
The uncertainty has created something of an existential dilemma: Should network architects spend the billions of dollars required to wean themselves off quantum-vulnerable algorithms now, or should they prioritize their limited security budgets fighting more immediate threats such as ransomware and espionage attacks? Given the expense and no clear deadline, it’s little wonder that less than half of all TLS connections made inside the Cloudflare network and only 18 percent of Fortune 500 networks support quantum-resistant TLS connections. It’s all but certain that many fewer organizations still are supporting quantum-ready encryption in less prominent protocols. — Read More
Google DeepMind CEO: We Want To Build A Virtual Cell
Microsoft AI announces first image generator created in-house
Microsoft AI just announced its first text-to-image generator, MAI-Image-1, designed and developed in-house. The tech giant, which recently announced its first in-house Microsoft AI models, called the new image generator “the next step on our journey.”
Microsoft says it sought feedback from creative professionals in order to avoid “repetitive or generically-stylized outputs.” MAI-Image-1 “excels” at photorealistic imagery like lightning, landscapes, and more, the company claims. And it can process requests and produce images faster than “larger, slower models.” The model has already secured a spot in the top 10 of LMArena, the AI benchmark site where humans compare outputs from different systems and vote on the best one. — Read More
InferenceMAX™: Open Source Inference Benchmarking
LLM Inference performance is driven by two pillars, hardware and software. While hardware innovation drives step jumps in performance every year through the release of new GPUs/XPUs and new systems, software evolves every single day, delivering continuous performance gains on top of these step jumps.
… [The] pace of software advancement creates a challenge: benchmarks conducted at a fixed point in time quickly go stale and do not represent the performance that can be achieved with the latest software packages.
InferenceMAX™, an open-source automated benchmark designed to move at the same rapid speed as the software ecosystem itself, is built to address this challenge. — Read More