Lexicon: How China talks about ‘agentic AI’

Three months after the Chinese AI company DeepSeek shocked global markets with a highly capable reasoning model, another China-linked company made a splash with an agentic AI system. Did Manus, released in March 2025, portend Chinese leadership in AI systems that go beyond chatbots to take action on the user’s behalf? Victor Mustar, head of product at Hugging Face, described Manus’ capabilities as “mind-blowing, redefining what’s possible.” A journalist’s comparison with ChatGPT’s Deep Research found that Manus provided better results, despite speed and stability issues.

Manus had been released by a Singapore-based firm but developed by a startup in Wuhan with backing from the Chinese tech giant Tencent. It wasn’t China’s only foray into the emerging field. The same month, the Beijing-based firm Zhipu AI launched AutoGLM-Rumination, an open-source agentic system the company said achieved “state-of-the-art” scores on benchmarks such as AgentBench. (Zhipu also announced an “international alliance” for autonomous AI models, to include 10 countries associated with the Belt and Road Initiative and from ASEAN.) Earlier, in January, Alibaba had released the Qwen-Agent framework for building agentic systems with its Qwen models. ByteDance followed with its Coze Studio platform in July. Last month, Tencent open-sourced its Youtu-Agent agentic framework, which was reportedly built atop a DeepSeek model.

With so much action in Chinese “agentic” AI this year, it’s worth pausing to ask what Chinese developers mean when they talk about agentic AI. Moreover, what does the proliferation of such systems in China mean for AI safety and governance in the country? — Read More

#china-ai

Stanford RNA 3D Folding: 1st Place Solution

My approach was clear from the outset. Without GPUs, training a model from scratch or fine-tuning was not viable. My early research – drawing on CASP results, literature, and conference talks, including one by host @rhijudas – showed that Template-Based Modeling approaches consistently dominated. Based on this, I committed to TBM from day one and spent the next 90 days refining my method.

Next, I focused on the evaluation metric, since understanding it determines the exploration path. TM-score has two key properties: it is normalized by structure length (so 50nt and 200nt RNAs are compared on the same 0-1 scale), and it is robust to local errors – a small number of misplaced nucleotides does not disproportionately lower the score. This insight allowed me to prioritize getting the overall fold correct over achieving atomic-level precision. — Read More
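The two properties of TM-score described above can be sketched in a few lines. This is an illustration only, not the competition’s scorer: the `d0` formula below is the protein-style constant, whereas the actual evaluation (US-align) uses an RNA-specific variant.

```python
# Sketch of the TM-score's shape: length-normalized and robust to local
# errors. The d0 formula is the protein-style constant, used here purely
# for illustration -- the competition's scorer (US-align) differs.

def tm_like_score(distances):
    """TM-score-style value from per-nucleotide deviations (Angstroms)
    between a superposed prediction and the reference structure."""
    L = len(distances)
    d0 = max(1.24 * max(L - 15, 1) ** (1 / 3) - 1.8, 0.5)  # grows with length
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / L

# Robustness to local errors: 95 well-placed nucleotides and 5 badly
# misplaced ones still score above 0.9, so getting the overall fold right
# dominates atomic-level precision.
print(tm_like_score([0.5] * 95 + [20.0] * 5))
```

Because each term is divided by the length-dependent `d0` and the sum is averaged over `L`, a 50nt and a 200nt structure land on the same 0–1 scale.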

#architecture

Recursive Language Models

We explore language models that recursively call themselves or other LLMs before providing a final answer. Our goal is to enable the processing of essentially unbounded input context length and output length and to mitigate the degradation known as “context rot”.

We propose Recursive Language Models, or RLMs, a general inference strategy where language models can decompose and recursively interact with their input context as a variable. We design a specific instantiation of this where GPT-5 or GPT-5-mini is queried in a Python REPL environment that stores the user’s prompt in a variable. — Read More
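The decompose-and-recurse idea can be illustrated with a toy sketch. Everything below is a stand-in of my own devising, not the authors’ code: `stub_lm` replaces the real GPT-5/GPT-5-mini call, and the context is a plain Python variable the recursion slices into chunks.

```python
# Toy sketch of the recursive-language-model idea: the full context lives
# in a program variable, and the model decomposes it and recurses on
# pieces rather than attending to everything at once. `stub_lm` is a
# stand-in for an actual LLM call; all names here are illustrative.

def stub_lm(question, text):
    """Stand-in for an LLM call: answers a counting question over one chunk."""
    return text.count("error")

def rlm_answer(question, lines, max_lines=100):
    # Base case: the chunk is small enough for a single (stubbed) model call.
    if len(lines) <= max_lines:
        return stub_lm(question, "\n".join(lines))
    # Recursive case: split the variable-held context and combine sub-answers.
    mid = len(lines) // 2
    return (rlm_answer(question, lines[:mid], max_lines)
            + rlm_answer(question, lines[mid:], max_lines))

huge_log = ["ok"] * 400 + ["error: disk full"] * 3 + ["ok"] * 400
print(rlm_answer("How many error lines?", huge_log))  # → 3
```

Summation is the right way to merge sub-answers only for a counting question; in the paper’s setting the model itself decides, via the REPL, how to decompose the context and combine intermediate results.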

#nlp

The Art of Scaling Reinforcement Learning Compute for LLMs

Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training. Despite rapidly rising compute budgets, there is no principled understanding of how to evaluate algorithmic improvements for scaling RL compute. We present the first large-scale systematic study, amounting to more than 400,000 GPU-hours, that defines a principled framework for analyzing and predicting RL scaling in LLMs. We fit sigmoidal compute-performance curves for RL training and ablate a wide range of common design choices to analyze their effects on asymptotic performance and compute efficiency. We observe: (1) Not all recipes yield similar asymptotic performance, (2) Details such as loss aggregation, normalization, curriculum, and off-policy algorithm primarily modulate compute efficiency without materially shifting the asymptote, and (3) Stable, scalable recipes follow predictable scaling trajectories, enabling extrapolation from smaller-scale runs. Combining these insights, we propose a best-practice recipe, ScaleRL, and demonstrate its effectiveness by successfully scaling and predicting validation performance on a single RL run scaled up to 100,000 GPU-hours. Our work provides both a scientific framework for analyzing scaling in RL and a practical recipe that brings RL training closer to the predictability long achieved in pre-training. — Read More
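A saturating sigmoid of the family the paper fits can be sketched as below. The parameterization and every numeric value here are illustrative assumptions, not the paper’s fitted results; the point is only to show how an asymptote and a compute-efficiency term separate, matching findings (1) and (2).

```python
# Illustrative saturating sigmoid for RL compute-performance curves.
# The exact parameterization in the paper may differ; all numbers below
# are invented for illustration, not fitted results from the study.

def perf(compute, r0=0.30, asymptote=0.80, c_mid=5_000.0, beta=1.5):
    """Predicted performance after `compute` GPU-hours of RL training.
    r0: starting performance; asymptote: best achievable with the recipe;
    c_mid: compute at which half the total gain is realized;
    beta: compute-efficiency exponent (how sharply the curve rises)."""
    return r0 + (asymptote - r0) / (1.0 + (c_mid / compute) ** beta)

# Two recipes sharing an asymptote but differing in efficiency: past the
# midpoint, the higher-beta recipe converts compute into gains faster.
efficient = [perf(c, beta=2.0) for c in (1_000, 5_000, 100_000)]
inefficient = [perf(c, beta=1.0) for c in (1_000, 5_000, 100_000)]
print(efficient, inefficient)
```

Under this functional form, fitting `r0`, `asymptote`, `c_mid`, and `beta` on small-scale runs is what permits extrapolating a recipe’s trajectory before committing the full compute budget.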

#reinforcement-learning

Nation-state hackers deliver malware from “bulletproof” blockchains

Hacking groups—at least one of which works on behalf of the North Korean government—have found a new and inexpensive way to distribute malware from “bulletproof” hosts: stashing it on public cryptocurrency blockchains.

In a Thursday post, members of the Google Threat Intelligence Group said the technique provides the hackers with their own “bulletproof” host, a term that describes cloud platforms that are largely immune from takedowns by law enforcement and pressure from security researchers. More traditionally, these hosts are located in countries without treaties agreeing to enforce criminal laws from the US and other nations. These services often charge hefty sums and cater to criminals spreading malware or peddling child sexual abuse material and wares sold in crime-based flea markets. — Read More

#blockchain, #cyber

Google’s URL Context Grounding: Another Nail in RAG’s Coffin?

Google’s hot streak in AI-related releases continues unabated. Just a few days ago, it released a new tool for Gemini called URL context grounding. 

URL context grounding can be used stand-alone or combined with Google search grounding to conduct deep dives into internet content.

In a nutshell, it’s a way to programmatically have Gemini read, understand and answer questions about content and data contained in individual web URLs (including those pointing to PDFs) without the need to perform what we know as traditional RAG processing. — Read More
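A call might look roughly like the following sketch, which just builds the REST request body. The tool name (`url_context`), model id, and endpoint are assumptions drawn from Google’s public Gemini API docs; verify them against the current reference before relying on this.

```python
import json

# Sketch of a generateContent request body with URL context grounding
# enabled. The tool name ("url_context"), model id, and endpoint are
# assumptions based on Google's public Gemini API docs -- check the
# current reference before use. No network access is needed to build it.

MODEL = "gemini-2.5-flash"  # assumed model id
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent"
)

def url_context_request(question, urls):
    """Build a request body asking Gemini to answer from specific URLs."""
    prompt = question + "\n" + "\n".join(urls)
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        # Enabling the tool lets the model fetch and read the listed URLs
        # (PDFs included) instead of going through a separate RAG pipeline.
        "tools": [{"url_context": {}}],
    }

body = url_context_request(
    "Summarize the key findings in this report.",
    ["https://example.com/report.pdf"],
)
print(json.dumps(body, indent=2))
```

The actual call would POST this body to the endpoint with an API-key header; combining it with Google Search grounding would mean listing both tools in the `tools` array.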

#devops

Systems Thinking for Scaling Responsible Multi-Agent Architectures

Nimisha Asthagiri explains the critical need for responsible AI in complex multi-agent systems. She shares practical techniques for engineering leaders and architects, applying systems thinking and Causal Flow Diagrams. She shows how these methods help predict and mitigate the unintended consequences and structural risks inherent in autonomous, learning agents, using a scheduler agent example. — Read More

#strategy

Diffusion Transformers with Representation Autoencoders

Latent generative modeling, where a pretrained autoencoder maps pixels into a latent space for the diffusion process, has become the standard strategy for Diffusion Transformers (DiT); however, the autoencoder component has barely evolved. Most DiTs continue to rely on the original VAE encoder, which introduces several limitations: outdated backbones that compromise architectural simplicity, low-dimensional latent spaces that restrict information capacity, and weak representations that result from purely reconstruction-based training and ultimately limit generative quality. In this work, we explore replacing the VAE with pretrained representation encoders (e.g., DINO, SigLIP, MAE) paired with trained decoders, forming what we term Representation Autoencoders (RAEs).

These models provide both high-quality reconstructions and semantically rich latent spaces, while allowing for a scalable transformer-based architecture. Since these latent spaces are typically high-dimensional, a key challenge is enabling diffusion transformers to operate effectively within them. We analyze the sources of this difficulty, propose theoretically motivated solutions, and validate them empirically. — Read More

#performance

Why Signal’s post-quantum makeover is an amazing engineering achievement

The encryption protecting communications against criminal and nation-state snooping is under threat. As private industry and governments get closer to building useful quantum computers, the algorithms protecting Bitcoin wallets, encrypted web visits, and other sensitive secrets will be useless. No one doubts the day will come, but as the now-common joke in cryptography circles observes, experts have been forecasting this cryptocalypse will arrive in the next 15 to 30 years for the past 30 years.

The uncertainty has created something of an existential dilemma: Should network architects spend the billions of dollars required to wean themselves off quantum-vulnerable algorithms now, or should they prioritize their limited security budgets fighting more immediate threats such as ransomware and espionage attacks? Given the expense and no clear deadline, it’s little wonder that less than half of all TLS connections made inside the Cloudflare network, and only 18 percent of Fortune 500 networks, support quantum-resistant TLS. It’s all but certain that still fewer organizations support quantum-ready encryption in less prominent protocols. — Read More

#quantum

Google DeepMind CEO: We Want To Build A Virtual Cell

Read More

#videos