Every Abstraction Is a Door and a Wall: The Hidden Law of Abstraction

TL;DR: Virtualization has emerged as the strategy for increasing efficiency and achieving feats that physical reality never could, to the point where even our work, friends, and experiences have gone virtual. But what’s the real cost of living in abstractions — and could reality itself be just another layer we can’t see through?

A July 2025 MIT study examined how large language models (LLMs) handle complex, changing information. Researchers tasked AI models with predicting the final arrangement of scrambled digits after a series of moves, without knowing the final result. Transformer models learned to skip explicit simulation of every move. Instead of following state changes step by step, the models organized them into hierarchies, eventually making reasonable predictions.

In other words, the AI developed its own internal “language” of shortcuts to solve the task more efficiently. Does it hint at a broader truth? When faced with complexity, intelligent systems (biological or artificial) seek compressed, virtual representations that capture the essence without expending the energy to simulate every detail. — Read More
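
A rough illustration of the kind of task involved (my own simplification in Python, not the paper’s setup): the final arrangement can be reached either by simulating every swap in order or by first compressing all the moves into a single permutation and applying it once; the latter is the kind of shortcut the transformers appear to converge on.

```python
# Hypothetical simplification of a digit-shuffling state-tracking task:
# predict the final arrangement after a series of swap moves.

def simulate_step_by_step(digits, moves):
    """Follow every state change explicitly, one move at a time."""
    state = list(digits)
    for i, j in moves:
        state[i], state[j] = state[j], state[i]
    return state

def compose_then_apply(digits, moves):
    """Compress all moves into one permutation, then apply it once."""
    n = len(digits)
    perm = list(range(n))              # perm[k] = original index ending up at position k
    for i, j in moves:
        perm[i], perm[j] = perm[j], perm[i]
    return [digits[perm[k]] for k in range(n)]

digits = [3, 1, 4, 1, 5, 9]
moves = [(0, 2), (1, 4), (2, 5)]
assert simulate_step_by_step(digits, moves) == compose_then_apply(digits, moves)
```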

#strategy

Google and Grok are catching up to ChatGPT, says a16z’s latest AI report

Rivals like Google’s Gemini, xAI’s Grok, and, to a lesser extent, Meta AI are closing the gap with ChatGPT, OpenAI’s popular AI chatbot, according to a new report on the consumer AI landscape from venture firm Andreessen Horowitz.

The report, in its fifth iteration, showcases two and a half years of data about consumers’ evolving use of AI products.

Fourteen companies have appeared on the list of top AI products in all five editions: ChatGPT, Perplexity, Poe, Character AI, Midjourney, Leonardo, Veed, Cutout, ElevenLabs, Photoroom, Gamma, QuillBot, Civitai, and Hugging Face. — Read More

#strategy

TIME100 AI 2025

Meet the innovators, leaders, and thinkers reshaping our world through groundbreaking advances in artificial intelligence. Time’s 100 most influential people in AI of 2025. The list includes familiar names like Sam Altman, Elon Musk, Jensen Huang, and Fei-Fei Li alongside newcomers like DeepSeek CEO Liang Wenfeng. — Read More

#strategy

Mass Intelligence

More than a billion people use AI chatbots regularly. ChatGPT has over 700 million weekly users. Gemini and other leading AIs add hundreds of millions more. In my posts, I often focus on the advances that AI is making (for example, in the past few weeks, AI models from both OpenAI and Google achieved gold-medal performance at the International Math Olympiad), but that obscures a broader shift that’s been building: we’re entering an era of Mass Intelligence, where powerful AI is becoming as accessible as a Google search.

Until recently, free users of these systems (the overwhelming majority) had access only to older, smaller AI models that frequently made mistakes and had limited use for complex work. The best models, like Reasoners that can solve very hard problems and hallucinate much less often, required paying somewhere between $20 and $200 a month. And even then, you needed to know which model to pick and how to prompt it properly. But the economics and interfaces are changing rapidly, with fairly large consequences for how all of us work, learn, and think. — Read More

#surveillance

Building Agents for Small Language Models: A Deep Dive into Lightweight AI

The landscape of AI agents has been dominated by large language models (LLMs) like GPT-4 and Claude, but a new frontier is opening up: lightweight, open-source, locally-deployable agents that can run on consumer hardware. This post shares internal notes and discoveries from my journey building agents for small language models (SLMs) – models ranging from 270M to 32B parameters that run efficiently on CPUs or modest GPUs. These are lessons learned from hands-on experimentation, debugging, and optimizing inference pipelines.

SLMs offer immense potential: privacy through local deployment, predictable costs, and full control thanks to open weights. However, they also present unique challenges that demand a shift in how we design agent architectures. — Read More
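
For a concrete sense of the pattern, here is a minimal agent loop against a locally served SLM, assuming an OpenAI-compatible endpoint such as the ones llama.cpp or Ollama expose; the URL, model id, and the single calculator "tool" are placeholders, not the author’s setup.

```python
# Minimal agent loop against a local SLM served behind an OpenAI-compatible API.
# Endpoint, model name, and the toy calculator tool are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

SYSTEM = (
    "You are a tool-using agent. If a calculation is needed, reply with a line "
    "of the form CALC: <python expression>. Otherwise answer directly."
)

def run_agent(task: str, max_turns: int = 4) -> str:
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = client.chat.completions.create(
            model="local-slm",            # placeholder model id
            messages=messages,
            temperature=0.2,              # keep sampling conservative for a small model
        ).choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        if reply.strip().startswith("CALC:"):
            expr = reply.split("CALC:", 1)[1].strip()
            result = str(eval(expr, {"__builtins__": {}}))   # toy tool, not for production
            messages.append({"role": "user", "content": f"RESULT: {result}"})
        else:
            return reply
    return messages[-1]["content"]

print(run_agent("What is 17% of 2,340?"))
```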

#strategy

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

We introduce InternVL 3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. A key innovation is the Cascade Reinforcement Learning (Cascade RL) framework, which enhances reasoning through a two-stage process: offline RL for stable convergence and online RL for refined alignment. This coarse-to-fine training strategy leads to substantial improvements on downstream reasoning tasks, e.g., MMMU and MathVista. To optimize efficiency, we propose a Visual Resolution Router (ViR) that dynamically adjusts the resolution of visual tokens without compromising performance. Coupled with ViR, our Decoupled Vision-Language Deployment (DvD) strategy separates the vision encoder and language model across different GPUs, effectively balancing computational load. These contributions collectively enable InternVL3.5 to achieve up to a +16.0% gain in overall reasoning performance and a 4.05x inference speedup compared to its predecessor, i.e., InternVL3. In addition, InternVL3.5 supports novel capabilities such as GUI interaction and embodied agency. Notably, our largest model, i.e., InternVL3.5-241B-A28B, attains state-of-the-art results among open-source MLLMs across general multimodal, reasoning, text, and agentic tasks — narrowing the performance gap with leading commercial models like GPT-5. All models and code are publicly released. — Read More
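
Reading the abstract, the Decoupled Vision-Language Deployment idea amounts to placing the vision encoder and the language model on separate GPUs and shipping only the visual-token tensor between them. A conceptual PyTorch sketch with dummy modules (assuming two GPUs are available; this is not InternVL’s actual code) could look like this:

```python
# Conceptual sketch of decoupled vision-language deployment: the vision encoder
# lives on one GPU, the language model on another, and only the compact
# visual-token tensor crosses between them. Dummy modules stand in for the
# real InternVL components; assumes two CUDA devices.
import torch
import torch.nn as nn

class DummyVisionEncoder(nn.Module):
    def __init__(self, dim=1024, num_tokens=256):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=14, stride=14)   # patchify
        self.num_tokens = num_tokens

    def forward(self, images):                                     # (B, 3, H, W)
        tokens = self.proj(images).flatten(2).transpose(1, 2)      # (B, N, dim)
        return tokens[:, : self.num_tokens]                        # crude token cap

class DummyLanguageModel(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)

    def forward(self, visual_tokens, text_embeds):
        return self.block(torch.cat([visual_tokens, text_embeds], dim=1))

vision_gpu, lm_gpu = torch.device("cuda:0"), torch.device("cuda:1")
encoder = DummyVisionEncoder().to(vision_gpu).eval()
llm = DummyLanguageModel().to(lm_gpu).eval()

images = torch.randn(1, 3, 448, 448, device=vision_gpu)
text_embeds = torch.randn(1, 32, 1024, device=lm_gpu)

with torch.no_grad():
    visual_tokens = encoder(images)              # computed on cuda:0
    visual_tokens = visual_tokens.to(lm_gpu)     # only the tokens move between GPUs
    out = llm(visual_tokens, text_embeds)
print(out.shape)
```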

#multi-modal

DINOv3: Self-supervised learning for vision at unprecedented scale

Self-supervised learning (SSL) — the concept that AI models can learn independently without human supervision — has emerged as the dominant paradigm in modern machine learning. It has driven the rise of large language models that acquire universal representations by pre-training on massive text corpora. However, progress in computer vision has lagged behind, as the most powerful image encoding models still rely heavily on human-generated metadata, such as web captions, for training.

Today, we’re releasing DINOv3, a generalist, state-of-the-art computer vision model trained with SSL that produces superior high-resolution visual features. For the first time, a single frozen vision backbone outperforms specialized solutions on multiple long-standing dense prediction tasks including object detection and semantic segmentation. — Read More
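
The "single frozen backbone plus lightweight task head" recipe generalizes well beyond this release. The sketch below uses a toy stand-in backbone rather than verified DINOv3 loader names, but it shows the structure: freeze the encoder and train only a small dense-prediction head.

```python
# Generic sketch of dense prediction with a frozen SSL backbone: only the small
# task head is trained. The backbone and its feature shape are placeholders,
# not verified DINOv3 identifiers.
import torch
import torch.nn as nn

class FrozenBackboneSegmenter(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad = False                  # backbone stays frozen
        self.head = nn.Conv2d(feat_dim, num_classes, kernel_size=1)  # trainable head

    def forward(self, images):
        with torch.no_grad():
            feats = self.backbone(images)            # (B, C, h, w) feature map
        logits = self.head(feats)                    # per-patch class scores
        return nn.functional.interpolate(
            logits, size=images.shape[-2:], mode="bilinear", align_corners=False
        )

# Toy stand-in for a frozen DINOv3-style encoder.
toy_backbone = nn.Sequential(nn.Conv2d(3, 384, kernel_size=16, stride=16), nn.GELU())
model = FrozenBackboneSegmenter(toy_backbone, feat_dim=384, num_classes=21)
out = model(torch.randn(2, 3, 224, 224))
print(out.shape)    # torch.Size([2, 21, 224, 224])
```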

#image-recognition

China unveils bionic antelope robot to observe endangered Tibetan species

A lifelike robotic Tibetan antelope is now roaming the high-altitude wilderness of Hoh Xil National Nature Reserve in Northwest China’s Qinghai Province.

Equipped with 5G ultra-low latency networks and advanced artificial intelligence (AI) algorithms, the bionic robot is being used to collect real-time data on Tibetan antelope populations without disturbing them.

This is the first time such a robotic antelope has been deployed in the heart of Hoh Xil, which sits more than 4,600 meters (15,092 feet) above sea level. — Read More

#robotics

CodeMonkeys: Scaling Test-Time Compute for Software Engineering

Scaling test-time compute is a promising axis for improving LLM capabilities. However, test-time compute can be scaled in a variety of ways, and effectively combining different approaches remains an active area of research. Here, we explore this problem in the context of solving real-world GitHub issues from the SWE-bench dataset. Our system, named CodeMonkeys, allows models to iteratively edit a codebase by jointly generating and running a testing script alongside their draft edit. We sample many of these multi-turn trajectories for every issue to generate a collection of candidate edits. This approach lets us scale “serial” test-time compute by increasing the number of iterations per trajectory and “parallel” test-time compute by increasing the number of trajectories per problem. With parallel scaling, we can amortize up-front costs across multiple downstream samples, allowing us to identify relevant codebase context using the simple method of letting an LLM read every file. In order to select between candidate edits, we combine voting using model-generated tests with a final multi-turn trajectory dedicated to selection. Overall, CodeMonkeys resolves 57.4% of issues from SWE-bench Verified using a budget of approximately 2300 USD. Our selection method can also be used to combine candidates from different sources. Selecting over an ensemble of edits from existing top SWE-bench Verified submissions obtains a score of 66.2% and outperforms the best member of the ensemble on its own. We fully release our code and data at https://scalingintelligence.stanford.edu/pubs/codemonkeys/. — Read More
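
Here is a schematic of that serial-versus-parallel structure, with the expensive parts (edit generation, test execution, final selection) passed in as hypothetical callables rather than the authors’ released pipeline:

```python
# Schematic of CodeMonkeys-style test-time scaling: many trajectories per issue
# (parallel compute), many edit/test iterations per trajectory (serial compute),
# then selection by voting with model-generated tests. The three callables are
# hypothetical stubs, not the released implementation.
from collections import Counter
from typing import Callable

def solve_issue(
    issue: str,
    draft_edit_and_tests: Callable,   # LLM step: returns (candidate_edit, test_script)
    run_tests: Callable,              # runs a test script against an edit -> bool
    final_selection: Callable,        # LLM step: picks one edit from a short list
    num_trajectories: int = 8,        # parallel axis: independent trajectories
    max_iterations: int = 4,          # serial axis: edit/test rounds per trajectory
):
    candidates = []
    for _ in range(num_trajectories):
        edit, tests = None, None
        for _ in range(max_iterations):
            edit, tests = draft_edit_and_tests(issue, edit, tests)
            if run_tests(edit, tests):               # stop a trajectory once its tests pass
                break
        candidates.append((edit, tests))

    # Voting: score each candidate edit by how many of the model-generated test
    # scripts it passes, keep the top few, then hand off to a dedicated
    # selection trajectory to break ties.
    scores = Counter()
    for i, (edit, _) in enumerate(candidates):
        scores[i] = sum(run_tests(edit, t) for _, t in candidates)
    shortlist = [candidates[i][0] for i, _ in scores.most_common(3)]
    return final_selection(issue, shortlist)
```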

#performance

“RAG is Dead, Context Engineering is King” — with Jeff Huber of Chroma

In December 2023, we first covered The Four Wars of AI and the RAG/Ops War. After tens of millions of dollars poured into vector databases and several swings of the hype cycle, we finally have Jeff Huber from Chroma joining us today for the new hot take: “RAG” is dead…

and as context lengths increase and more and more AI workloads shift from simple chatbots to impactful agents, new work from thought leaders like Lance Martin and Dex Horthy is making genuine contributions of substance to the previously underrated context box. — Read More
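
As a generic illustration of what "context engineering" means in practice (not Chroma’s or the guests’ specific approach): rather than dumping retrieved chunks straight into the prompt, rank them and pack only what fits a token budget, in a deliberate order.

```python
# Toy context-engineering step: rank candidate chunks by a relevance score, then
# pack them into a fixed token budget, most relevant first. The scoring and
# token counting are crude placeholders for real retrieval and a real tokenizer.
def pack_context(query: str, chunks: list[str], token_budget: int = 2000) -> str:
    def score(chunk: str) -> int:
        # Placeholder relevance: count of query words present in the chunk.
        return sum(word.lower() in chunk.lower() for word in query.split())

    def num_tokens(text: str) -> int:
        return len(text.split())      # rough stand-in for a tokenizer

    packed, used = [], 0
    for chunk in sorted(chunks, key=score, reverse=True):
        cost = num_tokens(chunk)
        if used + cost > token_budget:
            continue                  # skip chunks that would overflow the budget
        packed.append(chunk)
        used += cost
    return "\n\n".join(packed)

context = pack_context(
    "How does metadata filtering work?",
    ["Metadata filtering can be applied to queries...",
     "Unrelated release notes about a website redesign...",
     "Filters can be combined with vector similarity search..."],
    token_budget=50,
)
print(context)
```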

#nlp