AI companies hit a scaling wall

OpenAI, Google and others are seeing diminishing returns to building ever-bigger models — but that may not matter as much as you would guess

Over the past week, several stories sourced to people inside the big AI labs have reported that the race to build superintelligence is hitting a wall. Specifically, they say, the approach that has carried the industry from OpenAI’s first large language model to the LLMs we have today has begun to show diminishing returns.

Today, let’s look at what everyone involved is saying — and consider what it means for the AI arms race. While reports that AI scaling laws are showing diminishing returns appear to be technically accurate, they can also be easily misread. For better and for worse, it seems, the development of more powerful AI systems continues to accelerate. — Read More

#strategy

This AI-generated version of Minecraft may represent the future of real-time video generation

The game was created from clips and keyboard inputs alone, as a demo for real-time interactive video generation.

When you walk around in a version of the video game Minecraft from the AI companies Decart and Etched, it feels a little off. Sure, you can move forward, cut down a tree, and lay down a dirt block, just like in the real thing. If you turn around, though, the dirt block you just placed may have morphed into a totally new environment. That doesn’t happen in Minecraft. But this new version is entirely AI-generated, so it’s prone to hallucinations. Not a single line of code was written.

For Decart and Etched, this demo is a proof of concept. They imagine that the technology could be used for real-time generation of videos or video games more generally. “Your screen can turn into a portal—into some imaginary world that doesn’t need to be coded, that can be changed on the fly. And that’s really what we’re trying to target here,” says Dean Leitersdorf, cofounder and CEO of Decart, which came out of stealth this week. — Read More

#vfx

Google DeepMind has a new way to look inside an AI’s “mind”

AI has led to breakthroughs in drug discovery and robotics and is in the process of entirely revolutionizing how we interact with machines and the web. The only problem is we don’t know exactly how it works, or why it works so well. We have a fair idea, but the details are too complex to unpick. That’s a problem: It could lead us to deploy an AI system in a highly sensitive field like medicine without understanding that it could have critical flaws embedded in its workings.

A team at Google DeepMind that studies something called mechanistic interpretability has been working on new ways to let us peer under the hood. At the end of July, it released Gemma Scope, a tool to help researchers understand what is happening when AI is generating an output. The hope is that if we have a better understanding of what is happening inside an AI model, we’ll be able to control its outputs more effectively, leading to better AI systems in the future. — Read More
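Under the hood, Gemma Scope is a suite of sparse autoencoders trained on the model’s internal activations. The toy numpy sketch below illustrates that core idea, using random stand-in weights and made-up sizes rather than anything loaded from Gemma Scope itself: an activation is expanded into a much wider, mostly zero feature vector that is easier to inspect, then decoded back into an approximation of the original.

```python
# Toy illustration of the sparse-autoencoder idea behind tools like Gemma Scope.
# Weights and dimensions are random stand-ins, not trained Gemma Scope parameters.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64            # toy sizes; real SAEs are orders of magnitude wider

W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_model)
threshold = 0.5                    # features below this stay off, keeping the code sparse

def encode(activation: np.ndarray) -> np.ndarray:
    """Map a model activation to a wide, mostly zero feature vector."""
    pre = activation @ W_enc + b_enc
    return np.where(pre > threshold, pre, 0.0)

def decode(features: np.ndarray) -> np.ndarray:
    """Reconstruct an approximation of the original activation."""
    return features @ W_dec + b_dec

x = rng.normal(size=d_model)       # stand-in for one internal activation vector
features = encode(x)
print("active features:", int(np.count_nonzero(features)), "of", d_sae)
print("reconstruction error:", float(np.linalg.norm(x - decode(features))))
```

With trained weights, each active feature tends to line up with a human-interpretable concept, which is what makes the sparse code useful for inspecting what a model is doing as it generates an output.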

#big7

Researchers have invented a new system of logic that could boost critical thinking and AI

The rigid structures of language we once clung to with certainty are cracking. Take gender, nationality or religion: these concepts no longer sit comfortably in the stiff linguistic boxes of the last century. Simultaneously, the rise of AI presses upon us the need to understand how words relate to meaning and reasoning.

A global group of philosophers, mathematicians and computer scientists has come up with a new understanding of logic that addresses these concerns, dubbed “inferentialism”. — Read More

#strategy

Qwen2.5-Coder just changed the game for AI programming—and it’s free

Alibaba Cloud has released Qwen2.5-Coder, a new AI coding assistant that has already become the second most popular demo on Hugging Face Spaces. Early tests suggest its performance rivals GPT-4o, and it’s available to developers at no cost. — Read More
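For a sense of what “available at no cost” looks like in practice, the open checkpoints can be run locally with the standard Hugging Face transformers API. The sketch below is a minimal, illustrative example; the exact checkpoint name is an assumption, and memory requirements depend on which Qwen2.5-Coder variant you pull.

```python
# Minimal local-inference sketch (assumes: pip install transformers accelerate torch,
# plus enough memory for the chosen Qwen2.5-Coder checkpoint).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"   # assumed checkpoint name; smaller variants also exist

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that reverses a singly linked list."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Strip the prompt tokens and print only the newly generated completion.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```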

#china-ai, #devops

AI Industry is Trying to Subvert the Definition of “Open Source AI”

The Open Source Initiative has published (news article here) its definition of “open source AI,” and it’s terrible. It allows for secret training data and mechanisms. It allows for development to be done in secret. Since for a neural network, the training data is the source code—it’s how the model gets programmed—the definition makes no sense.

And it’s confusing; most “open source” AI models—like Llama—are open source in name only. But the OSI seems to have been co-opted by industry players that want both corporate secrecy and the “open source” label. (Here’s one rebuttal to the definition.)

This is worth fighting for. We need a public AI option, and open source—real open source—is a necessary component of that. — Read More

#strategy

Illustrated LLM OS: An Implementational Perspective

This blog post explores the implementation of large language models (LLMs) as operating systems, inspired by Andrej Karpathy’s vision of AI resembling an OS, akin to Jarvis from Iron Man. The focus is on practical considerations, proposing an application-level integration for LLMs within a terminal session. A novel approach involves injecting state machines into the decoding process, enabling real-time code execution and interaction. Additionally, the post proposes “Reinforcement Learning by System Feedback” (RLSF), a reinforcement learning technique applied to code generation tasks. This method leverages a reward model to evaluate code correctness through Python subprocess execution, enhancing LLM performance. The findings contribute insights into the dynamic control of LLMs and their potential applications beyond coding tasks. — Read More
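The RLSF idea of scoring generated code by actually executing it and feeding the outcome back as a reward can be sketched in a few lines. The helper below is a hypothetical illustration rather than the post’s implementation: it runs a candidate program in a subprocess and maps a clean exit to a reward of 1.0.

```python
# Hypothetical sketch of a system-feedback reward for generated code; not the post's code.
import os
import subprocess
import sys
import tempfile

def system_feedback_reward(generated_code: str, timeout: float = 5.0) -> float:
    """Run model-generated Python in a subprocess and map the outcome to a scalar reward."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=timeout
        )
        return 1.0 if result.returncode == 0 else 0.0   # clean exit earns the reward
    except subprocess.TimeoutExpired:
        return 0.0                                      # treat hangs as failures
    finally:
        os.unlink(path)

print(system_feedback_reward("print(sum(range(10)))"))    # 1.0
print(system_feedback_reward("raise ValueError('bad')"))  # 0.0
```

A fuller version could run unit tests or grade the captured output rather than only checking the exit code.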

AIOS: LLM Agent Operating System
MemGPT: Towards LLMs as Operating Systems

#devops

Microsoft and a16z set aside differences, join hands in plea against AI regulation

Two of the biggest forces in two deeply intertwined tech ecosystems — large incumbents and startups — have taken a break from counting their money to jointly plead that the government desist from even pondering regulations that might affect their financial interests, or as they prefer to call it, innovation.

“Our two companies might not agree on everything, but this is not about our differences,” writes this group of vastly disparate perspectives and interests: Founding a16z partners Marc Andreessen and Ben Horowitz, and Microsoft CEO Satya Nadella and President/Chief Legal Officer Brad Smith. A truly intersectional assemblage, representing both big business and big money.

But it’s the little guys they’re supposedly looking out for.  — Read More

#strategy

CONFIRMED: LLMs have indeed reached a point of diminishing returns

For years I have been warning that “scaling” — eking out improvements in AI by adding more data and more compute, without making fundamental architectural changes — would not continue forever. In my most notorious article, in March of 2022, I argued that “deep learning is hitting a wall”. Central to the argument was that pure scaling would not solve hallucinations or abstraction; I concluded that “there are serious holes in the scaling argument.”

And I got endless grief for it. Sam Altman implied (without saying my name, but riffing on the images in my then-recent article) I was a “mediocre deep learning skeptic”; Greg Brockman openly mocked the title. Yann LeCun wrote that deep learning wasn’t hitting a wall, and so on. Elon Musk himself made fun of me and the title earlier this year.

The thing is, in the long term, science isn’t majority rule. In the end, the truth generally outs. Alchemy had a good run, but it got replaced by chemistry. The truth is that scaling is running out, and that truth is, at last, coming out. — Read More

#strategy

From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code

In our previous post, Project Naptime: Evaluating Offensive Security Capabilities of Large Language Models, we introduced our framework for large-language-model-assisted vulnerability research and demonstrated its potential by improving the state-of-the-art performance on Meta’s CyberSecEval2 benchmarks. Since then, Naptime has evolved into Big Sleep, a collaboration between Google Project Zero and Google DeepMind.

Today, we’re excited to share the first real-world vulnerability discovered by the Big Sleep agent: an exploitable stack buffer underflow in SQLite, a widely used open source database engine. We discovered the vulnerability and reported it to the developers in early October, who fixed it on the same day. Fortunately, we found this issue before it appeared in an official release, so SQLite users were not impacted.

We believe this is the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software. — Read More

#cyber