For years I have been warning that “scaling” — eking out improvements in AI by adding more data and more compute, without making fundamental architectural changes — would not continue forever. In my most notorious article, in March of 2022, I argued that “deep learning is hitting a wall”. Central to the argument was that pure scaling would not solve hallucinations or abstraction; I concluded that “there are serious holes in the scaling argument.”
And I got endless grief for it. Sam Altman implied (without saying my name, but riffing on the images in my then-recent article) that I was a “mediocre deep learning skeptic”; Greg Brockman openly mocked the title. Yann LeCun wrote that deep learning wasn’t hitting a wall, and so on. Elon Musk himself made fun of me and the title earlier this year.
The thing is, in the long term, science isn’t majority rule. In the end, the truth generally outs. Alchemy had a good run, but it got replaced by chemistry. The truth is that scaling is running out, and that truth is, at last, coming out. — Read More
You could start smelling the roses from far away using AI
AI can “teleport” scents without human hands (or noses)
Ever send a picture of yourself trying on clothes to a friend to see what they think of how you look? Now, imagine doing the same from the perfume and cologne counter. AI could make that happen in the not-too-distant future after a breakthrough in ‘Scent Teleportation.’ Osmo, which bills itself as a “digital olfaction” company, has succeeded in using AI to analyze a scent in one location and reproduce it elsewhere without human intervention. — Read More
Bots, agents, and digital workers: AI is changing the very definition of work
Imagine a world where your digital colleague handles entire workflows, adapts to real-time challenges, and collaborates seamlessly with your human team. This isn’t science fiction—it’s the imminent reality of AI agents in the workplace.
As Sam Altman, CEO of OpenAI, boldly predicted at the company’s annual DevDay event, “2025 is when AI agents will work.” But what does this mean for the future of human labor, organizational structures, and the very definition of work itself?
According to research by The Conference Board, 56% of workers use generative AI on the job, and nearly 1 in 10 use generative AI tools daily. — Read More
Anthropic CEO goes full techno-optimist in 15,000-word paean to AI
Anthropic CEO Dario Amodei wants you to know he’s not an AI “doomer.”
At least, that’s my read of the “mic drop” of a ~15,000-word essay Amodei published to his blog late Friday. (I tried asking Anthropic’s Claude chatbot whether it concurred, but alas, the post exceeded the free plan’s length limit.)
In broad strokes, Amodei paints a picture of a world in which all AI risks are mitigated, and the tech delivers heretofore unrealized prosperity, social uplift, and abundance. — Read More
AI will use a lot of energy. That’s good for the climate.
If you asked me how to scale clean energy, I would prescribe a magical source of urgent energy demand.
Someone willing to pay a premium to build solar+batteries, geothermal, and nuclear, in order to bring them down the cost curve and make them cheaper for everyone.
That is exactly what AI data centres are. — Read More
LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s o1 on PlanBench
The ability to plan a course of action that achieves a desired state of affairs has long been considered a core competence of intelligent agents and has been an integral part of AI research since its inception. With the advent of large language models (LLMs), there has been considerable interest in the question of whether or not they possess such planning abilities. PlanBench, an extensible benchmark we developed in 2022, soon after the release of GPT3, has remained an important tool for evaluating the planning abilities of LLMs. Despite the slew of new private and open source LLMs since GPT3, progress on this benchmark has been surprisingly slow. OpenAI claims that their recent o1 (Strawberry) model has been specifically constructed and trained to escape the normal limitations of autoregressive LLMs–making it a new kind of model: a Large Reasoning Model (LRM). Using this development as a catalyst, this paper takes a comprehensive look at how well current LLMs and new LRMs do on PlanBench. As we shall see, while o1’s performance is a quantum improvement on the benchmark, outpacing the competition, it is still far from saturating it. This improvement also brings to the fore questions about accuracy, efficiency, and guarantees which must be considered before deploying such systems. — Read More
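To make “a course of action that achieves a desired state of affairs” concrete, here is a minimal sketch of checking a candidate plan in a simplified Blocksworld, a classic planning domain of the kind PlanBench draws on. The state encoding, the action names (`pickup`, `putdown`, `stack`, `unstack`), and the helpers `State`, `apply`, and `validate_plan` are illustrative assumptions, not PlanBench’s actual problem format or its plan validator.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical, simplified Blocksworld state (not PlanBench's format):
# `on` maps each non-held block to the block it sits on, or "table";
# `holding` is the block in the single robot arm, if any.
@dataclass
class State:
    on: dict[str, str]
    holding: str | None = None

    def clear(self, block: str) -> bool:
        # A block is clear if nothing rests on it and the arm isn't holding it.
        return block not in self.on.values() and self.holding != block

def apply(state: State, action: tuple) -> State | None:
    """Return the successor state, or None if the action's preconditions fail."""
    name, *args = action
    on = dict(state.on)
    if name == "pickup":
        (b,) = args
        if state.holding is None and on.get(b) == "table" and state.clear(b):
            del on[b]
            return State(on, b)
    elif name == "putdown":
        (b,) = args
        if state.holding == b:
            on[b] = "table"
            return State(on, None)
    elif name == "unstack":
        b, under = args
        if (state.holding is None and on.get(b) == under
                and under != "table" and state.clear(b)):
            del on[b]
            return State(on, b)
    elif name == "stack":
        b, under = args
        # `under in on` means the target block is placed (not held).
        if state.holding == b and under in on and state.clear(under):
            on[b] = under
            return State(on, None)
    return None

def validate_plan(init: State, plan: list[tuple], goal: dict[str, str]) -> bool:
    """True if every action is applicable in turn and the final state meets the goal."""
    state = init
    for action in plan:
        state = apply(state, action)
        if state is None:
            return False
    return all(state.on.get(b) == target for b, target in goal.items())

# C sits on A; A and B are on the table. Goal: A on B.
init = State({"A": "table", "B": "table", "C": "A"})
plan = [("unstack", "C", "A"), ("putdown", "C"), ("pickup", "A"), ("stack", "A", "B")]
print(validate_plan(init, plan, {"A": "B"}))
print(validate_plan(init, plan[2:], {"A": "B"}))  # fails: A is not clear yet
```

Checking plans this way, step by step against preconditions, is what separates a plan that merely looks plausible from one that actually reaches the goal; skipping the first two steps makes the plan invalid even though its last two actions are the “right” ones.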
The Bitter Lesson
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore’s law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers’ belated learning of this bitter lesson, and it is instructive to review some of the most prominent. — Read More
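The argument above rests on compounding: if cost per unit of computation falls exponentially, a fixed budget buys exponentially more compute over time. The sketch below makes that arithmetic explicit under an assumed halving period of two years for cost per unit of computation; that period and the function name `compute_multiplier` are illustrative assumptions, not figures from the essay.

```python
def compute_multiplier(years: float, halving_period_years: float = 2.0) -> float:
    """Factor by which the compute affordable on a fixed budget grows after `years`,
    assuming (hypothetically) that cost per unit of computation halves every
    `halving_period_years`."""
    return 2.0 ** (years / halving_period_years)

# Compare a typical research-project horizon with longer horizons.
for horizon in (1, 2, 5, 10, 20):
    print(f"{horizon:>2} years: {compute_multiplier(horizon):,.1f}x")
```

Under this assumption a one- or two-year project sees at most a doubling, while a ten-year horizon yields a 32x multiplier, which is the gap between “constant” computation and “massively more” computation that the passage describes.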
Time100/AI
…Our purpose in creating the TIME100 AI is to put leaders like [Sundar] Pichai and [Meredith] Whittaker in dialogue and to open up their views to TIME’s readers. That is why we are excited to share with you the second edition of the TIME100 AI. We built this program in the spirit of the TIME100, the world’s most influential community. TIME’s knowledgeable editors and correspondents, led by Emma Barker and Ayesha Javed, interviewed their sources and consulted members of last year’s list to find the best new additions to our community of AI leaders. Ninety-one of the members of the 2024 list were not on last year’s, an indication of just how quickly this field is changing. They span dozens of companies, regions, and perspectives, including 15-year-old Francesca Mani, who advocates across the U.S. for protections for victims of deepfakes, and 77-year-old Andrew Yao, one of China’s most prominent computer scientists, who called last fall for an international regulatory body for AI. — Read More
Superhuman Automated Forecasting (FiveThirtyNine)
In a recent appearance on Conversations with Tyler, famed political forecaster Nate Silver expressed skepticism about AIs replacing human forecasters in the near future. When asked how long it might take for AIs to reach superhuman forecasting abilities, Silver replied: “15 or 20 [years].”
In light of this, we are excited to announce “FiveThirtyNine,” a superhuman AI forecasting bot. Our bot, built on GPT-4o, provides probabilities for any user-entered query, including “Will Trump win the 2024 presidential election?” and “Will China invade Taiwan by 2030?” Our bot performs better than experienced human forecasters and performs roughly the same as (and sometimes even better than) crowds of experienced forecasters; since crowds are for the most part superhuman, so is FiveThirtyNine. — Read More
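Claims such as “better than experienced human forecasters” depend on scoring probabilistic forecasts against what actually happened. One widely used metric is the Brier score: the mean squared difference between each forecast probability and the 0/1 outcome, where lower is better. The sketch below pairs it with a simple median-of-forecasters “crowd”; both the metric and the median aggregation are illustrative choices, since the post does not say how FiveThirtyNine was evaluated, and the toy numbers are hypothetical.

```python
from statistics import median

def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def crowd_forecast(forecasts_per_question: list[list[float]]) -> list[float]:
    """Aggregate several forecasters' probabilities per question by taking the median."""
    return [median(fs) for fs in forecasts_per_question]

# Hypothetical toy data: three resolved questions, three forecasters each.
outcomes = [1, 0, 1]
individual = [[0.7, 0.9, 0.6], [0.2, 0.4, 0.1], [0.5, 0.8, 0.7]]

crowd = crowd_forecast(individual)
print("crowd forecasts:", crowd)
print("crowd Brier score:", brier_score(crowd, outcomes))
```

A bot’s forecasts on the same questions would be scored with the same function, so “roughly the same as crowds” would mean a Brier score close to the crowd’s.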
AI worse than humans in every way at summarising information, government trial finds
Artificial intelligence is worse than humans in every way at summarising documents and might actually create additional work for people, a government trial of the technology has found.
Amazon conducted the test earlier this year for Australia’s corporate regulator, the Australian Securities and Investments Commission (ASIC), using submissions made to an inquiry. The outcome of the trial was revealed in an answer to a question on notice at the Senate select committee on adopting artificial intelligence.
… [R]eviewers overwhelmingly found that the human summaries beat out their AI competitors on every criterion and on every submission, scoring 81% on an internal rubric compared with the machine’s 47%. — Read More