Large language models aren’t people. Let’s stop testing them as if they were.

When Taylor Webb played around with GPT-3 in early 2022, he was blown away by what OpenAI’s large language model appeared to be able to do. Here was a neural network trained only to predict the next word in a block of text—a jumped-up autocomplete. And yet it gave correct answers to many of the abstract problems that Webb set for it—the kind of thing you’d find in an IQ test. “I was really shocked by its ability to solve these problems,” he says. “It completely upended everything I would have predicted.”

Webb is a psychologist at the University of California, Los Angeles, who studies the different ways people and computers solve abstract problems. He was used to building neural networks that had specific reasoning capabilities bolted on. But GPT-3 seemed to have learned them for free.

… What Webb’s research highlights is only the latest in a long string of remarkable tricks pulled off by large language models.

… These kinds of results are feeding a hype machine predicting that these machines will soon come for white-collar jobs, replacing teachers, doctors, journalists, and lawyers. … But there’s a problem: there is little agreement on what those results really mean.