There’s a new global news network launching in 2024 that completely ditches humans in favor of AI-generated newsreaders – and it’s showing off some superhuman capabilities that make one thing very clear: the days of the human news presenter are numbered.
Channel 1’s photorealistic news anchors come in all shapes and sizes. They can all speak more or less any language while evoking the stiff, formal body language familiar to anyone who still watches news on TV. They’re even capable of making news-anchor-grade attempts at humor. – Read More
ChatGPT users complain the AI is getting lazy and sassy
OpenAI says it is investigating complaints that ChatGPT has become “lazy”.
In recent days, more and more users of the latest version of ChatGPT – built on OpenAI’s GPT-4 model – have complained that the chatbot refuses to do what they ask, or that it does not seem interested in answering their queries.
If a user asks for a piece of code, for instance, it might supply only a snippet and then instruct them to fill in the rest. Some complained that it did so in a particularly sassy way, telling people that they are perfectly able to do the work themselves. – Read More
Model alignment protects against accidental harms, not intentional ones
Preventing harms from AI is important. The AI safety community calls this the alignment problem. The vast majority of development effort to date has been on technical methods that modify models themselves. We’ll call this model alignment, as opposed to sociotechnical ways to mitigate harm.
The main model alignment technique today is Reinforcement Learning from Human Feedback (RLHF), which has proven essential to the commercial success of chatbots. But RLHF has come to be seen as a catch-all solution to the dizzying variety of harms from language models. Consequently, there is much hand-wringing about the fact that adversaries can bypass it. Alignment techniques aren’t keeping up with progress in AI capabilities, the argument goes, so we should take drastic steps, such as “pausing” AI, to avoid catastrophe.
In this essay, we analyze why RLHF has been so useful. In short, its strength is in preventing accidental harms to everyday users. Then, we turn to its weaknesses. We argue that (1) despite its limitations, RLHF continues to be effective in protecting against casual adversaries, and (2) the fact that skilled and well-resourced adversaries can defeat it is irrelevant, because model alignment is not a viable strategy against such adversaries in the first place. To defend against catastrophic risks, we must look elsewhere. – Read More
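For readers unfamiliar with the mechanics of the RLHF technique discussed above, a minimal sketch may help. In the standard formulation (our gloss, not taken from the essay), a reward model \(r_\phi\) is first trained on human preference comparisons, and the chatbot policy \(\pi_\theta\) is then fine-tuned to maximize that reward while staying close to a reference policy \(\pi_{\mathrm{ref}}\):

\[
\max_{\pi_\theta}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\big[r_\phi(x, y)\big] \;-\; \beta\, D_{\mathrm{KL}}\big(\pi_\theta(\cdot \mid x)\,\big\|\,\pi_{\mathrm{ref}}(\cdot \mid x)\big)
\]

Here \(\beta\) controls how far fine-tuning may drift from the pre-trained model. Note that this objective only reshapes the model’s default behavior on prompts resembling its training distribution, which is consistent with the essay’s argument: it reliably steers everyday interactions, but says little about inputs an adversary deliberately searches for.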