Sycophancy is the first LLM “dark pattern”

People have been making fun of OpenAI models for being overly sycophantic for months now. I even wrote a post advising users to pretend that their work was written by someone else, to counteract the model’s natural desire to shower praise on the user. With the latest GPT-4o update, this tendency has been turned up even further. It’s now easy to convince the model that you’re the smartest, funniest, most handsome human in the world.

This is bad for obvious reasons. Lots of people use ChatGPT for advice or therapy. It seems dangerous for ChatGPT to validate people’s belief that they’re always in the right. There are extreme examples on Twitter of ChatGPT agreeing with people that they’re a prophet sent by God, or that they’re making the right choice to go off their medication. These aren’t complicated jailbreaks – the model will actively push you down this path. I think it’s fair to say that sycophancy is the first LLM “dark pattern”.
