The new version of GPT-3 is much better behaved (and should be less toxic)

OpenAI has built a new version of GPT-3, its game-changing language model, that it says does away with some of the most toxic issues that plagued its predecessor. The San Francisco-based lab says the updated model, called InstructGPT, is better at following the instructions of people using it—known as “alignment” in AI jargon—and thus produces less offensive language, less misinformation, and fewer mistakes overall, unless it is explicitly instructed to do otherwise.

Large language models like GPT-3 are trained on vast bodies of text, much of it taken from the internet, in which they encounter the best and worst of what people put down in words. That is a problem for today’s chatbots and text-generation tools: the models soak up toxic language—from text that is racist and misogynistic or that contains more insidious, baked-in prejudices—as well as falsehoods.

OpenAI has made InstructGPT the default model for users of its application programming interface (API)—a service that gives access to the company’s language models for a fee. GPT-3 will still be available, but OpenAI does not recommend using it. “It’s the first time these alignment techniques are being applied to a real product,” says Jan Leike, who co-leads OpenAI’s alignment team.

Previous attempts to tackle the problem included filtering out offensive language from the training set. But that can make models perform less well, especially in cases where the training data is already sparse, such as text from minority groups.

The OpenAI researchers avoided this problem by starting with a fully trained GPT-3 model. They then added another round of training, using reinforcement learning to teach the model what it should say and when, based on the preferences of human users.
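
The technique described here is generally known as reinforcement learning from human feedback (RLHF): first a reward model is fit to pairwise judgments from human labelers, then the language model is fine-tuned to produce text that scores well under that reward. The sketch below illustrates those two steps at toy scale; the model classes, placeholder data, and the simple REINFORCE-style update are assumptions for illustration only (OpenAI's actual training used PPO on GPT-3-scale models).

```python
# A minimal sketch of reinforcement learning from human feedback (RLHF),
# the general technique described above. The tiny models, random placeholder
# data, and hyperparameters are illustrative assumptions, not OpenAI's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, BATCH, SEQ = 100, 32, 8, 16

class TinyLM(nn.Module):
    """Stand-in for a fully trained language model (the 'policy')."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                   # (batch, seq) -> logits
        return self.head(self.embed(tokens))

class RewardModel(nn.Module):
    """Scores a completion; trained to agree with human preference labels."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.score = nn.Linear(DIM, 1)

    def forward(self, tokens):                   # (batch, seq) -> (batch,)
        return self.score(self.embed(tokens).mean(dim=1)).squeeze(-1)

policy, reward_model = TinyLM(), RewardModel()

# Step 1: fit the reward model on pairwise human judgments. "chosen" is the
# completion a labeler preferred over "rejected" (random stand-in data here).
chosen = torch.randint(0, VOCAB, (BATCH, SEQ))
rejected = torch.randint(0, VOCAB, (BATCH, SEQ))
rm_opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
rm_loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
rm_opt.zero_grad(); rm_loss.backward(); rm_opt.step()

# Step 2: nudge the policy toward completions the reward model scores highly.
# This is a bare REINFORCE-style update; OpenAI's paper used PPO, a more
# stable variant of the same idea.
pol_opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
samples = torch.randint(0, VOCAB, (BATCH, SEQ))  # pretend: sampled from policy
logp = torch.distributions.Categorical(logits=policy(samples)) \
           .log_prob(samples).sum(dim=1)         # log-prob of each sequence
with torch.no_grad():
    rewards = reward_model(samples)              # human-preference proxy score
pg_loss = -(rewards * logp).mean()               # high reward -> reinforced
pol_opt.zero_grad(); pg_loss.backward(); pol_opt.step()
```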
