Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

Soccer players can tackle, get up, kick and chase a ball in one seamless motion. How could robots master these agile motor skills?

We investigated the application of deep reinforcement learning (deep RL) to low-cost, miniature humanoid hardware in a dynamic environment, showing that the method can synthesize sophisticated and safe movement skills that compose into complex behavioral strategies in a simplified one-versus-one (1v1) soccer game. Read More

#reinforcement-learning, #robotics

Google goes beyond ChatGPT and shocks the world

The new age of robots is upon us

It turns out that one of the most sought-after goals in AI has recently been achieved.

Imagine an AI tool that is capable of playing hundreds of video games at a supreme level. And I’m not referring to a robot trained to be great at chess, at checkers, or at League of Legends.

I’m talking about a robot that’s amazing at all of them. Read More

#reinforcement-learning, #robotics

Transformers are Sample-Efficient World Models

Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems. Recently, many model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches. However, while virtually unlimited interaction with a simulated environment sounds appealing, the world model has to be accurate over extended periods of time. Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games, setting a new state of the art for methods without lookahead search. To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our code and models at https://github.com/eloialonso/iris.
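The two-component structure the abstract describes can be illustrated with a deliberately tiny stand-in: here a binning function plays the role of IRIS's discrete autoencoder and a count-based next-token predictor plays the role of its autoregressive Transformer (the class and function names below are ours, not the paper's API). The point is only the workflow: tokenize real experience, fit a dynamics model over tokens, then roll out "imagined" trajectories without touching the environment.

```python
import random
from collections import defaultdict

def tokenize(obs, n_bins=8):
    """Discretize a scalar observation in [0, 1) into one of n_bins tokens
    (a toy stand-in for a learned discrete autoencoder)."""
    return min(int(obs * n_bins), n_bins - 1)

class CountDynamicsModel:
    """Count-based next-token predictor standing in for the Transformer."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, token, action, next_token):
        self.counts[(token, action)][next_token] += 1

    def sample_next(self, token, action, rng):
        dist = self.counts[(token, action)]
        if not dist:                      # unseen context: stay in place
            return token
        tokens, weights = zip(*dist.items())
        return rng.choices(tokens, weights=weights)[0]

# Collect a little real experience from a toy environment where action +1/-1
# shifts the state by 0.1, then learn entirely "in imagination".
rng = random.Random(0)
model = CountDynamicsModel()
state = 0.5
for _ in range(500):
    action = rng.choice([-1, 1])
    next_state = min(max(state + 0.1 * action, 0.0), 0.99)
    model.update(tokenize(state), action, tokenize(next_state))
    state = next_state

# Imagined rollout: no environment calls, only the learned token model.
token = tokenize(0.5)
trajectory = [token]
for _ in range(10):
    token = model.sample_next(token, 1, rng)
    trajectory.append(token)
print(trajectory)
```

In IRIS the imagined rollouts feed an actor-critic learner; here they merely demonstrate that the token-level model has captured the environment's drift.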

Read More

#reinforcement-learning

Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons

We provide a theoretical framework for Reinforcement Learning with Human Feedback (RLHF). Our analysis shows that when the true reward function is linear, the widely used maximum likelihood estimator (MLE) converges under both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model. However, we show that when training a policy based on the learned reward model, MLE fails while a pessimistic MLE provides policies with improved performance under certain coverage assumptions. Additionally, we demonstrate that under the PL model, the true MLE and an alternative MLE that splits the K-wise comparison into pairwise comparisons both converge. Moreover, the true MLE is asymptotically more efficient. Our results validate the empirical success of existing RLHF algorithms in InstructGPT and provide new insights for algorithm design. We also unify the problem of RLHF and max-entropy Inverse Reinforcement Learning (IRL), and provide the first sample complexity bound for max-entropy IRL. Read More
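The pairwise estimator the abstract analyzes is easy to see concretely. Under the BTL model, P(i beats j) = exp(r_i) / (exp(r_i) + exp(r_j)); the sketch below (our toy, not the paper's code) samples comparisons from known rewards and recovers them by gradient ascent on the log-likelihood.

```python
import math
import random

rng = random.Random(0)
true_r = [0.0, 1.0, 2.0]              # latent rewards of three items

def btl_prob(ri, rj):
    """BTL probability that item with reward ri beats item with reward rj."""
    return 1.0 / (1.0 + math.exp(rj - ri))

# Sample pairwise comparisons from the true model.
data = []
for _ in range(4000):
    i, j = rng.sample(range(3), 2)
    winner = i if rng.random() < btl_prob(true_r[i], true_r[j]) else j
    data.append((i, j, winner))

# Gradient ascent on the log-likelihood; pin est[0] = 0 for identifiability
# (BTL rewards are only defined up to a shared additive constant).
est = [0.0, 0.0, 0.0]
lr = 0.5
for _ in range(200):
    grad = [0.0, 0.0, 0.0]
    for i, j, winner in data:
        p = btl_prob(est[i], est[j])  # model's current P(i beats j)
        y = 1.0 if winner == i else 0.0
        grad[i] += (y - p)
        grad[j] -= (y - p)
    for k in range(1, 3):
        est[k] += lr * grad[k] / len(data)
print(est)
```

With 4000 comparisons the estimates land close to the true gaps of 1.0 and 2.0, matching the convergence the paper proves; the pessimism the paper recommends only matters downstream, when a policy optimizes against the fitted rewards.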

#reinforcement-learning

Uh oh, people are now using AI to cheat in Rocket League

A cool machine learning bot project is being exploited by cheaters, and now players are looking for ways to beat it.

I was skeptical when I came across a Reddit poster claiming they “for sure” encountered a cheater in ranked Rocket League. Uh huh, just like how everyone who kills me in Rainbow Six Siege is “for sure” aimbotting, right? Then I watched the video. Well friends, I regret to inform you that people are cheating in Rocket League.

The alleged cheater was actually on the same team as ghost_snyped, the Reddit user who posted the clip embedded above, which shows the cheater’s perspective for part of a doubles match. I’ve been playing Rocket League for seven years and I have never seen a human being play like that at any rank. There are masterful Rocket League dribblers out there, but it’d be unusual for a skilled player to stay so rooted to the field—most throw in some aerial maneuvers here and there—and to carry and flick the ball that flawlessly.

Sure enough, this is a real problem: People have started using a machine learning-trained Rocket League bot in online matches.  Read More

#reinforcement-learning

The danger of advanced artificial intelligence controlling its own feedback

How would an artificial intelligence (AI) decide what to do? One common approach in AI research is called “reinforcement learning”.

Reinforcement learning gives the software a “reward” defined in some way, and lets the software figure out how to maximise the reward. This approach has produced some excellent results, such as building software agents that defeat humans at games like chess and Go, or creating new designs for nuclear fusion reactors.
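The "define a reward, let the software maximise it" loop can be shown in a few lines. This is a minimal tabular Q-learning sketch of ours (the corridor environment and hyperparameters are toy choices, not anything from the article): only the rightmost state pays reward, and the agent discovers on its own that walking right maximises it.

```python
import random

rng = random.Random(0)
N_STATES, ACTIONS = 5, (-1, +1)       # corridor states; move left or right
GOAL = N_STATES - 1
# Optimistic initial values encourage the agent to try every action.
Q = {(s, a): 1.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1     # learning rate, discount, exploration

for _ in range(300):                  # episodes
    s = 0
    for _ in range(20):               # steps per episode
        if rng.random() < eps:
            a = rng.choice(ACTIONS)   # occasional random exploration
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2 = min(max(s + a, 0), GOAL)
        r = 1.0 if s2 == GOAL else 0.0
        # Standard Q-learning update; no bootstrap past the terminal state.
        target = r if s2 == GOAL else r + gamma * max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
        if s == GOAL:
            break

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
print(policy)
```

Nothing told the agent to go right; the greedy policy falls out of reward maximisation alone, which is exactly the property the authors go on to argue becomes dangerous at sufficient scale.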

However, we might want to hold off on making reinforcement learning agents too flexible and effective.

As we argue in a new paper in AI Magazine, deploying a sufficiently advanced reinforcement learning agent would likely be incompatible with the continued survival of humanity. Read More

#reinforcement-learning, #singularity

The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning

AI and reinforcement learning (RL) have improved many areas, but are not yet widely adopted in economic policy design, mechanism design, or economics at large. At the same time, current economic methodology is limited by a lack of counterfactual data, simplistic behavioral models, and limited opportunities to experiment with policies and evaluate behavioral responses. Here we show that machine-learning-based economic simulation is a powerful policy and mechanism design framework to overcome these limitations. The AI Economist is a two-level, deep RL framework that trains both agents and a social planner who co-adapt, providing a tractable solution to the highly unstable and novel two-level RL challenge. From a simple specification of an economy, we learn rational agent behaviors that adapt to learned planner policies and vice versa. We demonstrate the efficacy of the AI Economist on the problem of optimal taxation. In simple one-step economies, the AI Economist recovers the optimal tax policy of economic theory. In complex, dynamic economies, the AI Economist substantially improves both utilitarian social welfare and the trade-off between equality and productivity over baselines. It does so despite emergent tax-gaming strategies, while accounting for agent interactions and behavioral change more accurately than economic theory. These results demonstrate for the first time that two-level, deep RL can be used for understanding and as a complement to theory for economic design, unlocking a new computational learning-based approach to understanding economic policy. Read More
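The two-level structure can be sketched far more simply than the paper's full deep RL setup: below, the inner "agents" solve a closed-form labor choice instead of running RL, and the outer "planner" grid-searches a flat tax instead of learning one (all of this is our simplification; the skills, utility function, and redistribution rule are illustrative assumptions).

```python
import math

skills = [1.0, 2.0, 4.0]              # heterogeneous agent productivities

def inner_response(tax):
    """Each agent picks labor maximizing (1-tax)*skill*labor - labor**2/2,
    i.e. the agents' best response to the planner's policy."""
    return [(1 - tax) * w for w in skills]

def welfare(tax):
    """Planner objective: utilitarian welfare under lump-sum redistribution."""
    labor = inner_response(tax)
    pretax = [w * l for w, l in zip(skills, labor)]
    revenue = tax * sum(pretax)
    transfer = revenue / len(skills)  # tax receipts split equally
    incomes = [(1 - tax) * y + transfer for y in pretax]
    # Log utility of consumption net of quadratic labor cost.
    return sum(math.log(1e-9 + c) - l ** 2 / 2 for c, l in zip(incomes, labor))

# Outer level: search over flat tax rates, each evaluated at the inner
# agents' best response to it.
best_tax = max((t / 100 for t in range(100)), key=welfare)
print(round(best_tax, 2))
```

Even this toy shows the central tension the AI Economist navigates: higher taxes fund redistribution (raising welfare for low-skill agents) but shrink labor supply, so the planner's optimum sits strictly between zero and confiscatory rates.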

#reinforcement-learning

Decision Transformer: Reinforcement Learning via Sequence Modeling

We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks. Read More
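Return conditioning, the abstract's key idea, survives even when the Transformer is replaced by a lookup table. In the sketch below (entirely our toy, not the paper's architecture), a count-based model stands in for the causally masked Transformer: we log random trajectories, label every step with its return-to-go, and then at test time ask for actions conditioned on a high target return.

```python
import random
from collections import defaultdict, Counter

rng = random.Random(0)
GOAL = 4

def step(s, a):
    """Corridor environment: reward 1 only upon reaching the goal state."""
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)

# Offline dataset from a random behavior policy.
table = defaultdict(Counter)          # (state, return_to_go) -> action counts
for _ in range(2000):
    s, traj = 0, []
    for _ in range(8):
        a = rng.choice([-1, 1])
        s2, r = step(s, a)
        traj.append((s, a, r))
        s = s2
        if r > 0:
            break
    rtg = sum(r for _, _, r in traj)  # return-to-go, updated per step
    for s0, a0, r0 in traj:
        table[(s0, rtg)][a0] += 1     # condition action on state AND rtg
        rtg -= r0

def act(s, target_return):
    """Most frequent action seen at this state with this return-to-go."""
    counts = table.get((s, target_return))
    return counts.most_common(1)[0][0] if counts else 1

# Conditioning on target return 1.0 reproduces goal-reaching behavior,
# even though the data came from a random policy.
s, total = 0, 0.0
for _ in range(8):
    s, r = step(s, act(s, 1.0 - total))
    total += r
    if r > 0:
        break
print(total)
```

The trick is the same as in the paper: actions recorded alongside high returns-to-go are, by construction, the ones that led to those returns, so no value function or policy gradient is ever computed.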

#reinforcement-learning

Reward is enough

In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence. Read More

#gans, #reinforcement-learning

Novel deep learning framework for symbolic regression

Lawrence Livermore National Laboratory (LLNL) computer scientists have developed a new framework and an accompanying visualization tool that leverages deep reinforcement learning for symbolic regression problems, outperforming baseline methods on benchmark problems.

The paper was recently accepted as an oral presentation at the International Conference on Learning Representations (ICLR 2021), one of the top machine learning conferences in the world. The conference takes place virtually May 3-7.

In the paper, the LLNL team describes applying deep reinforcement learning to discrete optimization — problems that deal with discrete “building blocks” that must be combined in a particular order or configuration to optimize a desired property. The team focused on a type of discrete optimization called symbolic regression — finding short mathematical expressions that fit data gathered from an experiment. Symbolic regression aims to uncover the underlying equations or dynamics of a physical process. Read More
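The "building blocks scored by data fit" framing can be sketched without the paper's machinery. Below, reward-guided random search stands in for the team's RNN with risk-seeking policy gradients (a drastic simplification of ours): candidate expressions are assembled from discrete operators, and each is rewarded by how well it fits data from a hidden ground-truth equation.

```python
import random

rng = random.Random(0)
xs = [i / 10 for i in range(-20, 21)]
target = [x * x + x for x in xs]      # hidden ground truth: x^2 + x

UNARY = {"neg": lambda v: -v}
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def random_expr(depth=0):
    """Sample a random expression tree over the blocks {x, add, mul, neg}."""
    if depth >= 2 or rng.random() < 0.3:
        return "x"
    op = rng.choice(list(BINARY) + list(UNARY))
    if op in BINARY:
        return (op, random_expr(depth + 1), random_expr(depth + 1))
    return (op, random_expr(depth + 1))

def evaluate(expr, x):
    if expr == "x":
        return x
    op = expr[0]
    if op in BINARY:
        return BINARY[op](evaluate(expr[1], x), evaluate(expr[2], x))
    return UNARY[op](evaluate(expr[1], x))

def reward(expr):
    """Squash mean squared error on the data into a (0, 1] reward."""
    mse = sum((evaluate(expr, x) - t) ** 2 for x, t in zip(xs, target)) / len(xs)
    return 1.0 / (1.0 + mse)

best = max((random_expr() for _ in range(3000)), key=reward)
print(best, reward(best))
```

In this tiny search space, random sampling recovers the exact expression; the paper's contribution is making the same reward signal steer a learned policy so the approach scales to spaces where blind sampling fails.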

#reinforcement-learning