Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

Soccer players can tackle, get up, kick and chase a ball in one seamless motion. How could robots master these agile motor skills?

We investigated the application of deep reinforcement learning (deep RL) to low-cost, miniature humanoid hardware in a dynamic environment, showing that the method can synthesize sophisticated and safe movement skills that compose into complex behavioral strategies in a simplified one-versus-one (1v1) soccer game. Read More

#reinforcement-learning, #robotics

Google goes beyond ChatGPT and shocks the world

The new age of robots is upon us

It turns out that one of the most sought-after goals in AI has recently been achieved.

Imagine an AI tool that is capable of playing hundreds of video games at a supreme level. And I’m not referring to a robot trained to be great at chess, at checkers, or at League of Legends.

I’m talking about a robot that’s amazing at all of them. Read More

#reinforcement-learning, #robotics

Transformers are Sample-Efficient World Models

Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems. Recently, many model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches. However, while virtually unlimited interaction with a simulated environment sounds appealing, the world model has to be accurate over extended periods of time. Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games, setting a new state of the art for methods without lookahead search. To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our code and models at https://github.com/eloialonso/iris.
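The two-component structure the abstract describes can be illustrated with a deliberately tiny stand-in: here a binning function plays the role of IRIS's discrete autoencoder and a count-based next-token predictor plays the role of its autoregressive Transformer (the class and function names below are ours, not the paper's API). The point is only the workflow: tokenize real experience, fit a dynamics model over tokens, then roll out "imagined" trajectories without touching the environment.

```python
import random
from collections import defaultdict

def tokenize(obs, n_bins=8):
    """Discretize a scalar observation in [0, 1) into one of n_bins tokens
    (a toy stand-in for a learned discrete autoencoder)."""
    return min(int(obs * n_bins), n_bins - 1)

class CountDynamicsModel:
    """Count-based next-token predictor standing in for the Transformer."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, token, action, next_token):
        self.counts[(token, action)][next_token] += 1

    def sample_next(self, token, action, rng):
        dist = self.counts[(token, action)]
        if not dist:                      # unseen context: stay in place
            return token
        tokens, weights = zip(*dist.items())
        return rng.choices(tokens, weights=weights)[0]

# Collect a little real experience from a toy environment where action +1/-1
# shifts the state by 0.1, then learn entirely "in imagination".
rng = random.Random(0)
model = CountDynamicsModel()
state = 0.5
for _ in range(500):
    action = rng.choice([-1, 1])
    next_state = min(max(state + 0.1 * action, 0.0), 0.99)
    model.update(tokenize(state), action, tokenize(next_state))
    state = next_state

# Imagined rollout: no environment calls, only the learned token model.
token = tokenize(0.5)
trajectory = [token]
for _ in range(10):
    token = model.sample_next(token, 1, rng)
    trajectory.append(token)
print(trajectory)
```

In IRIS the imagined rollouts feed an actor-critic learner; here they merely demonstrate that the token-level model has captured the environment's drift.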

Read More

#reinforcement-learning

Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons

We provide a theoretical framework for Reinforcement Learning with Human Feedback (RLHF). Our analysis shows that when the true reward function is linear, the widely used maximum likelihood estimator (MLE) converges under both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model. However, we show that when training a policy based on the learned reward model, MLE fails while a pessimistic MLE provides policies with improved performance under certain coverage assumptions. Additionally, we demonstrate that under the PL model, the true MLE and an alternative MLE that splits the K-wise comparison into pairwise comparisons both converge. Moreover, the true MLE is asymptotically more efficient. Our results validate the empirical success of existing RLHF algorithms in InstructGPT and provide new insights for algorithm design. We also unify the problem of RLHF and max-entropy Inverse Reinforcement Learning (IRL), and provide the first sample complexity bound for max-entropy IRL. Read More
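The pairwise estimator the abstract analyzes is easy to see concretely. Under the BTL model, P(i beats j) = exp(r_i) / (exp(r_i) + exp(r_j)); the sketch below (our toy, not the paper's code) samples comparisons from known rewards and recovers them by gradient ascent on the log-likelihood.

```python
import math
import random

rng = random.Random(0)
true_r = [0.0, 1.0, 2.0]              # latent rewards of three items

def btl_prob(ri, rj):
    """BTL probability that item with reward ri beats item with reward rj."""
    return 1.0 / (1.0 + math.exp(rj - ri))

# Sample pairwise comparisons from the true model.
data = []
for _ in range(4000):
    i, j = rng.sample(range(3), 2)
    winner = i if rng.random() < btl_prob(true_r[i], true_r[j]) else j
    data.append((i, j, winner))

# Gradient ascent on the log-likelihood; pin est[0] = 0 for identifiability
# (BTL rewards are only defined up to a shared additive constant).
est = [0.0, 0.0, 0.0]
lr = 0.5
for _ in range(200):
    grad = [0.0, 0.0, 0.0]
    for i, j, winner in data:
        p = btl_prob(est[i], est[j])  # model's current P(i beats j)
        y = 1.0 if winner == i else 0.0
        grad[i] += (y - p)
        grad[j] -= (y - p)
    for k in range(1, 3):
        est[k] += lr * grad[k] / len(data)
print(est)
```

With 4000 comparisons the estimates land close to the true gaps of 1.0 and 2.0, matching the convergence the paper proves; the pessimism the paper recommends only matters downstream, when a policy optimizes against the fitted rewards.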

#reinforcement-learning

Uh oh, people are now using AI to cheat in Rocket League

A cool machine learning bot project is being exploited by cheaters, and now players are looking for ways to beat it.

I was skeptical when I came across a Reddit poster claiming they “for sure” encountered a cheater in ranked Rocket League. Uh huh, just like how everyone who kills me in Rainbow Six Siege is “for sure” aimbotting, right? Then I watched the video. Well friends, I regret to inform you that people are cheating in Rocket League.

The alleged cheater was actually on the same team as ghost_snyped, the Reddit user who posted the clip embedded above, which shows the cheater’s perspective for part of a doubles match. I’ve been playing Rocket League for seven years and I have never seen a human being play like that at any rank. There are masterful Rocket League dribblers out there, but it’d be unusual for a skilled player to stay so rooted to the field—most throw in some aerial maneuvers here and there—and to carry and flick the ball that flawlessly.

Sure enough, this is a real problem: People have started using a machine learning-trained Rocket League bot in online matches.  Read More

#reinforcement-learning

The danger of advanced artificial intelligence controlling its own feedback

How would an artificial intelligence (AI) decide what to do? One common approach in AI research is called “reinforcement learning”.

Reinforcement learning gives the software a “reward” defined in some way, and lets the software figure out how to maximise the reward. This approach has produced some excellent results, such as building software agents that defeat humans at games like chess and Go, or creating new designs for nuclear fusion reactors.
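The "define a reward, let the software maximise it" loop can be shown in a few lines. This is a minimal tabular Q-learning sketch of ours (the corridor environment and hyperparameters are toy choices, not anything from the article): only the rightmost state pays reward, and the agent discovers on its own that walking right maximises it.

```python
import random

rng = random.Random(0)
N_STATES, ACTIONS = 5, (-1, +1)       # corridor states; move left or right
GOAL = N_STATES - 1
# Optimistic initial values encourage the agent to try every action.
Q = {(s, a): 1.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1     # learning rate, discount, exploration

for _ in range(300):                  # episodes
    s = 0
    for _ in range(20):               # steps per episode
        if rng.random() < eps:
            a = rng.choice(ACTIONS)   # occasional random exploration
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2 = min(max(s + a, 0), GOAL)
        r = 1.0 if s2 == GOAL else 0.0
        # Standard Q-learning update; no bootstrap past the terminal state.
        target = r if s2 == GOAL else r + gamma * max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
        if s == GOAL:
            break

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
print(policy)
```

Nothing told the agent to go right; the greedy policy falls out of reward maximisation alone, which is exactly the property the authors go on to argue becomes dangerous at sufficient scale.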

However, we might want to hold off on making reinforcement learning agents too flexible and effective.

As we argue in a new paper in AI Magazine, deploying a sufficiently advanced reinforcement learning agent would likely be incompatible with the continued survival of humanity. Read More

#reinforcement-learning, #singularity

The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning

AI and reinforcement learning (RL) have improved many areas, but are not yet widely adopted in economic policy design, mechanism design, or economics at large. At the same time, current economic methodology is limited by a lack of counterfactual data, simplistic behavioral models, and limited opportunities to experiment with policies and evaluate behavioral responses. Here we show that machine-learning-based economic simulation is a powerful policy and mechanism design framework to overcome these limitations. The AI Economist is a two-level, deep RL framework that trains both agents and a social planner who co-adapt, providing a tractable solution to the highly unstable and novel two-level RL challenge. From a simple specification of an economy, we learn rational agent behaviors that adapt to learned planner policies and vice versa. We demonstrate the efficacy of the AI Economist on the problem of optimal taxation. In simple one-step economies, the AI Economist recovers the optimal tax policy of economic theory. In complex, dynamic economies, the AI Economist substantially improves both utilitarian social welfare and the trade-off between equality and productivity over baselines. It does so despite emergent tax-gaming strategies, while accounting for agent interactions and behavioral change more accurately than economic theory. These results demonstrate for the first time that two-level, deep RL can be used for understanding and as a complement to theory for economic design, unlocking a new computational learning-based approach to understanding economic policy. Read More
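The two-level structure can be sketched far more simply than the paper's full deep RL setup: below, the inner "agents" solve a closed-form labor choice instead of running RL, and the outer "planner" grid-searches a flat tax instead of learning one (all of this is our simplification; the skills, utility function, and redistribution rule are illustrative assumptions).

```python
import math

skills = [1.0, 2.0, 4.0]              # heterogeneous agent productivities

def inner_response(tax):
    """Each agent picks labor maximizing (1-tax)*skill*labor - labor**2/2,
    i.e. the agents' best response to the planner's policy."""
    return [(1 - tax) * w for w in skills]

def welfare(tax):
    """Planner objective: utilitarian welfare under lump-sum redistribution."""
    labor = inner_response(tax)
    pretax = [w * l for w, l in zip(skills, labor)]
    revenue = tax * sum(pretax)
    transfer = revenue / len(skills)  # tax receipts split equally
    incomes = [(1 - tax) * y + transfer for y in pretax]
    # Log utility of consumption net of quadratic labor cost.
    return sum(math.log(1e-9 + c) - l ** 2 / 2 for c, l in zip(incomes, labor))

# Outer level: search over flat tax rates, each evaluated at the inner
# agents' best response to it.
best_tax = max((t / 100 for t in range(100)), key=welfare)
print(round(best_tax, 2))
```

Even this toy shows the central tension the AI Economist navigates: higher taxes fund redistribution (raising welfare for low-skill agents) but shrink labor supply, so the planner's optimum sits strictly between zero and confiscatory rates.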

#reinforcement-learning

Decision Transformer: Reinforcement Learning via Sequence Modeling

We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks. Read More
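Return conditioning, the abstract's key idea, survives even when the Transformer is replaced by a lookup table. In the sketch below (entirely our toy, not the paper's architecture), a count-based model stands in for the causally masked Transformer: we log random trajectories, label every step with its return-to-go, and then at test time ask for actions conditioned on a high target return.

```python
import random
from collections import defaultdict, Counter

rng = random.Random(0)
GOAL = 4

def step(s, a):
    """Corridor environment: reward 1 only upon reaching the goal state."""
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)

# Offline dataset from a random behavior policy.
table = defaultdict(Counter)          # (state, return_to_go) -> action counts
for _ in range(2000):
    s, traj = 0, []
    for _ in range(8):
        a = rng.choice([-1, 1])
        s2, r = step(s, a)
        traj.append((s, a, r))
        s = s2
        if r > 0:
            break
    rtg = sum(r for _, _, r in traj)  # return-to-go, updated per step
    for s0, a0, r0 in traj:
        table[(s0, rtg)][a0] += 1     # condition action on state AND rtg
        rtg -= r0

def act(s, target_return):
    """Most frequent action seen at this state with this return-to-go."""
    counts = table.get((s, target_return))
    return counts.most_common(1)[0][0] if counts else 1

# Conditioning on target return 1.0 reproduces goal-reaching behavior,
# even though the data came from a random policy.
s, total = 0, 0.0
for _ in range(8):
    s, r = step(s, act(s, 1.0 - total))
    total += r
    if r > 0:
        break
print(total)
```

The trick is the same as in the paper: actions recorded alongside high returns-to-go are, by construction, the ones that led to those returns, so no value function or policy gradient is ever computed.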

#reinforcement-learning

Reward is enough

In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence. Read More

#gans, #reinforcement-learning

Novel deep learning framework for symbolic regression

Lawrence Livermore National Laboratory (LLNL) computer scientists have developed a new framework and an accompanying visualization tool that leverages deep reinforcement learning for symbolic regression problems, outperforming baseline methods on benchmark problems.

The paper was recently accepted as an oral presentation at the International Conference on Learning Representations (ICLR 2021), one of the top machine learning conferences in the world. The conference takes place virtually May 3-7.

In the paper, the LLNL team describes applying deep reinforcement learning to discrete optimization — problems that deal with discrete “building blocks” that must be combined in a particular order or configuration to optimize a desired property. The team focused on a type of discrete optimization called symbolic regression — finding short mathematical expressions that fit data gathered from an experiment. Symbolic regression aims to uncover the underlying equations or dynamics of a physical process. Read More
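The "building blocks scored by data fit" framing can be sketched without the paper's machinery. Below, reward-guided random search stands in for the team's RNN with risk-seeking policy gradients (a drastic simplification of ours): candidate expressions are assembled from discrete operators, and each is rewarded by how well it fits data from a hidden ground-truth equation.

```python
import random

rng = random.Random(0)
xs = [i / 10 for i in range(-20, 21)]
target = [x * x + x for x in xs]      # hidden ground truth: x^2 + x

UNARY = {"neg": lambda v: -v}
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def random_expr(depth=0):
    """Sample a random expression tree over the blocks {x, add, mul, neg}."""
    if depth >= 2 or rng.random() < 0.3:
        return "x"
    op = rng.choice(list(BINARY) + list(UNARY))
    if op in BINARY:
        return (op, random_expr(depth + 1), random_expr(depth + 1))
    return (op, random_expr(depth + 1))

def evaluate(expr, x):
    if expr == "x":
        return x
    op = expr[0]
    if op in BINARY:
        return BINARY[op](evaluate(expr[1], x), evaluate(expr[2], x))
    return UNARY[op](evaluate(expr[1], x))

def reward(expr):
    """Squash mean squared error on the data into a (0, 1] reward."""
    mse = sum((evaluate(expr, x) - t) ** 2 for x, t in zip(xs, target)) / len(xs)
    return 1.0 / (1.0 + mse)

best = max((random_expr() for _ in range(3000)), key=reward)
print(best, reward(best))
```

In this tiny search space, random sampling recovers the exact expression; the paper's contribution is making the same reward signal steer a learned policy so the approach scales to spaces where blind sampling fails.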

#reinforcement-learning