An Introduction to Deep Reinforcement Learning

Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. This field of research has been able to solve a wide range of complex decision-making tasks that were previously out of reach for a machine. Thus, deep RL opens up many new applications in domains such as healthcare, robotics, smart grids, finance, and many more. This manuscript provides an introduction to deep reinforcement learning models, algorithms, and techniques. Particular focus is on the aspects related to generalization and on how deep RL can be used for practical applications. We assume the reader is familiar with basic machine learning concepts. Read More
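
A minimal sketch of the core idea such an introduction covers: a neural network approximates action values and is trained toward the temporal-difference target r + γ·max Q(s′, a′). The network size, dimensions, and discount factor below are illustrative assumptions, not taken from the manuscript.

```python
# Toy deep Q-learning update: a network stands in for the Q-table.
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99           # assumed toy settings
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def td_update(s, a, r, s_next, done):
    """One Q-learning step on a single transition (s, a, r, s_next)."""
    with torch.no_grad():                        # bootstrap target, no gradient
        target = r + gamma * (1.0 - done) * q_net(s_next).max()
    loss = (q_net(s)[a] - target) ** 2
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```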

#deep-learning, #reinforcement-learning

Solving Rubik’s Cube with a Robot Hand

We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR), and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik’s cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/ Read More
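
The abstract names ADR's goal, a distribution over environments of ever-increasing difficulty, without its mechanics. The hedged sketch below shows one way such a curriculum can work: each simulator parameter keeps a randomization range, and a bound is widened whenever the current policy still succeeds with that parameter pinned at the range's boundary. The parameter names, thresholds, step size, and evaluate() hook are illustrative assumptions, not OpenAI's exact setup.

```python
# Hedged sketch of an ADR-style curriculum over simulator parameters.
import random

ranges = {"cube_mass": [0.09, 0.11], "friction": [0.9, 1.1]}   # assumed params
STEP, PASS_SCORE = 0.01, 0.8

def sample_env_params():
    """Draw one randomized environment from the current distribution."""
    return {k: random.uniform(lo, hi) for k, (lo, hi) in ranges.items()}

def adr_update(evaluate):
    """Widen one randomly chosen boundary if the policy still succeeds there."""
    name, side = random.choice(list(ranges)), random.choice([0, 1])
    pinned = sample_env_params()
    pinned[name] = ranges[name][side]            # pin parameter at its boundary
    if evaluate(pinned) >= PASS_SCORE:           # policy copes: expand difficulty
        ranges[name][side] += STEP if side == 1 else -STEP
```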

#reinforcement-learning, #robotics

DeepMind Has Quietly Open Sourced Three New Impressive Reinforcement Learning Frameworks

Deep reinforcement learning (DRL) has been at the center of some of the biggest breakthroughs of artificial intelligence (AI) in the last few years. However, despite all its progress, DRL methods remain incredibly difficult to apply in mainstream solutions given the lack of tooling and libraries. Consequently, DRL remains mostly a research activity that hasn’t seen a lot of adoption into real-world machine learning solutions. Addressing that problem requires better tools and frameworks. Among the current generation of AI leaders, DeepMind stands alone as the company that has done the most to advance DRL research and development. Recently, the Alphabet subsidiary has been releasing a series of new open source technologies that can help to streamline the adoption of DRL methods. Read More

#reinforcement-learning

OpenSpiel: A Framework for Reinforcement Learning in Games

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi-agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partially- and fully-observable) grid worlds and social dilemmas. OpenSpiel also includes tools to analyze learning dynamics and other common evaluation metrics. This document serves both as an overview of the code base and an introduction to the terminology, core concepts, and algorithms across the fields of reinforcement learning, computational game theory, and search. Read More
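
A short usage sketch of OpenSpiel's Python API (pip install open_spiel): load a registered game, roll out a uniformly random episode, and read the per-player returns. The choice of tic_tac_toe here is arbitrary; any registered game name should work the same way.

```python
# Roll out one random episode of a registered OpenSpiel game.
import random
import pyspiel

game = pyspiel.load_game("tic_tac_toe")
state = game.new_initial_state()
while not state.is_terminal():
    action = random.choice(state.legal_actions())   # uniform random policy
    state.apply_action(action)
print(state.returns())   # per-player returns, e.g. [1.0, -1.0] or [0.0, 0.0]
```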

#reinforcement-learning

How teaching AI to be curious helps machines learn for themselves

When playing a video game, what motivates you to carry on?

This question is perhaps too broad to yield a single answer, but if you had to sum up why you accept that next quest, jump into a new level, or cave and play just one more turn, the simplest explanation might be “curiosity” — just to see what happens next. And as it turns out, curiosity is a very effective motivator when teaching AI to play video games, too.

Research published this week by artificial intelligence lab OpenAI explains how an AI agent with a sense of curiosity outperformed its predecessors playing the classic 1984 Atari game Montezuma’s Revenge. Read More
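
The article does not spell out the mechanism, but a common formulation of machine curiosity, in the spirit of the OpenAI work it describes, is prediction error: the agent earns an intrinsic bonus in states a learned model predicts badly, so novelty itself becomes rewarding. The sketch below uses a random-network-distillation-style pairing; the sizes and learning rate are illustrative assumptions, not OpenAI's exact setup.

```python
# Prediction-error curiosity: bonus where a learned model is still wrong.
import torch
import torch.nn as nn

obs_dim = 16                                     # assumed observation size
target = nn.Linear(obs_dim, 32)                  # fixed random features
predictor = nn.Linear(obs_dim, 32)               # trained to match the target
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def curiosity_bonus(obs):
    """Intrinsic reward = how badly the predictor matches the frozen target."""
    with torch.no_grad():
        feat = target(obs)
    err = ((predictor(obs) - feat) ** 2).mean()
    opt.zero_grad(); err.backward(); opt.step()  # familiar states earn less bonus
    return err.item()                            # added to the extrinsic reward
```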

#reinforcement-learning

DeepMind’s losses and the future of artificial intelligence

Alphabet’s DeepMind lost $572 million last year. What does it mean?

DeepMind, likely the world’s largest research-focused artificial intelligence operation, is losing a lot of money fast, more than $1 billion in the past three years. DeepMind also has more than $1 billion in debt due in the next 12 months.

Does this mean that AI is falling apart? Read More

#artificial-intelligence, #reinforcement-learning

Inside DeepMind's epic mission to solve science's trickiest problem

DeepMind is best known for its breakthroughs in machine learning and deep learning that have resulted in highly publicised events in which neural networks combined with algorithms have mastered computer games, beaten chess grandmasters and caused Lee Sedol, the world champion of Go – widely agreed to be the most complex game man has created – to declare: “From the beginning of the game, there was not a moment in time when I thought that I was winning.”

For Demis Hassabis, Shane Legg, and Mustafa Suleyman, the proof points offered by gameplay set the agenda for the next ten years: using data and machine learning to solve some of the hardest problems in science. Read More

#deep-learning, #reinforcement-learning, #strategy

Hierarchical Imitation and Reinforcement Learning

We study how to effectively leverage expert feedback to learn sequential decision-making policies. We focus on problems with sparse rewards and long time horizons, which typically pose significant challenges in reinforcement learning. We propose an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes of expert interaction. Our framework can incorporate different combinations of imitation learning (IL) and reinforcement learning (RL) at different levels, leading to dramatic reductions in both expert effort and cost of exploration. Using long-horizon benchmarks, including Montezuma’s Revenge, we demonstrate that our approach can learn significantly faster than hierarchical RL, and be significantly more label-efficient than standard IL. We also theoretically analyze labeling cost for certain instantiations of our framework. Read More
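
A hedged sketch of the hierarchical-guidance pattern the abstract describes: the expert supervises only the high level (which subgoal to pursue), while low-level behaviour is trained by RL against subgoal completion, cutting both expert effort and exploration cost. Every hook below (expert_subgoal, train_low_level) is an illustrative assumption, not the paper's API.

```python
# High level learned by imitation, low level learned by RL per subgoal.
def hierarchical_guidance(env, expert_subgoal, train_low_level, episodes):
    high_level_data = []                          # (state, subgoal) pairs for IL
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            subgoal = expert_subgoal(state)       # cheap high-level label only
            high_level_data.append((state, subgoal))
            # the low level explores on its own; the expert never labels actions
            state, done = train_low_level(env, state, subgoal)
    return high_level_data                        # fit the high-level policy on this
```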

#human, #observational-learning, #reinforcement-learning

RL — Imitation Learning

Imitation is a key part of human learning. In the high-tech world, if you are not an innovator, you want to be a quick follower. In reinforcement learning, we maximize the rewards for our actions. Model-based RL focuses on the model (the system dynamics) to optimize our decisions, while Policy Gradient methods improve the policy directly for better rewards.

On the other hand, imitation learning focuses on imitating expert demonstrations. Read More
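
The simplest instantiation of this idea is behavioural cloning: treat expert (state, action) pairs as a supervised dataset and fit the policy by classification. A minimal sketch, with toy dimensions and network size assumed:

```python
# Behavioural cloning: supervised learning on expert demonstrations.
import torch
import torch.nn as nn

obs_dim, n_actions = 8, 4                        # assumed toy dimensions
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def clone_step(states, expert_actions):
    """One supervised step toward the expert's action choices."""
    loss = loss_fn(policy(states), expert_actions)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# e.g. clone_step(torch.randn(32, obs_dim), torch.randint(0, n_actions, (32,)))
```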

#human, #observational-learning, #reinforcement-learning

Observational Learning by Reinforcement Learning

Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent. It is a core mechanism appearing in various instances of social learning and has been found to be employed in several intelligent species, including humans. In this paper, we investigate to what extent the explicit modelling of other agents is necessary to achieve observational learning through machine learning. In particular, we argue that observational learning can emerge from pure Reinforcement Learning (RL), potentially coupled with memory. Through simple scenarios, we demonstrate that an RL agent can leverage the information provided by the observations of another agent performing a task in a shared environment. The other agent is only observed through the effect of its actions on the environment and never explicitly modelled. Two key aspects are borrowed from observational learning: i) the observer’s behaviour needs to change as a result of viewing a ‘teacher’ (another agent) and ii) the observer needs to be motivated somehow to engage in making use of the other agent’s behaviour. The latter is naturally modelled by RL, by correlating the learning agent’s reward with the teacher agent’s behaviour. Read More
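
A toy sketch of the setup the abstract describes: the observer never perceives the teacher directly, only the environment the teacher has acted on, and its reward is correlated with following the teacher's lead. The one-step environment below is invented purely for illustration; the paper defines its own simple scenarios.

```python
# Observer sees only the teacher's effect; reward tracks the teacher.
import random

def teacher_act(env_state):
    """The teacher heads straight for a goal the observer cannot see."""
    return env_state["goal"]

def env_step(env_state, observer_action):
    teacher_pos = teacher_act(env_state)          # teacher acts first
    observation = {"teacher_pos": teacher_pos}    # only the effect is visible
    reward = 1.0 if observer_action == teacher_pos else 0.0
    return observation, reward                    # reward correlates with teacher

env_state = {"goal": random.randrange(5)}
obs, r = env_step(env_state, observer_action=random.randrange(5))
```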

#human, #observational-learning, #reinforcement-learning