Rick's Cafe AI 8:30 pm on July 19, 2019
Tags: Human ( 310 ), Observational Learning ( 10 ), Reinforcement Learning

Observational Learning by Reinforcement Learning

Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent. It is a core mechanism appearing in various instances of social learning and has been found to be employed in several intelligent species, including humans. In this paper, we investigate to what extent the explicit modelling of other agents is necessary to achieve observational learning through machine learning. Especially, we argue that observational learning can emerge from pure Reinforcement Learning (RL), potentially coupled with memory. Through simple scenarios, we demonstrate that an RL agent can leverage the information provided by the observations of an other agent performing a task in a shared environment. The other agent is only observed through the effect of its actions on the environment and never explicitly modeled. Two key aspects are borrowed from observational learning: i) the observer behaviour needs to change as a result of viewing a ’teacher’ (another agent) and ii) the observer needs to be motivated somehow to engage in making use of the other agent’s behaviour. The later is naturally modeled by RL, by correlating the learning agent’s reward with the teacher agent’s behaviour. Read More

#human, #observational-learning, #reinforcement-learning

Rick's Cafe AI 8:27 pm on July 19, 2019
Tags: Human ( 310 ), Observational Learning ( 10 ), Reinforcement Learning

Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation

Imitation learning is an effective approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. However, standard imitation learning methods assume that the agent receives examples of observation-action tuples that could be provided, for instance, to a supervised learning algorithm. This stands in contrast to how humans and animals imitate: we observe another person performing some behavior and then figure out which actions will realize that behavior, compensating for changes in viewpoint, surroundings, object positions and types, and other factors. We term this kind of imitation learning “imitation-from-observation,” and propose an imitation learning method based on video prediction with context translation and deep reinforcement learning. This lifts the assumption in imitation learning that the demonstration should consist of observations in the same environment configuration, and enables a variety of interesting applications, including learning robotic skills that involve tool use simply by observing videos of human tool use. Our experimental results show the effectiveness of our approach in learning a wide range of real-world robotic tasks modeled after common household chores from videos of a human demonstrator, including sweeping, ladling almonds, pushing objects as well as a number of tasks in simulation. Read More

#human, #observational-learning, #reinforcement-learning

Rick's Cafe AI 8:59 pm on June 20, 2019
Tags: GANS ( 71 ), Human ( 310 ), Reinforcement Learning

World Models

We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment. Read More

#gans, #human, #reinforcement-learning

Rick's Cafe AI 8:54 pm on June 20, 2019
Tags: AutoML ( 18 ), Reinforcement Learning, Videos ( 364 )

This AI Learns From Its Dreams

Rick's Cafe AI 7:12 pm on June 20, 2019
Tags: AutoML ( 18 ), Reinforcement Learning

Off-Policy Classification – A New Reinforcement Learning Model Selection Method

Reinforcement learning (RL) is a framework that lets agents learn decision making from experience. One of the many variants of RL is off-policy RL, where an agent is trained using a combination of data collected by other agents (off-policy data) and data it collects itself to learn generalizable skills like robotic walking and grasping. In contrast, fully off-policy RL is a variant in which an agent learns entirely from older data, which is appealing because it enables model iteration without requiring a physical robot. With fully off-policy RL, one can train several models on the same fixed dataset collected by previous agents, then select the best one. However, fully off-policy RL comes with a catch: while training can occur without a real robot, evaluation of the models cannot. Furthermore, ground-truth evaluation with a physical robot is too inefficient to test promising approaches that require evaluating a large number of models, such as automated architecture search with AutoML. Read More

#automl, #reinforcement-learning

Rick's Cafe AI 7:09 pm on June 20, 2019
Tags: AutoML ( 18 ), Reinforcement Learning

Off-Policy Evaluation via Off-Policy Classification

In this work, we consider the problem of model selection for deep reinforcement learning (RL) in real-world environments. Typically, the performance of deep RL algorithms is evaluated via on-policy interactions with the target environment. However, comparing models in a real-world environment for the purposes of early stopping or hyperparameter tuning is costly and often practically infeasible. This leads us to examine off-policy policy evaluation (OPE) in such settings. We focus on OPE for value-based methods, which are of particular interest in deep RL, with applications like robotics, where off-policy algorithms based on Q-function estimation can often attain better sample complexity than direct policy optimization. Existing OPE metrics either rely on a model of the environment, or the use of importance sampling (IS) to correct for the data being off-policy. However, for high-dimensional observations, such as images, models of the environment can be difficult to fit and value-based methods can make IS hard to use or even illconditioned, especially when dealing with continuous action spaces. In this paper, we focus on the specific case of MDPs with continuous action spaces and sparse binary rewards, which is representative of many important real-world applications. We propose an alternative metric that relies on neither models nor IS, by framing OPE as a positive-unlabeled (PU) classification problem with the Q-function as the decision function. We experimentally show that this metric outperforms baselines on a number of tasks. Most importantly, it can reliably predict the relative performance of different policies in a number of generalization scenarios, including the transfer to the real-world of policies trained in simulation for an image-based robotic manipulation task. Read More

#automl, #reinforcement-learning

Rick's Cafe AI 12:52 pm on June 20, 2019
Tags: Human ( 310 ), Reinforcement Learning

AlphaStar: An Evolutionary Computation Perspective

In January 2019, DeepMind revealed AlphaStar to the world—the first artificial intelligence (AI) system to beat a professional player at the game of StarCraft II—representing a milestone in the progress of AI. AlphaStar draws on many areas of AI research, including deep learning, reinforcement learning, game theory, and evolutionary computation (EC). In this paper we analyze AlphaStar primarily through the lens of EC, presenting a new look at the system and relating it to many concepts in the field. We highlight some of its most interesting aspects—the use of Lamarckian evolution,competitive co-evolution, and quality diversity. In doing so,we hope to provide a bridge between the wider EC community and one of the most significant AI systems developed in recent times. Read More

#human, #reinforcement-learning

Rick's Cafe AI 12:09 pm on June 20, 2019
Tags: Deep Learning ( 51 ), Human ( 310 ), Reinforcement Learning, Videos ( 364 )

The Power of Self-Learning Systems

Rick's Cafe AI 12:03 pm on June 20, 2019
Tags: Deep Learning ( 51 ), Human ( 310 ), Reinforcement Learning, Videos ( 364 )

AI Codes its Own ‘AI Child’ – AutoML

Rick's Cafe AI 3:40 pm on May 31, 2019
Tags: Reinforcement Learning

Human-level performance in 3D multiplayer games with population-based reinforcement learning

Reinforcement learning (RL) has shown great success in increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents. We used a tournament-style evaluation to demonstrate that an agent can achieve human-level performance in a three-dimensional multiplayer first-person video game, Quake III Arena in Capture the Flag mode, using only pixels and game points scored as input. We used a two-tier optimization process in which a population of independent RL agents are trained concurrently from thousands of parallel matches on randomly generated environments. Each agent learns its own internal reward signal and rich representation of the world. These results indicate the great potential of multiagent reinforcement learning for artificial intelligence research. Read More

#reinforcement-learning

Recent Activity

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Rick's Cafe AI

The latest in Artificial Intelligence carefully curated into its own special blend

Tag Archives: Reinforcement Learning