Learning has been a holy grail in robotics for decades. If these systems are going to thrive in unpredictable environments, they’ll need to do more than just respond to programming — they’ll need to adapt and learn. What’s become clear the more I read and speak with experts is that true robotic learning will require a combination of approaches.
Video is an intriguing solution that’s been the centerpiece of a lot of recent work in the space. Roughly this time last year, we highlighted WHIRL (In-the-Wild Human Imitating Robot Learning), a CMU-developed algorithm designed to train robotic systems by watching a recording of a human executing a task.
This week, CMU Robotics Institute assistant professor Deepak Pathak is showcasing VRB (Vision-Robotics Bridge), an evolution of WHIRL. Read More
A bot that watched 70,000 hours of Minecraft could unlock AI’s next big thing
Online videos are a vast and untapped source of training data—and OpenAI says it has a new way to use it.
OpenAI has built the best Minecraft-playing bot yet by making it watch 70,000 hours of video of people playing the popular computer game. It showcases a powerful new technique that could be used to train machines to carry out a wide range of tasks by binging on sites like YouTube, a vast and untapped source of training data.
The Minecraft AI learned to perform complicated sequences of keyboard and mouse clicks to complete tasks in the game, such as chopping down trees and crafting tools. It’s the first bot that can craft so-called diamond tools, a task that typically takes good human players 20 minutes of high-speed clicking—or around 24,000 actions.
The result is a breakthrough for a technique known as imitation learning, in which neural networks are trained to perform tasks by watching humans do them. Imitation learning can be used to train AI to control robot arms, drive cars, or navigate web pages. Read More
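The core mechanic here is plain supervised learning over demonstrated actions. Below is a minimal behavioral-cloning sketch in PyTorch; the network, 64x64 frame size, and 128-way discretized action space are illustrative assumptions, not OpenAI’s actual VPT model.

```python
# A minimal behavioral-cloning sketch (illustrative; not OpenAI's VPT model):
# a network maps game frames to discrete keyboard/mouse actions and is trained
# with cross-entropy on (frame, action) pairs extracted from human video.
import torch
import torch.nn as nn

N_ACTIONS = 128  # hypothetical discretized keyboard/mouse action space

policy = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 6 * 6, 256), nn.ReLU(),   # 64x64 input -> 6x6 feature map
    nn.Linear(256, N_ACTIONS),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def bc_update(frames, actions):
    """One supervised step: push the policy toward the human's action."""
    loss = loss_fn(policy(frames), actions)  # frames: (B,3,64,64), actions: (B,)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch standing in for action-labeled video frames:
bc_update(torch.randn(32, 3, 64, 64), torch.randint(0, N_ACTIONS, (32,)))
```

In OpenAI’s pipeline, an inverse-dynamics model trained on a small labeled dataset first pseudo-labels the raw video with actions; the supervised step above corresponds to the downstream cloning stage.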
How neuro-symbolic AI might finally make machines reason like humans
If you want a machine to do something intelligent, you either have to program it or teach it to learn.
For decades, engineers have been programming machines to perform all sorts of tasks — from software that runs on your personal computer and smartphone to guidance control for space missions.
But although computers are generally much faster and more precise than the human brain at sequential tasks, such as adding numbers or calculating chess moves, such programs are very limited in their scope. Something as trivial as identifying a bicycle in a crowded pedestrian street or picking up a hot cup of coffee from a desk and gently moving it to the mouth can send a computer into convulsions, never mind conceptualization or abstraction (such as designing a computer itself).
The gist is that humans were never programmed (not like a digital computer, at least) — humans have become intelligent through learning. Read More
Amazon Uses Self-Learning to Teach Alexa to Correct its Own Mistakes
Digital assistants such as Alexa, Siri, Cortana, and the Google Assistant are some of the best examples of mainstream adoption of artificial intelligence (AI) technologies. These assistants are becoming more prevalent and tackling new domain-specific tasks, which makes the maintenance of their underlying AI particularly challenging. The traditional approach to building digital assistants has been based on natural language understanding (NLU) and automatic speech recognition (ASR) methods that rely on annotated datasets. Recently, the Amazon Alexa team published a paper proposing a self-learning method that allows Alexa to correct mistakes while interacting with users. Read More
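As a rough illustration of what self-learning from user interactions can look like (my own sketch of the general idea, not Amazon’s published method), one common signal is the implicit correction: a user who immediately rephrases a failed request is showing the system how the utterance should have been interpreted.

```python
# A hedged sketch of learning from implicit corrections (illustrative only):
# when a user immediately rephrases a request that failed, treat the rephrase
# as a correction and learn to rewrite the failing utterance before NLU sees it.
from collections import Counter, defaultdict

rewrite_counts = defaultdict(Counter)

def observe_session(utterances, failed):
    """Record (failed utterance -> successful rephrase) pairs from live traffic."""
    for prev, nxt in zip(utterances, utterances[1:]):
        if failed(prev) and not failed(nxt):
            rewrite_counts[prev][nxt] += 1

def rewrite(utterance):
    """Apply the most frequently observed correction, if any."""
    candidates = rewrite_counts.get(utterance)
    return candidates.most_common(1)[0][0] if candidates else utterance

# Hypothetical session: the second utterance succeeded where the first failed.
observe_session(
    ["play imagine dragons radio", "play Imagine Dragons on the radio"],
    failed=lambda u: u == "play imagine dragons radio",
)
print(rewrite("play imagine dragons radio"))  # -> "play Imagine Dragons on the radio"
```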
Hierarchical Imitation and Reinforcement Learning
We study how to effectively leverage expert feedback to learn sequential decision-making policies. We focus on problems with sparse rewards and long time horizons, which typically pose significant challenges in reinforcement learning. We propose an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes of expert interaction. Our framework can incorporate different combinations of imitation learning (IL) and reinforcement learning (RL) at different levels, leading to dramatic reductions in both expert effort and cost of exploration. Using long-horizon benchmarks, including Montezuma’s Revenge, we demonstrate that our approach can learn significantly faster than hierarchical RL, and be significantly more label-efficient than standard IL. We also theoretically analyze labeling cost for certain instantiations of our framework. Read More
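Read schematically, the division of labor might look like the sketch below (my simplified reading of the framework; the subgoal set, reward signal, and tabular learner are assumptions for illustration): the expert labels only the short high-level trajectory, while cheap RL grinds out the long low-level horizons.

```python
# Hierarchical guidance, schematically (an illustrative reading, not the
# authors' code): imitation learning at the high level, RL at the low level.
SUBGOALS = ["get_key", "open_door", "reach_exit"]  # hypothetical decomposition

high_policy = {}  # abstract state -> subgoal, learned by imitation

def high_level_imitation(expert_labels):
    """IL on the short high-level trajectory: copy the expert's subgoal choice."""
    for abstract_state, subgoal in expert_labels:
        high_policy[abstract_state] = subgoal

def low_level_q_update(q, s, a, r, s2, actions=(0, 1, 2, 3), alpha=0.1, gamma=0.99):
    """RL on the long low-level horizon: one tabular Q-learning step, where the
    reward r is the sparse 'subgoal reached' signal, not the task reward."""
    best_next = max(q.get((s2, b), 0.0) for b in actions)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best_next - q.get((s, a), 0.0))

# The expert only annotates a handful of high-level decisions:
high_level_imitation([("room_with_key", "get_key"), ("key_in_hand", "open_door")])
```

Because the expert’s labels cover only the subgoal sequence, expert effort scales with the number of subgoals rather than the full horizon, which is where the label-efficiency gain comes from.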
RL — Imitation Learning
Imitation is a key part of human learning. In the high-tech world, if you are not an innovator, you want to be a quick follower. In reinforcement learning, we maximize the rewards for our actions. Model-based RL focuses on the model (the system dynamics) to optimize our decisions, while policy gradient methods improve the policy directly for better rewards.
Imitation learning, on the other hand, focuses on imitating expert demonstrations. Read More
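To make the contrast concrete, here is a minimal REINFORCE-style policy-gradient update (sizes and the two-action space are illustrative): where imitation matches an expert’s actions with cross-entropy, the policy gradient reinforces the agent’s own sampled actions in proportion to the return they earned.

```python
# A minimal policy-gradient (REINFORCE) sketch for contrast with imitation:
# no expert labels, only rewards collected by the policy itself.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

def reinforce_update(states, actions, returns):
    """Raise the log-probability of each taken action, weighted by its return."""
    log_probs = torch.log_softmax(policy(states), dim=-1)   # (T, 2)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = -(chosen * returns).mean()                       # policy-gradient loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Dummy rollout: 16 steps of 4-dim states, binary actions, and returns.
reinforce_update(torch.randn(16, 4), torch.randint(0, 2, (16,)), torch.randn(16))
```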
Observational Learning by Reinforcement Learning
Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent. It is a core mechanism appearing in various instances of social learning and has been found to be employed in several intelligent species, including humans. In this paper, we investigate to what extent the explicit modelling of other agents is necessary to achieve observational learning through machine learning. In particular, we argue that observational learning can emerge from pure Reinforcement Learning (RL), potentially coupled with memory. Through simple scenarios, we demonstrate that an RL agent can leverage the information provided by the observations of another agent performing a task in a shared environment. The other agent is only observed through the effect of its actions on the environment and never explicitly modelled. Two key aspects are borrowed from observational learning: i) the observer’s behaviour needs to change as a result of viewing a ’teacher’ (another agent) and ii) the observer needs to be motivated somehow to engage in making use of the other agent’s behaviour. The latter is naturally modelled by RL, by correlating the learning agent’s reward with the teacher agent’s behaviour. Read More
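A toy version of that setup (the switch environment, reward, and learner are my own illustration, not the paper’s experiments): the observer never represents the teacher at all, it merely conditions on the environmental effect of the teacher’s actions, and because its reward correlates with that effect, vanilla RL learns to exploit the observation.

```python
# Observational learning from pure RL, as a toy (assumed setup, not the
# authors' code): the teacher is visible only through which of N switches its
# actions flipped; the observer's reward correlates with reproducing that effect.
import random

N = 5
q = {}  # (observed_switch, action) -> estimated value

def reward(observed_switch, action):
    return 1.0 if action == observed_switch else 0.0  # reward tracks the teacher

for episode in range(2000):
    observed = random.randrange(N)  # effect of the teacher's behaviour this episode
    if random.random() < 0.1:       # epsilon-greedy exploration
        action = random.randrange(N)
    else:
        action = max(range(N), key=lambda a: q.get((observed, a), 0.0))
    r = reward(observed, action)
    old = q.get((observed, action), 0.0)
    q[(observed, action)] = old + 0.5 * (r - old)
```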
Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
Imitation learning is an effective approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. However, standard imitation learning methods assume that the agent receives examples of observation-action tuples that could be provided, for instance, to a supervised learning algorithm. This stands in contrast to how humans and animals imitate: we observe another person performing some behavior and then figure out which actions will realize that behavior, compensating for changes in viewpoint, surroundings, object positions and types, and other factors. We term this kind of imitation learning “imitation-from-observation,” and propose an imitation learning method based on video prediction with context translation and deep reinforcement learning. This lifts the assumption in imitation learning that the demonstration should consist of observations in the same environment configuration, and enables a variety of interesting applications, including learning robotic skills that involve tool use simply by observing videos of human tool use. Our experimental results show the effectiveness of our approach in learning a wide range of real-world robotic tasks modeled after common household chores from videos of a human demonstrator, including sweeping, ladling almonds, and pushing objects, as well as a number of tasks in simulation. Read More
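Schematically, the reward this yields might look like the following (a paraphrase of the abstract, not the released implementation): a context-translation model maps the demonstration into the learner’s viewpoint, and RL then maximizes how closely the learner’s observations track the translated demonstration.

```python
# Imitation-from-observation reward, schematically (an illustrative reading of
# the abstract): penalize the distance between features of the translated demo
# and features of the agent's own observation at each timestep.
import torch

def tracking_reward(translated_demo_feats, agent_feats):
    """Per-step reward: negative feature distance to the translated demo."""
    return -torch.norm(translated_demo_feats - agent_feats, dim=-1)

# Dummy features for an 8-step episode (the translation network is assumed
# already trained on demos from multiple contexts):
translated = torch.randn(8, 128)  # demo frames mapped into the agent's context
observed = torch.randn(8, 128)    # the agent's own observation features
rewards = tracking_reward(translated, observed)  # fed to any RL algorithm
```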
Learning with Hierarchical-Deep Models
We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets. Read More
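The “learning to learn” ingredient is easiest to see stripped down to one dimension. The snippet below is a tiny hierarchical-Bayes illustration of prior sharing, not the HDP-DBM itself: well-sampled categories induce a shared prior, and a new concept seen only three times is estimated by shrinking toward it.

```python
# Prior sharing in miniature (illustrative; not the HDP-DBM): a shared prior
# learned from related categories stabilizes the estimate for a new concept
# observed only three times.
import numpy as np

rng = np.random.default_rng(0)
related = rng.normal(loc=2.0, scale=1.0, size=(10, 50))  # 10 well-sampled categories
prior_mean = related.mean()                  # shared (super-category) prior mean
tau2 = related.mean(axis=1).var()            # prior variance across category means

few_shot = rng.normal(loc=2.0, scale=1.0, size=3)  # new concept, 3 examples
sigma2, n = 1.0, len(few_shot)

# Conjugate-normal posterior mean: interpolates between shared prior and data.
posterior_mean = (prior_mean / tau2 + few_shot.sum() / sigma2) / (1 / tau2 + n / sigma2)
print(posterior_mean)  # stays close to 2.0 despite only three examples
```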
Hierarchical Compositional Feature Learning
We introduce the hierarchical compositional network (HCN), a directed generative model able to discover and disentangle, without supervision, the building blocks of a set of binary images. The building blocks are binary features defined hierarchically as a composition of some of the features in the layer immediately below, arranged in a particular manner. At a high level, HCN is similar to a sigmoid belief network with pooling. Inference and learning in HCN are very challenging and existing variational approximations do not work satisfactorily. A main contribution of this work is to show that both can be addressed using max-product message passing (MPMP) with a particular schedule (no EM required). Also, using MPMP as an inference engine for HCN makes new tasks simple: adding supervision information, classifying images, or performing inpainting all correspond to clamping some variables of the model to their known values and running MPMP on the rest. When used for classification, fast inference with HCN has exactly the same functional form as a convolutional neural network (CNN) with linear activations and binary weights. However, HCN’s features are qualitatively very different. Read More
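That closing equivalence is easy to state in code (a sketch of the functional form only; layer sizes and the random binary weights are illustrative): binary weights, a linear activation, meaning no nonlinearity, and pooling.

```python
# The functional form of fast HCN inference per the abstract: a CNN with
# binary weights and linear activations (sizes here are illustrative).
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 28, 28).round()            # a binary input image
w = torch.randint(0, 2, (8, 1, 5, 5)).float()   # binary convolution weights

h = F.conv2d(x, w)                    # linear activation: no nonlinearity applied
feat = F.max_pool2d(h, 2).flatten(1)  # pooling, then flatten to (1, 8*12*12)
w_out = torch.randint(0, 2, (feat.shape[1], 10)).float()  # binary readout weights
logits = feat @ w_out                 # class scores, same form as a linear CNN
```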