Observational Learning by Reinforcement Learning

Observational learning is a type of learning that occurs as a function of observing, retaining, and possibly replicating or imitating the behaviour of another agent. It is a core mechanism appearing in various instances of social learning and has been found to be employed in several intelligent species, including humans. In this paper, we investigate to what extent the explicit modelling of other agents is necessary to achieve observational learning through machine learning. In particular, we argue that observational learning can emerge from pure Reinforcement Learning (RL), potentially coupled with memory. Through simple scenarios, we demonstrate that an RL agent can leverage the information provided by observations of another agent performing a task in a shared environment. The other agent is only observed through the effect of its actions on the environment and never explicitly modelled. Two key aspects are borrowed from observational learning: i) the observer's behaviour needs to change as a result of viewing a 'teacher' (another agent), and ii) the observer needs to be motivated somehow to engage in making use of the other agent's behaviour. The latter is naturally modelled by RL, by correlating the learning agent's reward with the teacher agent's behaviour. Read More
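
To make the setup concrete, here is a minimal, hypothetical sketch (not the authors' code) of observational learning emerging from pure tabular Q-learning: a scripted teacher walks a 1-D corridor to a goal the learner cannot see; the learner observes only the environment state (its own cell and the teacher's cell), never an explicit model of the teacher, yet learns to follow it because its reward is correlated with the teacher's behaviour.

```python
import random
from collections import defaultdict

# Hypothetical sketch: 1-D corridor with a hidden goal. A scripted teacher
# walks to the goal; the Q-learning observer sees only (its own cell, the
# teacher's cell) and is rewarded for reaching the goal it cannot see.
N = 7                     # corridor length
ACTIONS = (-1, 0, +1)     # move left, stay, move right
q = defaultdict(float)    # Q[((learner, teacher), action)]
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(pos, a):
    return min(N - 1, max(0, pos + a))

for episode in range(5000):
    goal = random.randrange(N)                 # hidden from the learner
    learner = teacher = 0
    for t in range(3 * N):
        # The teacher acts on its private knowledge of the goal; the learner
        # only ever sees the teacher's resulting position in the environment.
        teacher = step(teacher, (goal > teacher) - (goal < teacher))
        s = (learner, teacher)
        a = (random.choice(ACTIONS) if random.random() < eps
             else max(ACTIONS, key=lambda x: q[s, x]))
        learner = step(learner, a)
        r = 1.0 if learner == goal else 0.0    # reward correlated with teacher
        s2 = (learner, teacher)
        q[s, a] += alpha * (r + gamma * max(q[s2, x] for x in ACTIONS) - q[s, a])
        if r:
            break
```

Because the goal is hidden, the only policy that reliably earns reward is one that exploits the teacher's observable position, which illustrates the paper's point that no explicit model of the other agent is required.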

#human, #observational-learning, #reinforcement-learning

Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation

Imitation learning is an effective approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. However, standard imitation learning methods assume that the agent receives examples of observation-action tuples that could be provided, for instance, to a supervised learning algorithm. This stands in contrast to how humans and animals imitate: we observe another person performing some behavior and then figure out which actions will realize that behavior, compensating for changes in viewpoint, surroundings, object positions and types, and other factors. We term this kind of imitation learning “imitation-from-observation,” and propose an imitation learning method based on video prediction with context translation and deep reinforcement learning. This lifts the assumption in imitation learning that the demonstration should consist of observations in the same environment configuration, and enables a variety of interesting applications, including learning robotic skills that involve tool use simply by observing videos of human tool use. Our experimental results show the effectiveness of our approach in learning a wide range of real-world robotic tasks modeled after common household chores from videos of a human demonstrator, including sweeping, ladling almonds, and pushing objects, as well as a number of tasks in simulation. Read More
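
The core mechanism can be sketched in a few lines. Below is a hedged illustration (the function names and the stub translate() are assumptions, not the paper's released code) of how a context-translation model turns raw demonstration video into an RL reward: each demonstration frame is translated into the learner's context, and the per-step reward penalizes the feature distance between the learner's observation and the translated frame.

```python
import numpy as np

def features(frame):
    # Stand-in feature extractor; the paper tracks learned deep features.
    return frame.reshape(-1).astype(np.float32)

def translate(demo_frame, learner_context_frame):
    # Placeholder for the learned context-translation model, which would
    # re-render the demo frame in the learner's viewpoint and scene.
    return demo_frame  # identity stub

def imitation_reward(learner_frame, demo_frame, learner_context_frame):
    predicted = translate(demo_frame, learner_context_frame)
    return -float(np.linalg.norm(features(learner_frame) - features(predicted)))

# Inside an RL loop, imitation_reward(obs_t, demo[t], obs_0) is added to the
# per-step reward so the policy learns to track the translated demonstration.
```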

#human, #observational-learning, #reinforcement-learning

China’s AI Ambitions

Rhodes Scholar Jeff Ding, author of the ChinAI newsletter, breaks down how China stacks up against the rest of the world in the race to develop AI, first in this October 2018 interview with Jordan Schneider and then in this May 2019 interview with James Wang. His ChinAI newsletter archives are available here. Read More

#china-ai, #podcasts

Natural Adversarial Examples

We introduce natural adversarial examples – real-world, unmodified, and naturally occurring examples that cause classifier accuracy to significantly degrade. We curate 7,500 natural adversarial examples and release them in an ImageNet classifier test set that we call IMAGENET-A. This dataset serves as a new way to measure classifier robustness. Like ℓp adversarial examples, IMAGENET-A examples successfully transfer to unseen or black-box classifiers. For example, on IMAGENET-A a DenseNet-121 obtains around 2% accuracy, an accuracy drop of approximately 90%. Recovering this accuracy is not simple because IMAGENET-A examples exploit deep flaws in current classifiers, including their over-reliance on color, texture, and background cues. We observe that popular training techniques for improving robustness have little effect, but we show that some architectural changes can enhance robustness to natural adversarial examples. Future research is required to enable robust generalization to this hard ImageNet test set. Read More
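
The headline number is straightforward to reproduce once the dataset is downloaded. The sketch below makes two assumptions not taken from the paper: the dataset sits in ./imagenet-a with WordNet-ID folder names, and imagenet_wnid_to_index.json is a user-supplied mapping from those 200 WNIDs to ImageNet-1k class indices. Given those, it measures top-1 accuracy of an ImageNet-pretrained DenseNet-121 with torchvision.

```python
import json

import torch
import torchvision as tv

# Standard ImageNet preprocessing for a pretrained classifier.
preprocess = tv.transforms.Compose([
    tv.transforms.Resize(256),
    tv.transforms.CenterCrop(224),
    tv.transforms.ToTensor(),
    tv.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225]),
])

dataset = tv.datasets.ImageFolder("imagenet-a", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=64)

# Assumed mapping from the dataset's WordNet-ID folder names to the 0..999
# ImageNet-1k class indices that the pretrained model predicts.
wnid_to_idx = json.load(open("imagenet_wnid_to_index.json"))
folder_to_1k = torch.tensor([wnid_to_idx[c] for c in dataset.classes])

model = tv.models.densenet121(weights="IMAGENET1K_V1").eval()

correct = total = 0
with torch.no_grad():
    for images, folder_labels in loader:
        preds = model(images).argmax(dim=1)      # ImageNet-1k indices
        correct += (preds == folder_to_1k[folder_labels]).sum().item()
        total += images.size(0)

print(f"IMAGENET-A top-1 accuracy: {correct / total:.2%}")  # around 2%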

#assurance

Hype And Reality In Chinese Artificial Intelligence

In MIT Technology Review, Jeff Ding shares five takeaways from his experience translating and writing about Chinese-language coverage of artificial intelligence (AI) research in China. Ding is a researcher at the University of Oxford who has now published 48 issues of his insightful ChinAI newsletter.

For more discussion of U.S.-China technology connections, listen to this recent Sinica Podcast with Samm Sacks. You can also listen to a ChinaEconTalk interview with Jeff Ding here on SupChina. Read More

#china-ai

Much Ado About Data: How America and China Stack Up

Analysts often cite the amount of data in China as a core advantage of its artificial intelligence (AI) ecosystem compared to the United States. That’s true to a certain extent: 1.4 billion people + deep smartphone penetration + 24/7 online and offline data collection = staggering amount of data.

But the reality is far more complex, because data is not a single-dimensional input into AI, something that China simply has “more” of. The relationship between data and AI prowess is analogous to the relationship between labor and the economy. China may have an abundance of workers, but the quality, structure, and mobility of that labor force is just as important to economic development. Read More

#china-vs-us

A Deep Generative Model for Graph Layout

Different layouts can characterize different aspects of the same graph. Finding a “good” layout of a graph is thus an important task for graph visualization. In practice, users often visualize a graph in multiple layouts by using different methods and varying parameter settings until they find a layout that best suits the purpose of the visualization. However, this trial-and-error process is often haphazard and time-consuming. To provide users with an intuitive way to navigate the layout design space, we present a technique to systematically visualize a graph in diverse layouts using deep generative models. We design an encoder-decoder architecture to learn a model from a collection of example layouts, where the encoder represents training examples in a latent space and the decoder produces layouts from the latent space. In particular, we train the model to construct a two-dimensional latent space for users to easily explore and generate various layouts. We demonstrate our approach through quantitative and qualitative evaluations of the generated layouts. The results of our evaluations show that our model is capable of learning and generalizing abstract concepts of graph layouts, not just memorizing the training examples. In summary, this paper presents a fundamentally new approach to graph visualization where a machine learning model learns to visualize a graph from examples without manually-defined heuristics. Read More
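
As a rough illustration of the encoder-decoder idea, the sketch below trains a plain autoencoder (a simplification; the paper trains a deep generative model, and the layer sizes here are assumptions) on flattened node coordinates of many example layouts of the same graph, compressing each layout into a two-dimensional latent code that can then be swept over a grid to generate diverse layouts.

```python
import torch
import torch.nn as nn

n_nodes = 50
dim_in = n_nodes * 2      # each layout: (x, y) per node, flattened

# Encoder compresses a layout to a 2-D latent code; decoder reconstructs it.
encoder = nn.Sequential(nn.Linear(dim_in, 256), nn.ReLU(),
                        nn.Linear(256, 2))
decoder = nn.Sequential(nn.Linear(2, 256), nn.ReLU(),
                        nn.Linear(256, dim_in))

opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

# Stand-in for a collection of example layouts of one graph.
layouts = torch.rand(1024, dim_in)

for epoch in range(100):
    recon = decoder(encoder(layouts))
    loss = nn.functional.mse_loss(recon, layouts)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Exploring the layout design space: decode a grid of latent codes.
with torch.no_grad():
    grid = torch.stack(torch.meshgrid(torch.linspace(-2, 2, 5),
                                      torch.linspace(-2, 2, 5),
                                      indexing="ij"), dim=-1).reshape(-1, 2)
    new_layouts = decoder(grid).reshape(-1, n_nodes, 2)  # 25 generated layouts
```

The two-dimensional latent space is the interface: panning across it corresponds to smoothly varying layout styles, which is what gives users the intuitive navigation the abstract describes.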

#vfx

Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation

We describe Parrotron, an end-to-end-trained speech-to-speech conversion model that maps an input spectrogram directly to another spectrogram, without utilizing any intermediate discrete representation. The network is composed of an encoder, spectrogram and phoneme decoders, followed by a vocoder to synthesize a time-domain waveform. We demonstrate that this model can be trained to normalize speech from any speaker, regardless of accent, prosody, and background noise, into the voice of a single canonical target speaker with a fixed accent and consistent articulation and prosody. We further show that this normalization model can be adapted to normalize highly atypical speech from a deaf speaker, resulting in significant improvements in intelligibility and naturalness, measured via a speech recognizer and listening tests. Finally, demonstrating the utility of this model on other speech tasks, we show that the same model architecture can be trained to perform a speech separation task. Read More
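
A minimal sketch of the spectrogram-to-spectrogram mapping may help (shapes and layer choices are illustrative assumptions; the published Parrotron uses an attention-based sequence-to-sequence model with spectrogram and phoneme decoders plus a neural vocoder, none of which is reproduced here): an encoder reads input spectrogram frames from any speaker and a decoder regresses same-length frames in the canonical target voice.

```python
import torch
import torch.nn as nn

n_mels = 80  # mel-spectrogram channels (an assumed value)

class SpecToSpec(nn.Module):
    """Toy frame-aligned spectrogram-to-spectrogram converter."""
    def __init__(self, hidden=256):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, hidden, batch_first=True,
                               bidirectional=True)
        self.decoder = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, n_mels)   # spectrogram frame output

    def forward(self, spec):                    # spec: (batch, time, n_mels)
        enc, _ = self.encoder(spec)
        dec, _ = self.decoder(enc)
        return self.proj(dec)                   # (batch, time, n_mels)

model = SpecToSpec()
src = torch.randn(4, 120, n_mels)               # any speaker's spectrogram
tgt = torch.randn(4, 120, n_mels)               # canonical-voice spectrogram
loss = nn.functional.mse_loss(model(src), tgt)  # train to normalize speech
```

A vocoder would then turn the predicted spectrogram into a waveform; the frame-aligned regression here stands in for the attention mechanism that lets the real model change utterance timing.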

#nlp