Writing photo captions is a monotonous — but necessary — chore begrudgingly undertaken by editors everywhere. Fortunately for them, though, AI might soon be able to handle the bulk of the work. In a paper (“Adversarial Semantic Alignment for Improved Image Captions”) appearing at the 2019 Conference on Computer Vision and Pattern Recognition (CVPR) in Long Beach, California this week, a team of scientists at IBM Research describes a model capable of autonomously crafting diverse, creative, and convincingly humanlike captions. Read More
Adversarial Semantic Alignment for Improved Image Captions
In this paper we study image captioning as conditional GAN training, proposing both a context-aware LSTM captioner and a co-attentive discriminator that enforces semantic alignment between images and captions. We empirically examine the viability of two training methods, Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST), and demonstrate that SCST shows more stable gradient behavior and improved results over Gumbel ST, even without accessing discriminator gradients directly. We also address the problem of automatic evaluation for captioning models, introducing a new semantic score and showing its correlation with human judgement. As an evaluation paradigm, we argue that an important criterion for a captioner is the ability to generalize to compositions of objects that do not usually co-occur. To this end, we introduce a small captioned Out of Context (OOC) test set. The OOC set and our semantic score are the proposed new diagnostic tools for the captioning community. When evaluated on the OOC and MS-COCO benchmarks, SCST-based training shows strong performance in both semantic score and human evaluation, promising to be a valuable new approach for efficient discrete GAN training. Read More
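To make the self-critical idea concrete, here is a minimal sketch (not the authors' implementation) of an SCST-style update in which the discriminator's score on a sampled caption serves as the reward and the score of the greedily decoded caption serves as the baseline; the function name, tensor shapes, and toy inputs are assumptions for illustration only.

```python
import torch

def scst_loss(log_probs_sampled, reward_sampled, reward_greedy):
    """SCST-style policy-gradient loss.

    log_probs_sampled: (batch, seq_len) log-probabilities of the sampled caption tokens
    reward_sampled:    (batch,) discriminator score of the sampled caption
    reward_greedy:     (batch,) discriminator score of the greedily decoded caption (baseline)
    """
    # Advantage: how much better the sampled caption scored than the greedy decode.
    advantage = (reward_sampled - reward_greedy).detach()
    # Sum token log-probs over the sequence, weight by the advantage, negate for minimization.
    seq_log_prob = log_probs_sampled.sum(dim=1)
    return -(advantage * seq_log_prob).mean()

# Toy usage with placeholder tensors standing in for captioner and discriminator outputs.
log_probs = torch.randn(4, 12, requires_grad=True)   # hypothetical per-token log-probs
loss = scst_loss(log_probs, torch.rand(4), torch.rand(4))
loss.backward()
```

Note that the discriminator is only queried for scalar scores in this sketch, which is consistent with the abstract's point that SCST works without accessing discriminator gradients directly.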
Automated Machine Learning (AML) Comes of Age – Almost
You know you’ve come of age when major review publications like Gartner and Forrester publish a study on your segment. That’s what’s finally happened. Just released is “The Forrester New Wave™: Automation-Focused Machine Learning Solutions, Q2 2019”.
This is the first reasonably deep review of these platforms and covers nine of what Forrester describes as ‘the most significant providers in the segment’: Aible, Bell Integrator, Big Squid, DataRobot, DMway Analytics, dotData, EdgeVerve, H2O.ai, and Squark.
I’ve been following these automated machine learning (AML) platforms since they emerged. I first wrote about them in the spring of 2016 under the somewhat scary title “Data Scientists Automated and Unemployed by 2025!”.
Well, we’ve still got six years to run, and it hasn’t happened yet. On the other hand, no-code data science is on the rise, and AML platforms, along with their partially automated brethren, are what’s behind it. Read More
World Models
We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment. Read More
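As a rough sketch of the “very compact and simple policy” the abstract mentions, the snippet below rolls out a linear controller entirely inside a stand-in world model, never touching the real environment; the dimensions and the `world_model_step` placeholder are assumptions for illustration (in the paper, the features come from a learned latent vector and a recurrent model's hidden state).

```python
import numpy as np

Z_DIM, H_DIM, A_DIM = 32, 256, 3  # illustrative sizes for latent, hidden state, action

def controller(z, h, W, b):
    """Compact linear policy: action = tanh(W @ [z, h] + b)."""
    return np.tanh(W @ np.concatenate([z, h]) + b)

def world_model_step(z, h, a):
    """Placeholder for the learned world model: it would predict the next latent,
    hidden state, and reward; here it returns random values so the loop runs."""
    return np.random.randn(Z_DIM), np.random.randn(H_DIM), float(np.random.randn())

# "Dream" rollout: the controller is evaluated purely inside the model.
rng = np.random.default_rng(0)
W = rng.normal(size=(A_DIM, Z_DIM + H_DIM))
b = np.zeros(A_DIM)
z, h, total = rng.normal(size=Z_DIM), np.zeros(H_DIM), 0.0
for _ in range(100):
    a = controller(z, h, W, b)
    z, h, r = world_model_step(z, h, a)
    total += r
print("dream-rollout return:", total)
```

Because the policy is just a single weight matrix, it stays small enough to train cheaply once the world model is fixed.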
This AI Learns From Its Dreams
Linking Artificial Intelligence Principles
Various Artificial Intelligence Principles are designed with different considerations, and none of them can be perfect and complete for every scenario. Linking Artificial Intelligence Principles (LAIP) is an initiative and platform for synthesizing, linking, and analyzing various Artificial Intelligence Principles worldwide, from different research institutes, non-profit organizations, non-governmental organizations, companies, etc. The effort aims to understand to what degree these different AI Principles proposals share common values, and how they differ from and complement each other. Read More
U.S. and China Go Their Own Ways With AI
As the U.S. and China appear headed for a digital cold war, competing policy approaches to the same technologies are emerging. Artificial intelligence is a prime example: Policy makers in democratic societies should, in theory, be making sure it isn’t used to promote intellectual conformity or to persecute minorities and dissidents.
The idea that AI should be ethical and benefit society has led to the emergence of multiple versions of basic principles, drafted by governments, academics and industry groups. Last year, Chinese researchers Yi Zeng, Enmeng Lu and Cunqing Huangfu identified 27 such codes and made a website on which they can be compared. It makes a somewhat eerie impression, as if the various codes form a data set on which an AI algorithm could be trained to spew forth ethical principles for its peers. Read More
Rasputin performing Halo
Off-Policy Classification – A New Reinforcement Learning Model Selection Method
Reinforcement learning (RL) is a framework that lets agents learn decision making from experience. One of the many variants of RL is off-policy RL, where an agent is trained using a combination of data collected by other agents (off-policy data) and data it collects itself to learn generalizable skills like robotic walking and grasping. In contrast, fully off-policy RL is a variant in which an agent learns entirely from older data, which is appealing because it enables model iteration without requiring a physical robot. With fully off-policy RL, one can train several models on the same fixed dataset collected by previous agents, then select the best one. However, fully off-policy RL comes with a catch: while training can occur without a real robot, evaluation of the models cannot. Furthermore, ground-truth evaluation with a physical robot is too inefficient to test promising approaches that require evaluating a large number of models, such as automated architecture search with AutoML. Read More
Off-Policy Evaluation via Off-Policy Classification
In this work, we consider the problem of model selection for deep reinforcement learning (RL) in real-world environments. Typically, the performance of deep RL algorithms is evaluated via on-policy interactions with the target environment. However, comparing models in a real-world environment for the purposes of early stopping or hyperparameter tuning is costly and often practically infeasible. This leads us to examine off-policy policy evaluation (OPE) in such settings. We focus on OPE for value-based methods, which are of particular interest in deep RL, with applications like robotics, where off-policy algorithms based on Q-function estimation can often attain better sample complexity than direct policy optimization. Existing OPE metrics either rely on a model of the environment, or the use of importance sampling (IS) to correct for the data being off-policy. However, for high-dimensional observations, such as images, models of the environment can be difficult to fit and value-based methods can make IS hard to use or even ill-conditioned, especially when dealing with continuous action spaces. In this paper, we focus on the specific case of MDPs with continuous action spaces and sparse binary rewards, which is representative of many important real-world applications. We propose an alternative metric that relies on neither models nor IS, by framing OPE as a positive-unlabeled (PU) classification problem with the Q-function as the decision function. We experimentally show that this metric outperforms baselines on a number of tasks. Most importantly, it can reliably predict the relative performance of different policies in a number of generalization scenarios, including the transfer to the real world of policies trained in simulation for an image-based robotic manipulation task. Read More
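Under one reading of the abstract, the scoring idea can be sketched as a simple positive-versus-unlabeled comparison: state-action pairs from successful episodes are treated as positives, everything else as unlabeled, and a candidate Q-function is scored by how much higher it values the positives than the overall pool. The function name and toy data below are assumptions, and the paper's actual metric may differ in its details.

```python
import numpy as np

def soft_opc_style_score(q_values_positive, q_values_all):
    """Higher when the Q-function assigns larger values to positives (state-action
    pairs from successful episodes) than to the overall unlabeled pool."""
    return float(np.mean(q_values_positive) - np.mean(q_values_all))

# Toy usage: compare two candidate Q-functions on the same fixed off-policy dataset.
rng = np.random.default_rng(1)
q_all_a = rng.normal(size=1000)          # Q-values of model A on all logged transitions
q_pos_a = q_all_a[:200] + 0.8            # model A scores positives noticeably higher
q_all_b = rng.normal(size=1000)          # Q-values of model B on the same transitions
q_pos_b = q_all_b[:200] + 0.1            # model B barely separates them
print(soft_opc_style_score(q_pos_a, q_all_a) > soft_opc_style_score(q_pos_b, q_all_b))  # True
```

The appeal, as the abstract notes, is that such a score needs neither an environment model nor importance sampling, so candidate policies can be ranked from logged data alone.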