AI ‘emotion recognition’ can’t be trusted

As artificial intelligence is used to make more decisions about our lives, engineers have sought out ways to make it more emotionally intelligent. That means automating some of the emotional tasks that come naturally to humans — most notably, looking at a person’s face and knowing how they feel.

To achieve this, tech companies like Microsoft, IBM, and Amazon all sell what they call “emotion recognition” algorithms, which infer how people feel from facial analysis. For example, if someone has a furrowed brow and pursed lips, the algorithm takes it to mean they’re angry. If their eyes are wide, their eyebrows are raised, and their mouth is stretched, it takes them to be afraid, and so on.
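The one-to-one mapping described above amounts to a lookup table from facial configurations to emotion labels. Here is a toy sketch of that assumption. The feature names (such as `brow_furrowed`) and the function `naive_emotion` are hypothetical, and the commercial services named above do not necessarily work this way internally:

```python
from typing import Optional, Set

# Hypothetical facial-feature labels; a toy version of the
# "face configuration -> emotion" lookup described above.
PROTOTYPES = {
    "anger": {"brow_furrowed", "lips_pursed"},
    "fear": {"eyes_wide", "brows_raised", "mouth_stretched"},
}


def naive_emotion(observed: Set[str]) -> Optional[str]:
    """Return the first emotion whose prototype features are all present, else None.

    Only the face is consulted: no context, person, or culture is considered.
    """
    for emotion, features in PROTOTYPES.items():
        if features <= observed:  # subset test: every prototype feature was observed
            return emotion
    return None


print(naive_emotion({"brow_furrowed", "lips_pursed"}))  # prints "anger"
```

Because the lookup sees only the face, a furrowed brow and pursed lips yield "anger" regardless of what the person is actually doing or feeling. That context-free inference is exactly the belief the review questions.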

But the belief that we can easily infer how people feel based on how they look is controversial, and a significant new review of the research suggests there’s no firm scientific justification for it. Read More

#explainability, #image-recognition

Gallery Go: a fast, helpful way to organize your photos offline

Today, at Google for Nigeria, we introduced Gallery Go: a photo gallery designed to work offline that uses machine learning to automatically organize your photos and make them look their best. Gallery Go helps first-time smartphone owners easily find, edit, and manage photos without needing high-speed internet access or cloud backup.

Gallery Go automatically organizes your photos by the people and things you photograph, so you can easily find your favorite selfie, remember where you had the best puff puff, and keep track of important documents. You don’t have to label your photos manually, and all these features run on your phone without using your mobile data. You can create folders to organize your photos, and Gallery Go works with SD cards, so you can easily copy photos off your phone. Read More

#image-recognition

This AI magically removes moving objects from videos

We’ve previously seen developers harness artificial intelligence (AI) to turn pitch-black pictures into bright, colorful photos, flat images into complex 3D scenes, and selfies into moving avatars. Now there’s AI-powered software that effortlessly removes moving objects from videos.

All you need to do to wipe an object from footage is draw a box around it, and the software takes care of the rest for you. Read More
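One common way to fill the hole left by a removed object is to borrow background pixels from other frames where that region is visible. Flow-guided methods, such as the CVPR 2019 paper listed next, follow optical flow to find those pixels. The sketch below is a heavily simplified stand-in that assumes a static camera (zero optical flow, so pixel (y, x) shows the same scene point in every frame) and a box already drawn for each frame. The names `box_mask` and `inpaint_static` are hypothetical and come from neither the app nor the paper:

```python
from typing import List, Tuple

Frame = List[List[int]]          # grayscale frame indexed as frame[y][x]
Mask = List[List[bool]]          # True where the object to remove is
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def box_mask(height: int, width: int, box: Box) -> Mask:
    """Turn a user-drawn box into a per-pixel mask."""
    top, left, bottom, right = box
    return [[top <= y < bottom and left <= x < right for x in range(width)]
            for y in range(height)]


def inpaint_static(frames: List[Frame], masks: List[Mask]) -> List[Frame]:
    """Fill each masked pixel from the temporally nearest frame where it is unmasked.

    Assumes zero optical flow (static camera). Pixels masked in every frame are
    left unchanged; a real system would pass those to a still-image inpainter.
    """
    n = len(frames)
    out = [[row[:] for row in frame] for frame in frames]  # don't mutate the input
    for t in range(n):
        for y, mask_row in enumerate(masks[t]):
            for x, masked in enumerate(mask_row):
                if not masked:
                    continue
                # Search outward in time: t-1, t+1, t-2, t+2, ...
                for d in range(1, n):
                    src = next((s for s in (t - d, t + d)
                                if 0 <= s < n and not masks[s][y][x]), None)
                    if src is not None:
                        out[t][y][x] = frames[src][y][x]
                        break
    return out


# A one-row "video": an object (value 9) moves right over a background of 1s.
frames = [[[9, 1, 1, 1]], [[1, 9, 1, 1]], [[1, 1, 9, 1]]]
masks = [box_mask(1, 4, (0, x, 1, x + 1)) for x in range(3)]
print(inpaint_static(frames, masks))  # [[[1, 1, 1, 1]], [[1, 1, 1, 1]], [[1, 1, 1, 1]]]
```

With a moving camera, the `(y, x)` lookup would instead follow the (completed) optical flow between frames. That is the part this sketch deliberately leaves out.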

#fake, #image-recognition

Deep Flow-Guided Video Inpainting (CVPR 2019)

Read More

#image-recognition, #videos

You Can’t Fix What You Can’t See: The Realities of AI and Satellite Data

Earth observation (EO), the monitoring of the Earth from space using satellites, has undergone fundamental changes in the last decade. Two exciting trends, advances in remote sensing and in processing algorithms, have converged and now herald a renaissance in space-based observation.

The implementation of ambitious government initiatives such as the European Union’s Copernicus Programme, and an explosion in commercial satellite sensing constellations like Planet’s, have been matched by incredible breakthroughs in algorithm performance. These breakthroughs are due to advances in accelerated computing, open source software, and broadly accessible training data. Read More

#image-recognition

Computer vision harvesting: four algorithms simultaneously identifying:
– License plate number recognition
– Brand and model type recognition
– Logo detection
– Car color recognition

Read More

#image-recognition, #videos

Can AI Save the Internet from Fake News?

There’s an old proverb that says “seeing is believing.” But in the age of artificial intelligence, it’s becoming increasingly difficult to take anything at face value—literally.

The rise of so-called “deepfakes,” in which various AI-based techniques are used to manipulate video content, has reached the point where Congress held its first hearing last month on the potential abuses of the technology. The hearing coincided with the release of a doctored video of Facebook CEO Mark Zuckerberg delivering what appeared to be a sinister speech. Read More

#fake, #image-recognition, #nlp

IBM’s AI automatically generates creative captions for images

Writing photo captions is a monotonous but necessary chore, begrudgingly undertaken by editors everywhere. Fortunately for them, AI might soon be able to handle the bulk of the work. In a paper (“Adversarial Semantic Alignment for Improved Image Captions”) appearing this week at the 2019 Conference on Computer Vision and Pattern Recognition (CVPR) in Long Beach, California, a team of scientists at IBM Research describes a model capable of autonomously crafting diverse, creative, and convincingly humanlike captions. Read More

#gans, #image-recognition, #nlp

Adversarial Semantic Alignment for Improved Image Captions

In this paper we study image captioning as conditional GAN training, proposing both a context-aware LSTM captioner and a co-attentive discriminator, which enforces semantic alignment between images and captions. We empirically focus on the viability of two training methods, Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST), and demonstrate that SCST shows more stable gradient behavior and improved results over Gumbel ST, even without accessing discriminator gradients directly. We also address the problem of automatic evaluation for captioning models, introduce a new semantic score, and show its correlation with human judgement. As an evaluation paradigm, we argue that an important criterion for a captioner is the ability to generalize to compositions of objects that do not usually co-occur. To this end, we introduce a small captioned Out of Context (OOC) test set. The OOC set and our semantic score together form the proposed new diagnostic tools for the captioning community. When evaluated on the OOC and MS-COCO benchmarks, SCST-based training performs strongly in both semantic score and human evaluation, making it a promising new approach for efficient discrete GAN training. Read More
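SCST, as commonly formulated for captioning, is a REINFORCE-style policy gradient. In this formulation, the reward of the model's own greedy (test-time) caption serves as the baseline, so a sampled caption is reinforced only if it scores better than what the model would output anyway. This baseline is also why no discriminator gradients are needed: only the reward value enters the update. The abstract does not spell out the reward, so the sketch below treats it as an opaque score (for instance, one produced by the discriminator). The function names are hypothetical, and per-step logits are assumed to come from the captioner run on the sampled caption:

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one step's vocabulary logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def scst_loss(logits: Sequence[Sequence[float]], sampled: Sequence[int],
              sample_reward: float, greedy_reward: float) -> float:
    """loss = -(r(sample) - r(greedy)) * sum_t log p(sampled[t] | step t)."""
    advantage = sample_reward - greedy_reward
    return -advantage * sum(math.log(softmax(z)[w]) for z, w in zip(logits, sampled))


def scst_logit_grads(logits: Sequence[Sequence[float]], sampled: Sequence[int],
                     sample_reward: float, greedy_reward: float) -> List[List[float]]:
    """Analytic gradient of scst_loss w.r.t. each step's logits.

    d loss / d logits[t] = -(r(sample) - r(greedy)) * (onehot(sampled[t]) - softmax(logits[t]))
    """
    advantage = sample_reward - greedy_reward
    grads = []
    for z, w in zip(logits, sampled):
        p = softmax(z)
        grads.append([-advantage * ((1.0 if i == w else 0.0) - p[i]) for i in range(len(z))])
    return grads


# One step, two-word vocabulary; the sample (token 0) beat the greedy caption.
g = scst_logit_grads([[0.0, 0.0]], [0], sample_reward=0.8, greedy_reward=0.6)
print(g)  # approximately [[-0.1, 0.1]]
```

A gradient-descent step subtracts `g`, which raises the logit of the sampled token when it beat the greedy baseline and lowers it when it did worse. If both captions score the same, the update is zero.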

#gans, #image-recognition, #nlp

Connecting Touch and Vision via Cross-Modal Prediction

Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. In this work, we investigate the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: while our eyes perceive an entire visual scene at once, we can only feel a small region of an object at any given moment. To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs, as well as imagining how we interact with objects given tactile data as input. To accomplish our goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, we present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that our model can produce realistic visual images from tactile data and vice versa. Finally, we present both qualitative and quantitative experimental results for different system designs, along with visualizations of our model's learned representations. Read More

#gans, #image-recognition