New AI Dupes Humans into Believing Synthesized Sound Effects Are Real

Using machine learning, AutoFoley determines what actions are taking place in a video clip and creates realistic sound effects.

… Researchers have created an automated program that analyzes the movement in video frames and creates its own artificial sound effects to match the scene. In a survey, the majority of people polled indicated that they believed the fake sound effects were real. The model, AutoFoley, is described in a study published June 25 in IEEE Transactions on Multimedia. Read More

#fake, #image-recognition

AI Magic Makes Century-Old Films Look New

Denis Shiryaev uses algorithms to colorize and sharpen old movies, bumping them up to a smooth 60 frames per second. The result is a stunning glimpse at the past.

On April 14, 1906, the Miles brothers left their studio on San Francisco’s Market Street, boarded a cable car, and began filming what would become an iconic short movie. Called A Trip Down Market Street, it’s a fascinating documentation of life at the time.

… Well over a century later, an artificial intelligence geek named Denis Shiryaev has transformed A Trip Down Market Street into something even more magical. Read More

#image-recognition, #vfx

The hack that could make face recognition think someone else is you

Researchers have demonstrated that they can fool a modern face recognition system into seeing someone who isn’t there.

A team from the cybersecurity firm McAfee set up the attack against a facial recognition system similar to those currently used at airports for passport verification. By using machine learning, they created an image that looked like one person to the human eye, but was identified as somebody else by the face recognition algorithm—the equivalent of tricking the machine into allowing someone to board a flight despite being on a no-fly list. Read More
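To make the trick concrete, here is a minimal sketch of this class of attack using plain gradient-based adversarial optimization; McAfee's actual pipeline was more sophisticated, and the embedding network `embed`, step count, and loss weights below are illustrative assumptions, not the researchers' setup.

```python
# Minimal sketch of a model-fooling image attack (assumptions: `embed` is
# some face embedding network; hyperparameters are illustrative). Gradient
# descent nudges the source photo toward the target's embedding while an
# L2 term keeps it looking like the source to a human.
import torch
import torch.nn.functional as F

def craft_attack_image(embed, source, target, steps=200, lr=0.01, visual_weight=10.0):
    """source/target: float tensors of shape (1, 3, H, W) in [0, 1]."""
    x = source.clone().requires_grad_(True)
    with torch.no_grad():
        target_emb = F.normalize(embed(target), dim=-1)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        emb = F.normalize(embed(x), dim=-1)
        match_loss = 1.0 - F.cosine_similarity(emb, target_emb).mean()  # fool the model
        visual_loss = F.mse_loss(x, source)                             # stay human-plausible
        loss = match_loss + visual_weight * visual_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        x.data.clamp_(0.0, 1.0)  # keep a valid image
    return x.detach()
```

The loss makes the tension explicit: one term pulls the image toward the target identity's embedding, while the other keeps it visually pinned to the source photo.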

#cyber, #fake, #image-recognition

Generative Pretraining from Pixels

Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to autoregressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pretrained models. An even larger model trained on a mixture of ImageNet and web images is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of our features. Read More
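For readers unfamiliar with the protocol, a linear probe trains only a linear classifier on frozen features. A minimal PyTorch sketch, where `feature_extractor` stands in for the pretrained pixel Transformer and the hyperparameters are illustrative:

```python
# Minimal sketch of a linear-probe evaluation: freeze the pretrained
# feature extractor and fit only a linear classifier on its features.
import torch
import torch.nn as nn

def linear_probe(feature_extractor, train_loader, feat_dim, num_classes=10, epochs=5):
    feature_extractor.eval()                  # frozen: never updated
    probe = nn.Linear(feat_dim, num_classes)  # the only trainable weights
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in train_loader:
            with torch.no_grad():
                feats = feature_extractor(images)  # fixed representations
            loss = loss_fn(probe(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return probe
```

Probe accuracy is a common proxy for representation quality precisely because the frozen features must already be linearly separable for the probe to do well.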

#image-recognition

Sign language recognition using deep learning

TL;DR: We present a dual-camera, first-person vision translation system using convolutional neural networks. A prototype was developed to recognize 24 gestures. The vision system is composed of a head-mounted camera and a chest-mounted camera, and the machine learning model consists of two convolutional neural networks, one for each camera. Read More
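A minimal PyTorch sketch of that dual-network layout, with illustrative layer sizes rather than the authors' exact architecture:

```python
# Sketch of a dual-camera gesture classifier: one small CNN per camera,
# features concatenated before a shared classification head. All layer
# sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class CameraCNN(nn.Module):
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, out_dim),
        )

    def forward(self, x):
        return self.net(x)

class DualCamClassifier(nn.Module):
    def __init__(self, num_gestures=24):
        super().__init__()
        self.head_cam = CameraCNN()   # head-mounted view
        self.chest_cam = CameraCNN()  # chest-mounted view
        self.classifier = nn.Linear(2 * 128, num_gestures)

    def forward(self, head_img, chest_img):
        fused = torch.cat([self.head_cam(head_img), self.chest_cam(chest_img)], dim=1)
        return self.classifier(fused)
```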

#image-recognition, #nlp, #vision

Adventures in PyTorch — Image classification with CalTech Birds 200 — Introduction

This series of articles explores PyTorch, Facebook AI Research's (FAIR) neural network and machine learning framework, applied to an image classification problem: identifying 200 species of North American birds from the CalTech Birds 200 dataset using various CNN architectures, including GoogLeNet, ResNet152, and ResNeXt101, among others. Read More
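The usual transfer-learning starting point for this kind of problem looks roughly like the following sketch; the series' exact setup may differ.

```python
# Minimal sketch (not necessarily the series' code): load an
# ImageNet-pretrained backbone from torchvision and replace its final
# fully connected layer with a new head for the 200 bird classes.
import torch.nn as nn
from torchvision import models

model = models.resnet152(pretrained=True)        # ImageNet-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 200)  # new head for 200 species
```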

#image-recognition, #python

An AI Learned To See Through Obstructions!

Read More

#image-recognition, #videos

Neuroevolution of Self-Interpretable Agents

Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight. It is a consequence of the selective attention in perception that lets us remain focused on important parts of our world without distraction from irrelevant details. Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck. By constraining access to only a small fraction of the visual input, we show that their policies are directly interpretable in pixel space. We find neuroevolution ideal for training self-attention architectures for vision-based reinforcement learning (RL) tasks, allowing us to incorporate modules that can include discrete, non-differentiable operations which are useful for our agent. We argue that self-attention has properties similar to indirect encoding, in the sense that large implicit weight matrices are generated from a small number of key-query parameters, thus enabling our agent to solve challenging vision-based tasks with at least 1000x fewer parameters than existing methods. Since our agents attend only to task-critical visual hints, they are able to generalize to environments where task-irrelevant elements are modified, while conventional methods fail. Read More
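A rough sketch of the bottleneck idea, simplified from the paper: score image patches against each other with a small key-query product, then pass only the most-voted-for patch locations to the downstream controller. Dimensions and details here are illustrative assumptions.

```python
# Simplified sketch of a self-attention bottleneck over image patches:
# a tiny key-query projection scores patches, and only the top-K patch
# indices survive as the agent's observation.
import torch
import torch.nn as nn

class PatchAttentionBottleneck(nn.Module):
    def __init__(self, patch_dim, d=16, top_k=10):
        super().__init__()
        self.key = nn.Linear(patch_dim, d, bias=False)    # the small number of
        self.query = nn.Linear(patch_dim, d, bias=False)  # key-query parameters
        self.top_k = top_k

    def forward(self, patches):
        # patches: (num_patches, patch_dim), flattened from the input frame
        scores = self.query(patches) @ self.key(patches).T  # (N, N) attention
        importance = scores.softmax(dim=-1).sum(dim=0)      # per-patch "votes"
        return importance.topk(self.top_k).indices          # attended locations
```

Because only the key and query projections carry weights, the effective N-by-N attention matrix is generated implicitly, which is the indirect-encoding property the abstract highlights.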

#image-recognition, #reinforcement-learning, #vision

NIST Launches Investigation of Face Masks’ Effect on Face Recognition Software

Algorithms created before the pandemic generally perform less accurately with digitally masked faces.

Now that so many of us are covering our faces to help reduce the spread of COVID-19, how well do face recognition algorithms identify people wearing masks? The answer, according to a preliminary study by the National Institute of Standards and Technology (NIST), is with great difficulty. Even the best of the 89 commercial facial recognition algorithms tested had error rates between 5% and 50% in matching digitally applied face masks with photos of the same person without a mask.

The results were published today as a NIST Interagency Report (NISTIR 8311), the first in a planned series from NIST’s Face Recognition Vendor Test (FRVT) program on the performance of face recognition algorithms on faces partially covered by protective masks.  Read More
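For context, the error being measured is essentially a false non-match rate: the fraction of genuine masked-versus-unmasked pairs that a system fails to match at its operating threshold. A toy illustration, not NIST's actual FRVT protocol:

```python
# Toy illustration of a false non-match rate, assuming similarity scores
# for genuine (same-person) masked/unmasked pairs are already computed.
import numpy as np

def false_non_match_rate(genuine_scores, threshold):
    """Fraction of genuine pairs scoring below the match threshold."""
    return float((np.asarray(genuine_scores) < threshold).mean())
```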

#image-recognition

Learning to Cartoonize Using White-box Cartoon Representations

This paper presents an approach for image cartoonization. By observing cartoon painting behavior and consulting artists, we propose to separately identify three white-box representations from images: the surface representation that contains the smooth surface of cartoon images, the structure representation that refers to the sparse color blocks and flattened global content in the celluloid-style workflow, and the texture representation that reflects high-frequency texture, contours, and details in cartoon images. A Generative Adversarial Network (GAN) framework is used to learn the extracted representations and to cartoonize images.

The learning objectives of our method are separately based on each extracted representation, making our framework controllable and adjustable. This enables our approach to meet artists’ requirements in different styles and diverse use cases. Qualitative comparisons and quantitative analyses, as well as user studies, have been conducted to validate the effectiveness of this approach, and our method outperforms previous methods in all comparisons. Finally, an ablation study demonstrates the influence of each component in our framework. Read More
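To give a feel for what separating such representations can mean in code, here is a toy decomposition of an image into a smooth surface component and a high-frequency texture residual; the paper itself uses edge-preserving guided filtering and superpixel segmentation, so this box-blur stand-in is only an assumption for illustration.

```python
# Toy surface/texture split (illustrative, not the paper's method): a heavy
# box blur keeps the smooth base of the image, and subtracting it leaves
# the high-frequency texture component.
import torch
import torch.nn.functional as F

def surface_and_texture(img, kernel=21):
    # img: (1, 3, H, W) float tensor in [0, 1]
    pad = kernel // 2
    weight = torch.ones(3, 1, kernel, kernel) / (kernel * kernel)
    surface = F.conv2d(F.pad(img, [pad] * 4, mode="reflect"), weight, groups=3)
    texture = img - surface
    return surface, texture
```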

#gans, #image-recognition