Turning any CNN image classifier into an object detector with Keras, TensorFlow, and OpenCV

In this tutorial, you will learn how to take any pre-trained deep learning image classifier and turn it into an object detector using Keras, TensorFlow, and OpenCV.

Today, we’re starting a four-part series on deep learning and object detection:

  • Part 1: Turning any deep learning image classifier into an object detector with Keras and TensorFlow (today’s post)
  • Part 2: OpenCV Selective Search for Object Detection
  • Part 3: Region proposal for object detection with OpenCV, Keras, and TensorFlow
  • Part 4: R-CNN object detection with Keras and TensorFlow
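One classic way to do what today's post describes is to slide a fixed-size window across an image pyramid and treat confident classifications as detections. A minimal sketch follows; the stock ImageNet ResNet50 and all window/stride/scale values are illustrative assumptions, not necessarily the tutorial's exact choices.

```python
# Hedged sketch: classifier-as-detector via an image pyramid + sliding window.
import cv2
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

model = ResNet50(weights="imagenet")
WIN, STEP, SCALE, MIN_CONF = 224, 32, 1.5, 0.9  # assumed parameters

def pyramid(image):
    # Yield the image at progressively smaller scales.
    while image.shape[0] >= WIN and image.shape[1] >= WIN:
        yield image
        image = cv2.resize(image, (int(image.shape[1] / SCALE),
                                   int(image.shape[0] / SCALE)))

def windows(image):
    # Slide a fixed-size window over the image in STEP-pixel strides.
    for y in range(0, image.shape[0] - WIN + 1, STEP):
        for x in range(0, image.shape[1] - WIN + 1, STEP):
            yield x, y, image[y:y + WIN, x:x + WIN]

image = cv2.imread("input.jpg")  # hypothetical input path
detections = []
for layer in pyramid(image):
    ratio = image.shape[1] / float(layer.shape[1])
    for x, y, roi in windows(layer):
        rgb = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB).astype("float32")
        preds = model.predict(preprocess_input(rgb[None]), verbose=0)
        _, label, conf = decode_predictions(preds, top=1)[0][0]
        if conf >= MIN_CONF:  # confident classification -> detection
            box = [int(v * ratio) for v in (x, y, x + WIN, y + WIN)]
            detections.append((box, label, conf))
# Overlapping boxes would normally be merged with non-maxima suppression.
```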

Read More

#image-recognition, #python

Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild

We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint, and illumination. To disentangle these components without supervision, we use the fact that many object categories have, at least in principle, a symmetric structure. We show that reasoning about illumination allows us to exploit the underlying object symmetry even if the appearance is not symmetric due to shading. Furthermore, we model objects that are probably, but not certainly, symmetric by predicting a symmetry probability map, learned end-to-end with the other components of the model. Our experiments show that this method can recover the 3D shape of human faces, cat faces, and cars from single-view images with high accuracy, without any supervision or a prior shape model. On benchmarks, we demonstrate superior accuracy compared to another method that uses supervision at the level of 2D image correspondences. Read More
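Stripped to its essentials, that factorization can be written as a small photo-geometric autoencoder. The sketch below is a loose simplification in TensorFlow: the network sizes, the toy Lambertian shading, and the confidence-weighted loss shape are assumptions, and the full method's viewpoint prediction and differentiable reprojection are omitted.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def encoder(out_ch, name):
    # Tiny conv encoder-decoder predicting a per-pixel map.
    inp = tf.keras.Input((64, 64, 3))
    x = layers.Conv2D(32, 4, 2, "same", activation="relu")(inp)
    x = layers.Conv2D(64, 4, 2, "same", activation="relu")(x)
    x = layers.Conv2DTranspose(64, 4, 2, "same", activation="relu")(x)
    x = layers.Conv2DTranspose(out_ch, 4, 2, "same")(x)
    return Model(inp, x, name=name)

depth_net = encoder(1, "depth")
albedo_net = encoder(3, "albedo")
conf_net = encoder(1, "confidence")  # the symmetry probability map

def shade(depth, albedo, light_dir):
    # Toy Lambertian shading: surface normals from depth gradients.
    dy, dx = tf.image.image_gradients(depth)
    n = tf.math.l2_normalize(tf.concat([-dx, -dy, tf.ones_like(depth)], -1), -1)
    diffuse = tf.nn.relu(tf.reduce_sum(n * light_dir, -1, keepdims=True))
    return albedo * diffuse

def reconstruct(img, light_dir, flip=False):
    d = depth_net(img)
    a = tf.sigmoid(albedo_net(img))
    if flip:  # exploit (probable) bilateral symmetry: mirror depth and albedo
        d, a = tf.reverse(d, [2]), tf.reverse(a, [2])
    return shade(d, a, light_dir)

def loss(img, light_dir):
    conf = tf.sigmoid(conf_net(img)) + 1e-4  # symmetry probability map
    err = tf.abs(reconstruct(img, light_dir) - img)
    err_flip = tf.abs(reconstruct(img, light_dir, flip=True) - img)
    # Penalize the mirrored reconstruction only where symmetry is probable;
    # the -log term keeps the map from collapsing to zero everywhere.
    return tf.reduce_mean(err) + tf.reduce_mean(conf * err_flip - tf.math.log(conf))

img = tf.random.uniform([2, 64, 64, 3])  # stand-in batch
light = tf.constant([0.0, 0.0, 1.0])     # frontal light: an assumption
total = loss(img, light)
```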

#human, #image-recognition

Has Media & Entertainment Cracked the AI Code?

Artificial Intelligence (AI) and Machine Learning (ML) are technologies that enterprises across industries have been keenly experimenting with to explore what utility they can bring. Is AI being adopted within the M&E industry? Can AI be the solution for enterprises seeking automation? Have we cracked the AI code, or do we still have miles to go? If automation is the goal, it should be a priority even now.

Content recommendation (for OTT), speech-to-text, and media recognition are some of the initial applications that have been attempted. Clients find vendor demos impressive, but when they run a proof of concept (PoC) on their own content, the results are not. In video operations, frame accuracy is a necessity, and AI models struggle to solve for it universally; getting such specific nuances right is what makes automation work. After trying multiple vendors, clients conclude that AI output is still not accurate enough to solve specific M&E use cases. However, they remain optimistic about the future possibilities.

So where is the issue? Read More

#image-recognition, #nlp, #vfx

PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

Recent advances in image-based 3D human shape estimation have been driven by the significant improvement in representation power afforded by deep neural networks. Although current approaches have demonstrated their potential in real-world settings, they still fail to produce reconstructions with the level of detail often present in the input images. We argue that this limitation stems primarily from two conflicting requirements: accurate predictions require large context, but precise predictions require high resolution. Due to memory limitations in current hardware, previous approaches tend to take low-resolution images as input to cover large spatial context, and produce less precise (or low-resolution) 3D estimates as a result. We address this limitation by formulating a multi-level architecture that is end-to-end trainable. A coarse level observes the whole image at lower resolution and focuses on holistic reasoning. This provides context to a fine level which estimates highly detailed geometry by observing higher-resolution images. We demonstrate that our approach significantly outperforms existing state-of-the-art techniques on single-image human shape reconstruction by fully leveraging 1k-resolution input images. Read More
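The core data structure here is a pixel-aligned implicit function: an MLP that, for a 3D query point, consumes image features sampled at the point's 2D projection plus its depth, and outputs occupancy. A much-simplified two-level sketch follows; the tiny backbones, nearest-neighbor feature sampling, and the way the fine level is conditioned on the coarse output are illustrative assumptions, not the paper's architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def backbone(res, name):
    # Tiny stand-in feature extractor (the paper uses stacked hourglasses).
    inp = tf.keras.Input((res, res, 3))
    x = layers.Conv2D(32, 7, 2, "same", activation="relu")(inp)
    x = layers.Conv2D(64, 3, 2, "same", activation="relu")(x)
    return Model(inp, x, name=name)

coarse = backbone(128, "coarse")  # whole image, downsampled: holistic context
fine = backbone(512, "fine")      # high-resolution input: surface detail

def sample_at(feat, yx):
    # Nearest-neighbor pixel-aligned feature sampling at normalized (y, x)
    # coordinates in [-1, 1]; the actual method samples bilinearly.
    h = tf.cast(tf.shape(feat)[1], tf.float32)
    idx = tf.cast((tf.clip_by_value(yx, -1.0, 1.0) * 0.5 + 0.5) * (h - 1.0), tf.int32)
    return tf.gather_nd(feat, idx, batch_dims=1)

mlp_coarse = tf.keras.Sequential([layers.Dense(128, activation="relu"), layers.Dense(1)])
mlp_fine = tf.keras.Sequential([layers.Dense(128, activation="relu"), layers.Dense(1)])

def occupancy(img_lo, img_hi, yx, z):
    # Coarse level: reason holistically from low-resolution features + depth z.
    o_lo = mlp_coarse(tf.concat([sample_at(coarse(img_lo), yx), z], -1))
    # Fine level: refine with high-resolution features, conditioned on the
    # coarse prediction (a simplification of the paper's feature handoff).
    return mlp_fine(tf.concat([sample_at(fine(img_hi), yx), o_lo], -1))
```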

#human, #image-recognition

Local Motion Phases for Learning Multi-Contact Character Movements

Read More

#image-recognition, #videos

The two-year fight to stop Amazon from selling face recognition to the police

In the summer of 2018, nearly 70 civil rights and research organizations wrote a letter to Jeff Bezos demanding that Amazon stop providing face recognition technology to governments. Part of an increased focus on the role tech companies were playing in enabling the US government’s tracking and deportation of immigrants, the letter called on Amazon to “stand up for civil rights and civil liberties.” “As advertised,” it said, “Rekognition is a powerful surveillance system readily available to violate rights and target communities of color.”

Along with the letter, the American Civil Liberties Union (ACLU) of Washington delivered over 150,000 petition signatures as well as another letter from the company’s own shareholders expressing similar demands. A few days later, Amazon’s employees echoed the concerns in an internal memo.

Despite the mounting pressure, Amazon continued with business as usual. Read More

#bias, #explainability, #image-recognition

PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

The primary aim of single-image super-resolution is to construct a high-resolution (HR) image from a corresponding low-resolution (LR) input. In previous approaches, which have generally been supervised, the training objective typically measures a pixel-wise average distance between the super-resolved (SR) and HR images. Optimizing such metrics often leads to blurring, especially in high-variance (detailed) regions. We propose an alternative formulation of the super-resolution problem based on creating realistic SR images that downscale correctly. We present a novel super-resolution algorithm addressing this problem, PULSE (Photo Upsampling via Latent Space Exploration), which generates high-resolution, realistic images at resolutions previously unseen in the literature. It accomplishes this in an entirely self-supervised fashion and is not confined to a specific degradation operator used during training, unlike previous methods (which require training on databases of LR-HR image pairs for supervised learning). Instead of starting with the LR image and slowly adding detail, PULSE traverses the high-resolution natural image manifold, searching for images that downscale to the original LR image. This is formalized through the “downscaling loss,” which guides exploration through the latent space of a generative model. By leveraging properties of high-dimensional Gaussians, we restrict the search space to guarantee that our outputs are realistic. PULSE thereby generates super-resolved images that are both realistic and downscale correctly. We show extensive experimental results demonstrating the efficacy of our approach in the domain of face super-resolution (also known as face hallucination). Our method outperforms state-of-the-art methods in perceptual quality at higher resolutions and scale factors than previously possible. Read More
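The “downscaling loss” idea fits in a few lines: freeze a pre-trained generator, then optimize a latent vector so the generated HR image downscales to the given LR input, keeping the latent near the sphere where high-dimensional Gaussians concentrate. A sketch, assuming a hypothetical frozen generator `G` and illustrative optimizer settings:

```python
import tensorflow as tf

LATENT_DIM = 512  # assumed latent size

def downscale(img, size=32):
    # Differentiable downscaling operator used by the "downscaling loss".
    return tf.image.resize(img, (size, size), method="bicubic")

def project_to_sphere(z):
    # High-dimensional Gaussians concentrate near a sphere of radius
    # sqrt(dim); searching on that sphere keeps generator outputs realistic.
    return z * tf.sqrt(float(LATENT_DIM)) / tf.norm(z, axis=-1, keepdims=True)

def pulse_search(G, lr_image, steps=500, step_size=0.1):
    # G is a hypothetical pre-trained generator: latent vector -> HR image.
    z = tf.Variable(project_to_sphere(tf.random.normal([1, LATENT_DIM])))
    opt = tf.keras.optimizers.SGD(step_size)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            sr = G(z)  # candidate HR image on the natural-image manifold
            loss = tf.reduce_mean(tf.square(downscale(sr) - lr_image))
        opt.apply_gradients(zip(tape.gradient(loss, [z]), [z]))
        z.assign(project_to_sphere(z))  # stay on the sphere
    return G(z)
```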

#image-recognition, #self-supervised

Assessing the Big Five personality traits using real-life static facial images

There is ample evidence that morphological and social cues in a human face provide signals of human personality and behaviour. Previous studies have discovered associations between the features of artificial composite facial images and attributions of personality traits by human experts. We present new findings demonstrating the statistically significant prediction of a wider set of personality features (all the Big Five personality traits) for both men and women using real-life static facial images. Volunteer participants (N = 12,447) provided their face photographs (31,367 images) and completed a self-report measure of the Big Five traits. We trained a cascade of artificial neural networks (ANNs) on a large labelled dataset to predict self-reported Big Five scores. The highest correlations between observed and predicted personality scores were found for conscientiousness (0.360 for men and 0.335 for women) and the mean effect size was 0.243, exceeding the results obtained in prior studies using ‘selfies’. The findings strongly support the possibility of predicting multidimensional personality profiles from static facial images using ANNs trained on large labelled datasets. Future research could investigate the relative contribution of morphological features of the face and other characteristics of facial images to predicting personality. Read More
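As a concrete reference point for what such a model computes, here is a deliberately small sketch: a CNN regressing the five trait scores from a face image, evaluated by per-trait correlation as in the paper's reporting. The single network is an assumption; the authors describe a cascade of ANNs, not this architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical stand-in for the paper's ANN cascade: one small CNN
# regressing the Big Five scores from a face crop.
model = tf.keras.Sequential([
    layers.Input((128, 128, 3)),
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(5),  # one self-report score per trait (O, C, E, A, N)
])
model.compile(optimizer="adam", loss="mse")

def pearson_r(y_true, y_pred):
    # Per-trait correlation between observed and predicted scores, the
    # metric the abstract reports (e.g. 0.360 for conscientiousness in men).
    yt = y_true - tf.reduce_mean(y_true, 0)
    yp = y_pred - tf.reduce_mean(y_pred, 0)
    return tf.reduce_sum(yt * yp, 0) / (tf.norm(yt, axis=0) * tf.norm(yp, axis=0))
```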

#image-recognition

Moscow uses facial recognition network to maintain quarantine

A vast and contentious network of facial recognition cameras keeping watch over Moscow is now playing a key role in the battle against the spread of the coronavirus in Russia.

The city rolled out the technology just before the epidemic reached Russia, ignoring protests and legal complaints over sophisticated state surveillance. Read More

#image-recognition, #russia

Faster video recognition for the smartphone era

By one estimate, training a video-recognition model can take up to 50 times more data and eight times more processing power than training an image-classification model. That’s a problem as demand for processing power to train deep learning models continues to rise exponentially and concerns about AI’s massive carbon footprint grow. Running large video-recognition models on low-power mobile devices, where many AI applications are heading, also remains a challenge.

Song Han, an assistant professor at MIT’s Department of Electrical Engineering and Computer Science (EECS), is tackling the problem by designing more efficient deep learning models. Read More
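The article doesn't spell out the method, but one widely used trick for cheap temporal modeling, offered here purely as an illustrative assumption rather than the group's exact design, is a temporal channel shift: move a fraction of feature channels one frame forward or backward so ordinary 2D convolutions see temporal context at zero extra FLOPs.

```python
import tensorflow as tf

def temporal_shift(x, shift_frac=0.125):
    # x: [batch, time, height, width, channels]. Shift a slice of the
    # channels one frame forward and another slice one frame backward,
    # so per-frame 2D convolutions see neighboring-frame features for free.
    fold = int(x.shape[-1] * shift_frac)
    fwd = tf.pad(x[:, :-1, :, :, :fold],
                 [[0, 0], [1, 0], [0, 0], [0, 0], [0, 0]])
    bwd = tf.pad(x[:, 1:, :, :, fold:2 * fold],
                 [[0, 0], [0, 1], [0, 0], [0, 0], [0, 0]])
    return tf.concat([fwd, bwd, x[:, :, :, :, 2 * fold:]], axis=-1)

x = tf.random.normal([2, 8, 56, 56, 64])  # batch, frames, H, W, channels
y = temporal_shift(x)                     # same shape, temporally mixed
```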

#image-recognition, #vision