Air Force bases look to facial recognition to secure entry

Two Air Force installations recently inked deals to use facial recognition technology to verify the identities of people coming on base, a move that allows greater physical distance during security checks as the coronavirus pandemic continues.

The Air Force awarded TrueFace phase-two Small Business Innovation Research (SBIR) contracts to install its technology at Eglin Air Force Base and Joint Base McGuire-Dix-Lakehurst. The company calls its system “frictionless access control,” meaning security personnel do not need to be present for a check, and says it can verify a face in one to two seconds. Read More

#dod, #image-recognition

Gait-based Emotion Learning

Read More
#robotics, #videos, #image-recognition

Deep Learning with CIFAR-10

Image Classification using CNN

Neural networks are programmable patterns that help solve complex problems and deliver the best achievable output. Deep learning, a step beyond traditional machine learning, trains these networks to answer previously unanswered questions or to improve on existing solutions.

In this article, we will implement a deep learning model on the CIFAR-10 dataset. Read More
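
As a taste of what the article covers, here is a minimal CNN for CIFAR-10 classification, sketched in TensorFlow/Keras; the article's exact architecture and framework may differ.

```python
# Minimal CNN sketch for CIFAR-10; the article's model may differ.
import tensorflow as tf
from tensorflow.keras import layers, models

# CIFAR-10: 60,000 32x32 color images across 10 classes
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10),  # one logit per class
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
```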

#image-recognition, #python

Detecting Deep-Fake Videos from Phoneme-Viseme Mismatches

Recent advances in machine learning and computer graphics have made it easier to convincingly manipulate video and audio. These so-called deep-fake videos range from complete full-face synthesis and replacement (face-swap), to complete mouth and audio synthesis and replacement (lip-sync), and partial word-based audio and mouth synthesis and replacement. Detection of deep fakes with only a small spatial and temporal manipulation is particularly challenging. We describe a technique to detect such manipulated videos by exploiting the fact that the dynamics of the mouth shape – visemes – are occasionally inconsistent with a spoken phoneme. We focus on the visemes associated with words having the sound M (mama), B (baba), or P (papa), in which the mouth must completely close in order to pronounce these phonemes. We observe that this is not the case in many deep-fake videos. Such phoneme-viseme mismatches can, therefore, be used to detect even spatially small and temporally localized manipulations. We demonstrate the efficacy and robustness of this approach to detect different types of deep-fake videos, including in-the-wild deep fakes. Read More
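
The core check is easy to sketch: bilabial phonemes (M, B, P) require the lips to touch, so any aligned bilabial segment during which a landmark-based mouth-openness signal never drops near zero is suspect. Below is a minimal Python illustration; the input formats and threshold are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the phoneme-viseme consistency check described
# above, not the authors' code. Assumes two precomputed inputs:
#   mouth_openness: per-frame lip-aperture values in [0, 1], e.g. derived
#                   from facial landmarks
#   phonemes: (phoneme, start_frame, end_frame) tuples from a forced aligner
BILABIALS = {"M", "B", "P"}      # sounds that require the lips to touch
CLOSED_THRESHOLD = 0.05          # assumed "mouth is closed" cutoff

def suspicious_segments(mouth_openness, phonemes):
    """Return bilabial segments during which the mouth never closes."""
    flagged = []
    for phoneme, start, end in phonemes:
        if phoneme not in BILABIALS:
            continue
        window = mouth_openness[start:end + 1]
        # Real speech closes the lips somewhere in this window; if the
        # minimum aperture stays high, the mouth never closed.
        if window and min(window) > CLOSED_THRESHOLD:
            flagged.append((phoneme, start, end))
    return flagged
```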

#fake, #image-recognition

Lidar used to cost $75,000—here’s how Apple brought it to the iPhone

How Apple made affordable lidar with no moving parts for the iPhone.

At Tuesday’s unveiling of the iPhone 12, Apple touted the capabilities of its new lidar sensor. Apple says lidar will enhance the iPhone’s camera by allowing more rapid focus, especially in low-light situations. And it may enable the creation of a new generation of sophisticated augmented reality apps. Read More

#big7, #image-recognition, #robotics

VIVO: Surpassing Human Performance in Novel Object Captioning with Visual Vocabulary Pre-Training

It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps). In this challenge, no additional image-caption training data, other than COCO Captions, is allowed for model training. Thus, conventional Vision-Language Pre-training (VLP) methods cannot be applied. This paper presents VIsual VOcabulary pre-training (VIVO) that performs pre-training in the absence of caption annotations. By breaking the dependency on paired image-caption training data in VLP, VIVO can leverage large amounts of paired image-tag data to learn a visual vocabulary. This is done by pre-training a multi-layer Transformer model that learns to align image-level tags with their corresponding image region features. To address the unordered nature of image tags, VIVO uses a Hungarian matching loss with masked tag prediction to conduct pre-training.

We validate the effectiveness of VIVO by fine-tuning the pre-trained model for image captioning. In addition, we perform an analysis of the visual-text alignment inferred by our model. The results show that our model can not only generate fluent image captions that describe novel objects, but also identify the locations of these objects. Our single model has achieved new state-of-the-art results on nocaps and surpassed the human CIDEr score. Read More
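
Because the tags form an unordered set, the loss cannot assume that prediction slot i corresponds to tag i; predictions and target tags are instead matched one-to-one before computing the likelihood. Here is a minimal sketch of such a Hungarian matching loss, assuming PyTorch and SciPy; it illustrates the idea rather than reproducing the paper's implementation.

```python
# Sketch of a Hungarian matching loss for unordered tags; illustrative
# only, not the VIVO authors' code.
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_tag_loss(logits, target_ids):
    """Match unordered tag predictions to targets before taking the NLL.

    logits:     (num_tags, vocab_size) predictions at masked tag positions
    target_ids: (num_tags,) LongTensor of unordered ground-truth tag ids
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    # cost[i, j] = -log p_i(target_j): cost of assigning prediction slot i
    # to ground-truth tag j
    cost = -log_probs[:, target_ids]
    # The Hungarian algorithm finds the one-to-one assignment of minimal cost
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    return cost[torch.as_tensor(rows), torch.as_tensor(cols)].mean()
```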

#image-recognition, #nlp, #big7

Inventing Virtual Meetings of Tomorrow with NVIDIA AI Research

NVIDIA Maxine is a fully accelerated platform SDK that lets developers of video conferencing services build and deploy AI-powered features using state-of-the-art models in their cloud. Video conferencing applications based on Maxine can cut video bandwidth usage to one-tenth of what H.264 requires by using AI video compression, dramatically reducing costs. Read More
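
NVIDIA has described the compression scheme as sending a compact set of facial keypoints rather than full pixel frames and re-synthesizing the face on the receiving side. A rough back-of-envelope comparison follows; every number below is an illustrative assumption, not an NVIDIA figure.

```python
# Back-of-envelope bandwidth comparison for keypoint-based video compression.
# All numbers are illustrative assumptions, not NVIDIA's published figures.
FPS = 30                 # frames per second
H264_KBPS = 1500         # assumed bitrate of a typical 720p H.264 call

KEYPOINTS = 68           # assumed facial landmarks sent per frame
BYTES_PER_KEYPOINT = 8   # (x, y) as two 32-bit floats

keypoint_kbps = KEYPOINTS * BYTES_PER_KEYPOINT * 8 * FPS / 1000
print(f"keypoints: ~{keypoint_kbps:.0f} kbps vs H.264: ~{H264_KBPS} kbps")
# ~131 kbps vs ~1500 kbps: roughly an order of magnitude less, in line with
# the one-tenth claim (a real codec would also compress the keypoint stream).
```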

#nvidia, #videos, #image-recognition

Toonify Yourself!

Upload a photo and see what you’d look like in an animated movie!

Read More

#image-recognition

AI ‘resurrects’ 54 Roman emperors, in stunningly lifelike images

Ancient Roman emperors’ faces have been brought to life in digital reconstructions; the unnervingly realistic image project includes the Emperors Caligula, Nero and Hadrian, among others. Read More

#image-recognition

Computational Needs for Computer Vision (CV) in AI and ML Systems

Computer vision (CV) is a major task for modern Artificial Intelligence (AI) and Machine Learning (ML) systems. It is accelerating nearly every domain in the tech industry, enabling organizations to revolutionize the way machines and business systems work.

… In this article, we briefly show you the common challenges associated with a CV system when it employs modern ML algorithms. Read More

#image-recognition, #vision