Two Air Force installations recently inked deals to use facial recognition technology to verify the identities of those coming on base — a move that can increase the physical distance during security checks as the coronavirus pandemic continues.
The Air Force awarded TrueFace phase two Small Business Innovation Research contracts to install its technology at Eglin Air Force Base and Joint Base McGuire-Dix-Lakehurst. The company calls its system “frictionless access control,” where security personnel do not need to be present for a check, adding that it can verify a face in one to two seconds. Read More
Gait-based Emotion Learning
Deep Learning with CIFAR-10
Image Classification using CNN
Neural networks are programmable patterns that help solve complex problems and produce the best achievable output. Deep learning, as we all know, is a step beyond machine learning: it trains neural networks to answer previously unanswered questions or to improve existing solutions.
In this article, we will be implementing a Deep Learning Model using CIFAR-10 dataset. Read More
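To give a flavor of what such a model does under the hood, here is a minimal, stdlib-only sketch of a CNN's core forward operations (convolution, ReLU, max-pooling) on a toy grayscale image. A real CIFAR-10 classifier stacks many such layers in a framework like PyTorch or Keras; the image, kernel, and sizes below are illustrative assumptions.

```python
# Minimal sketch of a CNN's core forward pass (convolution, ReLU,
# 2x2 max-pooling) on a toy 6x6 "image", standard library only.

def conv2d(image, kernel):
    """Valid (no-padding) 2-D convolution of a 2-D list by a 2-D kernel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def relu(fmap):
    """Zero out negative activations."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool2(fmap):
    """2x2 max-pooling with stride 2."""
    return [[max(fmap[i][j], fmap[i][j+1], fmap[i+1][j], fmap[i+1][j+1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# Toy image with a bright vertical edge; a simple edge-detecting kernel.
image = [[0, 0, 0, 9, 9, 9]] * 6
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

features = max_pool2(relu(conv2d(image, kernel)))
print(features)  # [[27, 27], [27, 27]] -- strong response along the edge
```

In a trained network the kernel weights are learned from the CIFAR-10 images rather than hand-set, and the pooled feature maps feed into fully connected layers that output the ten class scores.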
Detecting Deep-Fake Videos from Phoneme-Viseme Mismatches
Recent advances in machine learning and computer graphics have made it easier to convincingly manipulate video and audio. These so-called deep-fake videos range from complete full-face synthesis and replacement (face-swap), to complete mouth and audio synthesis and replacement (lip-sync), and partial word-based audio and mouth synthesis and replacement. Detection of deep fakes with only a small spatial and temporal manipulation is particularly challenging. We describe a technique to detect such manipulated videos by exploiting the fact that the dynamics of the mouth shape (visemes) are occasionally inconsistent with a spoken phoneme. We focus on the visemes associated with words having the sound M (mama), B (baba), or P (papa), in which the mouth must completely close in order to pronounce these phonemes. We observe that this is not the case in many deep-fake videos. Such phoneme-viseme mismatches can, therefore, be used to detect even spatially small and temporally localized manipulations. We demonstrate the efficacy and robustness of this approach to detect different types of deep-fake videos, including in-the-wild deep fakes. Read More
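The consistency check at the heart of the idea can be sketched in a few lines. This is not the authors' implementation; it assumes an upstream pipeline that supplies, per video frame, an aligned phoneme label and a normalized mouth-openness score (0 = fully closed, 1 = wide open), and the threshold and mismatch rate below are made-up illustrative values.

```python
# Illustrative phoneme-viseme consistency check: frames aligned with
# M/B/P phonemes should show a closed mouth; if most don't, flag the clip.

CLOSED_MOUTH_PHONEMES = {"M", "B", "P"}  # phonemes requiring a closed mouth
OPENNESS_THRESHOLD = 0.2                 # assumed tolerance for "closed"

def find_mismatches(frames):
    """Indices where an M/B/P phoneme co-occurs with an open mouth."""
    return [i for i, (phoneme, openness) in enumerate(frames)
            if phoneme in CLOSED_MOUTH_PHONEMES and openness > OPENNESS_THRESHOLD]

def looks_manipulated(frames, max_mismatch_rate=0.5):
    """Flag a clip if most of its M/B/P frames fail the closed-mouth test."""
    mbp = [f for f in frames if f[0] in CLOSED_MOUTH_PHONEMES]
    if not mbp:
        return False  # no usable phonemes; this cue does not apply
    return len(find_mismatches(frames)) / len(mbp) > max_mismatch_rate

# Example: a lip-synced clip where "mama" is spoken but the mouth stays open.
frames = [("AA", 0.7), ("M", 0.6), ("M", 0.5), ("AA", 0.8), ("B", 0.4)]
print(find_mismatches(frames))    # [1, 2, 4]
print(looks_manipulated(frames))  # True
```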
Lidar used to cost $75,000—here’s how Apple brought it to the iPhone
How Apple made affordable lidar with no moving parts for the iPhone.
At Tuesday’s unveiling of the iPhone 12, Apple touted the capabilities of its new lidar sensor. Apple says lidar will enhance the iPhone’s camera by allowing more rapid focus, especially in low-light situations. And it may enable the creation of a new generation of sophisticated augmented reality apps. Read More
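The headline price drop is easier to appreciate with the underlying physics in mind: a lidar sensor times how long a light pulse takes to reflect off a surface and return, then halves the round trip to get distance. A back-of-the-envelope sketch (the timing value is an illustrative number, not an Apple specification):

```python
# Time-of-flight principle behind a lidar depth sensor:
# distance = (speed of light x round-trip time) / 2

C = 299_792_458.0  # speed of light in vacuum, m/s

def distance_from_round_trip(t_seconds):
    """Distance to a surface given the pulse's round-trip time."""
    return C * t_seconds / 2.0

# A pulse returning after ~13.3 nanoseconds corresponds to roughly 2 m,
# a typical indoor AR working distance.
d = distance_from_round_trip(13.3e-9)
print(round(d, 2))  # 1.99
```

The nanosecond-scale timing this demands is why lidar was traditionally expensive; the article covers how Apple's solid-state design avoids the spinning assemblies of earlier units.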
VIVO: Surpassing Human Performance in Novel Object Captioning with Visual Vocabulary Pre-Training
It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps). In this challenge, no additional image-caption training data, other than COCO Captions, is allowed for model training. Thus, conventional Vision-Language Pre-training (VLP) methods cannot be applied. This paper presents VIsual VOcabulary pre-training (VIVO) that performs pre-training in the absence of caption annotations. By breaking the dependency on paired image-caption training data in VLP, VIVO can leverage large amounts of paired image-tag data to learn a visual vocabulary. This is done by pre-training a multi-layer Transformer model that learns to align image-level tags with their corresponding image region features. To address the unordered nature of image tags, VIVO uses a Hungarian matching loss with masked tag prediction to conduct pre-training.
We validate the effectiveness of VIVO by fine-tuning the pre-trained model for image captioning. In addition, we perform an analysis of the visual-text alignment inferred by our model. The results show that our model can not only generate fluent image captions that describe novel objects, but also identify the locations of these objects. Our single model has achieved new state-of-the-art results on nocaps and surpassed the human CIDEr score. Read More
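The Hungarian matching loss mentioned above exists because tags are an unordered set: before computing a loss, each prediction must first be paired with the ground-truth tag that minimizes total cost. A toy sketch of that optimal-assignment step (brute force over permutations stands in for the Hungarian algorithm, which is fine at this size; real code would use `scipy.optimize.linear_sum_assignment`, and the cost values here are made up):

```python
# Optimal assignment between predicted and ground-truth tags, the step
# underlying a Hungarian matching loss. Brute force is exponential but
# exact, and adequate for this 3x3 illustration.

from itertools import permutations

def best_assignment(cost):
    """cost[i][j] = cost of matching prediction i to target j (square matrix).

    Returns (perm, total) where perm[i] is the target matched to prediction i.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total

# Toy cost matrix for 3 predicted tags vs 3 ground-truth tags,
# e.g. 1 - similarity between predicted and true tag embeddings.
cost = [[0.9, 0.1, 0.8],
        [0.2, 0.7, 0.9],
        [0.6, 0.8, 0.1]]

perm, total = best_assignment(cost)
print(perm, round(total, 1))  # (1, 0, 2) 0.4
```

Once the matching is fixed, an ordinary prediction loss (here, masked tag prediction) is applied over the matched pairs, so the model is never penalized for emitting correct tags in a different order.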
Inventing Virtual Meetings of Tomorrow with NVIDIA AI Research
NVIDIA Maxine is a fully accelerated platform SDK for developers of video conferencing services to build and deploy AI-powered features that use state-of-the-art models in their cloud. Video conferencing applications based on Maxine can reduce video bandwidth usage down to one-tenth of H.264 using AI video compression, dramatically reducing costs. Read More
Toonify Yourself!
Upload a photo and see what you’d look like in an animated movie!

AI ‘resurrects’ 54 Roman emperors, in stunningly lifelike images
Computational Needs for Computer Vision (CV) in AI and ML Systems
Computer vision (CV) is a major task for modern Artificial Intelligence (AI) and Machine Learning (ML) systems. It's accelerating nearly every domain in the tech industry, enabling organizations to revolutionize the way machines and business systems work.
… In this article, we briefly show you the common challenges associated with a CV system when it employs modern ML algorithms. Read More