The hidden fingerprint inside your photos

They say a picture is worth a thousand words. Actually, there’s a great deal more hidden inside the modern digital image, says researcher Jerone Andrews.

… When you take a photo, your smartphone or digital camera stores “metadata” within the image file. This automatically and parasitically burrows itself into every photo you take. It is data about data, providing identifying information such as when and where an image was captured, and what type of camera was used.

…But metadata is not the only thing hidden in your photos. There is also a unique personal identifier linking every image you capture to the specific camera used.

#fake, #image-recognition

YouTuber Creates Roadside AI-Powered Camera To Compliment Dogs That Pass By

Every dog is the best dog. It doesn’t matter whose dog they are, what breed, or how well-behaved they are, they are all good dogs. But do you ever wish that you could tell the dogs how amazing they are, constantly?

Fear not, as YouTuber Ryder Calm Down has the answer. Using a megaphone, a camera, and a smart integrated machine-learning system, the nifty technology and comedy commentator created a device that recognizes dogs as they walk down the street and shouts compliments to them. After all, they deserve it.

#image-recognition

Text-to-Image Generation Grounded by Fine-Grained User Attention

Localized Narratives [29] is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases. We propose TRECS, a sequential model that exploits this grounding to generate images. TRECS uses descriptions to retrieve segmentation masks and predict object labels aligned with mouse traces. These alignments are used to select and position masks to generate a fully covered segmentation canvas; the final image is produced by a segmentation-to-image generator using this canvas. This multi-step, retrieval-based approach outperforms existing direct text-to-image generation models on both automatic metrics and human evaluations: overall, its generated images are more photo-realistic and better match descriptions.

#image-recognition, #nlp

Brain2Pix: Fully convolutional naturalistic video reconstruction from brain activity

Reconstructing complex and dynamic visual perception from brain activity remains a major challenge in machine learning applications to neuroscience. Here we present a new method for reconstructing naturalistic images and videos from very large single-participant functional magnetic resonance data that leverages the recent success of image-to-image transformation networks. This is achieved by exploiting spatial information obtained from retinotopic mappings across the visual system. More specifically, we first determine what position each voxel in a particular region of interest would represent in the visual field based on its corresponding receptive field location. Then, the 2D image representation of the brain activity on the visual field is passed to a fully convolutional image-to-image network trained to recover the original stimuli using VGG feature loss with an adversarial regularizer. In our experiments, we show that our method offers a significant improvement over existing video reconstruction techniques.
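The voxel-to-visual-field step in the abstract above can be pictured with a toy sketch. It assumes each voxel comes with a hypothetical receptive-field centre in normalised coordinates, and it averages voxels that land on the same pixel. The function name, the grid size, and the averaging rule are illustrative assumptions, not the authors' implementation.

```python
def voxels_to_image(activities, rf_centres, size=8):
    """Project voxel activity onto a size x size visual-field grid.

    activities: one response value per voxel.
    rf_centres: (x, y) receptive-field centre per voxel, each in [0, 1).
    Voxels falling on the same pixel are averaged (an assumption made here).
    """
    sums = [[0.0] * size for _ in range(size)]
    counts = [[0] * size for _ in range(size)]
    for act, (x, y) in zip(activities, rf_centres):
        # Convert normalised coordinates to grid indices, clamped to the grid.
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        sums[row][col] += act
        counts[row][col] += 1
    # Pixels with no voxel stay at 0.0.
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(size)]
        for r in range(size)
    ]
```

In the described method, an image built this way would then be fed to the image-to-image network; that network is not sketched here.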

#human, #image-recognition

AI backpack concept gives audio alerts to blind pedestrians

When Jagadish Mahendran heard about his friend’s daily challenges navigating as a blind person, he immediately thought of his artificial intelligence work.

“For years I had been teaching robots to see things,” he said. Mahendran, a computer vision researcher at the University of Georgia’s Institute for Artificial Intelligence, found it ironic that he had helped develop machines — including a shopping robot that could “see” stocked shelves and a kitchen robot — but nothing for people with low or no vision. 

After exploring existing tech for blind and low vision people like camera-enabled canes or GPS-connected smartphone apps, he came up with a backpack-based AI design that uses cameras to provide instantaneous alerts.

#image-recognition, #vision

Adobe Photoshop uses AI to quadruple your photo’s size

Super resolution blows up a 12-megapixel smartphone photo into a much larger 48-megapixel shot. It’s coming to Lightroom soon, too.
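The arithmetic behind "quadruple" is worth spelling out: four times as many pixels means each side of the image doubles. The short sketch below assumes a 4000 x 3000 source photo as an example of a 12-megapixel frame; that resolution and the function name are illustrative, not taken from the article.

```python
def upscale_dimensions(width, height, pixel_factor=4):
    """Return new (width, height) when the total pixel count grows by pixel_factor."""
    # Pixel count scales with the square of the linear scale factor.
    linear = pixel_factor ** 0.5
    return round(width * linear), round(height * linear)


# Assumed example: a 4000 x 3000 photo is 12 megapixels.
new_width, new_height = upscale_dimensions(4000, 3000)
megapixels = new_width * new_height / 1_000_000
```

Here the result is 8000 x 6000, which is 48 megapixels.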

#image-recognition

Multi-modal Self-Supervision from Generalized Data Transformations

The recent success of self-supervised learning can be largely attributed to content-preserving transformations, which can be used to easily induce invariances. While transformations generate positive sample pairs in contrastive loss training, most recent work focuses on developing new objective formulations, and pays relatively little attention to the transformations themselves. In this paper, we introduce the framework of Generalized Data Transformations to (1) reduce several recent self-supervised learning objectives to a single formulation for ease of comparison, analysis, and extension, (2) allow a choice between being invariant or distinctive to data transformations, obtaining different supervisory signals, and (3) derive the conditions that combinations of transformations must obey in order to lead to well-posed learning objectives. This framework allows both invariance and distinctiveness to be injected into representations simultaneously, and lets us systematically explore novel contrastive objectives. We apply it to study multi-modal self-supervision for audio-visual representation learning from unlabelled videos, improving the state-of-the-art by a large margin, and even surpassing supervised pretraining. We demonstrate results on a variety of downstream video and audio classification and retrieval tasks, on datasets such as HMDB-51, UCF-101, DCASE2014, ESC-50 and VGG-Sound. In particular, we achieve new state-of-the-art accuracies of 72.8% on HMDB-51 and 95.2% on UCF-101.
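For readers unfamiliar with how transformed views act as positive pairs, here is a minimal sketch of a generic InfoNCE-style contrastive loss. It is a common textbook formulation, not the paper's Generalized Data Transformations objective; the function names and the temperature value are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalise(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Contrastive loss for one anchor embedding.

    positive: embedding of a transformed view of the same sample.
    negatives: embeddings of other samples.
    Returns -log softmax probability of the positive (lower is better).
    """
    a = normalise(anchor)
    logits = [dot(a, normalise(positive)) / temperature]
    logits += [dot(a, normalise(n)) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

The loss is small when the anchor is closer to its transformed view than to the other samples, which is how a transformation induces invariance in the learned representation.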

#image-recognition, #self-supervised

Facebook’s next big AI project is training its machines on users’ public videos

AI that can understand video could be put to a variety of uses

Teaching AI systems to understand what’s happening in videos as completely as a human can is one of the hardest challenges — and biggest potential breakthroughs — in the world of machine learning. Today, Facebook announced a new initiative that it hopes will give it an edge in this consequential work: training its AI on Facebook users’ public videos.

Access to training data is one of the biggest competitive advantages in AI, and by collecting this resource from millions and millions of their users, tech giants like Facebook, Google, and Amazon have been able to forge ahead in various areas. And while Facebook has already trained machine vision models on billions of images collected from Instagram, it hasn’t previously announced projects of similar ambition for video understanding.

#image-recognition

Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans

This paper addresses the challenge of novel view synthesis for a human performer from a very sparse set of camera views. Some recent works have shown that learning implicit neural representations of 3D scenes achieves remarkable view synthesis quality given dense input views. However, the representation learning will be ill-posed if the views are highly sparse. To solve this ill-posed problem, our key idea is to integrate observations over video frames. To this end, we propose Neural Body, a new human body representation which assumes that the learned neural representations at different frames share the same set of latent codes anchored to a deformable mesh, so that the observations across frames can be naturally integrated. The deformable mesh also provides geometric guidance for the network to learn 3D representations more efficiently. Experiments on a newly collected multi-view dataset show that our approach outperforms prior works by a large margin in terms of the view synthesis quality. We also demonstrate the capability of our approach to reconstruct a moving person from a monocular video on the People-Snapshot dataset. The code and dataset will be available at https://zju3dv.github.io/neuralbody/.

#image-recognition

Am I a Real or Fake Celebrity?

Recently, significant advancements have been made in face recognition technologies using Deep Neural Networks. As a result, companies such as Microsoft, Amazon, and Naver offer highly accurate commercial face recognition web services for diverse applications to meet the end-user needs. Naturally, however, such technologies are threatened persistently, as virtually any individual can quickly implement impersonation attacks. In particular, these attacks can be a significant threat for authentication and identification services, which heavily rely on their underlying face recognition technologies’ accuracy and robustness. Despite its gravity, the issue regarding deepfake abuse using commercial web APIs and their robustness has not yet been thoroughly investigated. This work provides a measurement study on the robustness of black-box commercial face recognition APIs against Deepfake Impersonation (DI) attacks using celebrity recognition APIs as an example case study. We use five deepfake datasets, two of which are created by us and planned to be released. More specifically, we measure attack performance based on two scenarios (targeted and non-targeted) and further analyze the differing system behaviors using fidelity, confidence, and similarity metrics. Accordingly, we demonstrate how vulnerable face recognition technologies from popular companies are to DI attacks, achieving maximum success rates of 78.0% and 99.9% for targeted (i.e., precise match) and non-targeted (i.e., match with any celebrity) attacks, respectively. Moreover, we propose practical defense strategies to mitigate DI attacks, reducing the attack success rates to as low as 0% and 0.02% for targeted and non-targeted attacks, respectively.
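The two success rates quoted above differ only in what counts as a hit. The sketch below shows one way to compute them, following the abstract's definitions of targeted (precise match) and non-targeted (match with any celebrity). The input format, the function name, and the sample data are hypothetical and do not come from the paper.

```python
def attack_success_rates(results, celebrities):
    """Compute (targeted, non_targeted) success rates.

    results: list of (target_name, predicted_name) pairs, one per deepfake;
             predicted_name is None when the API recognises no one.
    celebrities: set of names the API can return.
    """
    total = len(results)
    # Targeted: the API names exactly the impersonated celebrity.
    targeted = sum(1 for target, pred in results if pred == target)
    # Non-targeted: the API names any celebrity at all.
    non_targeted = sum(1 for _, pred in results if pred in celebrities)
    return targeted / total, non_targeted / total


# Hypothetical example: four deepfakes submitted to a recognition API.
sample = [("A", "A"), ("A", "B"), ("A", None), ("B", "B")]
rates = attack_success_rates(sample, {"A", "B"})
```

With this sample, two of four predictions match the target and three of four match some celebrity, so the non-targeted rate can never be lower than the targeted one.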

#fake, #image-recognition