Fetching AI Data: Researchers Get Leg Up on Teaching Dogs New Tricks with NVIDIA Jetson

AI is going to the dogs. Literally.

Colorado State University researchers Jason Stock and Tom Cavey have published a paper on an AI system to recognize and reward dogs for responding to commands.

The graduate students in computer science trained image classification networks to determine whether a dog is sitting, standing or lying. If a dog responds to a command by adopting the correct posture, the machine dispenses it a treat. Read More
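The reward rule described above can be sketched in a few lines. This is an illustration only, not the researchers' code: the function name `should_dispense`, the probability dictionary, and the 0.8 confidence threshold are all assumptions made for the example.

```python
# Hypothetical sketch: decide whether to dispense a treat from classifier output.
# The posture names come from the article; everything else is assumed.
POSTURES = ("sitting", "standing", "lying")


def should_dispense(command, class_probs, threshold=0.8):
    """Return True when the most likely posture matches the command.

    class_probs maps each posture name to the classifier's probability.
    threshold is an assumed confidence cutoff, not a value from the paper.
    """
    if command not in POSTURES:
        raise ValueError(f"unknown command: {command}")
    predicted = max(class_probs, key=class_probs.get)
    return predicted == command and class_probs[predicted] >= threshold
```

A cutoff like this keeps the machine from rewarding a dog when the classifier is unsure which posture it is seeing.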

#image-recognition, #nvidia

AffectiveSpotlight: Facilitating the Communication of Affective Responses from Audience Members during Online Presentations

The ability to monitor audience reactions is critical when delivering presentations. However, current videoconferencing platforms offer limited solutions to support this. This work leverages recent advances in affect sensing to capture and facilitate communication of relevant audience signals. Using an exploratory survey (N=175), we assessed the most relevant audience responses such as confusion, engagement, and head-nods. We then implemented AffectiveSpotlight, a Microsoft Teams bot that analyzes facial responses and head gestures of audience members and dynamically spotlights the most expressive ones. In a within-subjects study with 14 groups (N=117), we observed that the system made presenters significantly more aware of their audience, speak for a longer period of time, and self-assess the quality of their talk more similarly to the audience members, compared to two control conditions (randomly-selected spotlight and default platform UI). We provide design recommendations for future affective interfaces for online presentations based on feedback from the study. Read More

#image-recognition, #surveillance

Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision

Humans learn language by listening, speaking, writing, reading, and also, via interaction with the multimodal real world. Existing language pretraining frameworks show the effectiveness of text-only self-supervision while we explore the idea of a visually-supervised language model in this paper. We find that the main reason hindering this exploration is the large divergence in magnitude and distributions between the visually-grounded language datasets and pure-language corpora. Therefore, we develop a technique named “vokenization” that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images (which we call “vokens”). The “vokenizer” is trained on relatively small image captioning datasets and we then apply it to generate vokens for large language corpora. Trained with these contextually generated vokens, our visually-supervised language models show consistent improvements over self-supervised alternatives on multiple pure-language tasks such as GLUE, SQuAD, and SWAG. Read More
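The token-to-image mapping can be pictured as a retrieval step. The sketch below assumes that contextual token vectors and image vectors already live in a shared embedding space and that relevance is scored with a dot product; the abstract does not spell out these details, and the function names and toy vectors are hypothetical.

```python
# Hypothetical sketch of assigning a "voken" (an image id) to each token.
# Assumes token and image embeddings are comparable by dot product.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def vokenize(token_vecs, image_vecs):
    """Map each contextual token vector to the id of its best-matching image.

    token_vecs: list of vectors, one per token in the sentence.
    image_vecs: dict from image id to vector.
    """
    image_ids = list(image_vecs)
    return [
        max(image_ids, key=lambda image_id: dot(token, image_vecs[image_id]))
        for token in token_vecs
    ]


# Toy usage with made-up two-dimensional embeddings.
images = {"dog.jpg": [1.0, 0.0], "ball.jpg": [0.0, 1.0]}
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
vokens = vokenize(tokens, images)
```

Because the token vectors are contextual, the same word could receive different vokens in different sentences.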

#image-recognition, #nlp

Fractals can help AI learn to see more clearly—or at least more fairly

Large datasets like ImageNet have supercharged the last 10 years of AI vision, but they are hard to produce and contain bias. Computer-generated datasets provide an alternative.

Most image-recognition systems are trained using large databases that contain millions of photos of everyday objects, from snakes to shakes to shoes. With repeated exposure, AIs learn to tell one type of object from another. Now researchers in Japan have shown that AIs can start learning to recognize everyday objects by being trained on computer-generated fractals instead.

It’s a weird idea but it could be a big deal. Generating training data automatically is an exciting trend in machine learning. And using an endless supply of synthetic images rather than photos scraped from the internet avoids problems with existing hand-crafted data sets. Read More
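For a sense of how cheaply such images can be produced, here is a minimal sketch that renders a fractal with an iterated function system, one common way to generate fractals. The excerpt does not say which method the researchers used, so the technique, the function names, and the Sierpinski parameters are assumptions chosen for illustration.

```python
import random

# Hypothetical sketch: render a fractal from an iterated function system (IFS).
# Each map is (a, b, c, d, e, f), meaning (x, y) -> (a*x + b*y + e, c*x + d*y + f).
SIERPINSKI = [
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
]


def chaos_game(maps, n_points=5000, seed=0):
    """Sample points by repeatedly applying a randomly chosen affine map."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    points = []
    for _ in range(n_points):
        a, b, c, d, e, f = rng.choice(maps)
        x, y = a * x + b * y + e, c * x + d * y + f
        points.append((x, y))
    return points


def rasterize(points, size=32):
    """Turn points in the unit square into a size x size binary image."""
    image = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(size - 1, max(0, int(x * size)))
        row = min(size - 1, max(0, int(y * size)))
        image[row][col] = 1
    return image


image = rasterize(chaos_game(SIERPINSKI))
```

Changing the map parameters yields a different shape, so each parameter set could serve as its own class label without any human annotation.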

#image-recognition

This is how we lost control of our faces

In 1964, mathematician and computer scientist Woodrow Bledsoe first attempted the task of matching suspects’ faces to mugshots. He measured out the distances between different facial features in printed photographs and fed them into a computer program. His rudimentary successes would set off decades of research into teaching machines to recognize human faces.

Now a new study shows just how much this enterprise has eroded our privacy. It hasn’t just fueled an increasingly powerful tool of surveillance. The latest generation of deep-learning-based facial recognition has completely disrupted our norms of consent. Read More

#image-recognition, #surveillance

Who gets credit for AI-generated art?

The recent sale of an AI-generated portrait for $432,000 at Christie’s art auction has raised questions about how credit and responsibility should be allocated to individuals involved, and how the anthropomorphic perception of the AI system contributed to the artwork’s success. Here, we identify natural heterogeneity in the extent to which different people perceive AI as anthropomorphic. We find that differences in the perception of AI anthropomorphicity are associated with different allocations of responsibility to the AI system, and credit to different stakeholders involved in art production. We then show that perceptions of AI anthropomorphicity can be manipulated by changing the language used to talk about AI (as a tool vs. an agent), with consequences for artists and AI practitioners. Our findings shed light on what is at stake when we anthropomorphize AI systems, and offer an empirical lens to reason about how to allocate credit and responsibility to human stakeholders. Read More

#image-recognition

Gun Detection AI is Being Trained With Homemade ‘Active Shooter’ Videos

Companies are using bizarre methods to create algorithms that automatically detect weapons. AI ethicists worry they will lead to more police violence.

In Huntsville, Alabama, there is a room with green walls and a green ceiling. Dangling down the center is a fishing line attached to a motor mounted to the ceiling, which moves a procession of guns tied to the translucent line.

The staff at Arcarithm bought each of the 10 best-selling firearm models in the U.S.: Rugers, Glocks, Sig Sauers. Pistols and long guns are dangled from the line. The motor rotates them around the room, helping a camera mounted to a mobile platform photograph them from multiple angles. Read More

#image-recognition

Learning Transferable Visual Models From Natural Language Supervision

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pretraining task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pretraining, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. Read More
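Zero-shot transfer of this kind can be sketched as a similarity lookup: embed the image, embed one caption per candidate class, and pick the caption closest to the image. The example below uses made-up vectors in place of real encoder outputs; the function names, the prompts, and the temperature value are assumptions, not details from the abstract.

```python
import math

# Hypothetical sketch of zero-shot classification with image and text embeddings.
# Real embeddings would come from trained encoders; these are toy values.

def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_classify(image_vec, text_vecs, temperature=0.07):
    """Return (best_label, probabilities) for one image embedding.

    text_vecs maps a caption such as "a photo of a dog" to its embedding.
    temperature is an assumed scaling factor applied before the softmax.
    """
    img = normalize(image_vec)
    logits = {
        label: sum(a * b for a, b in zip(img, normalize(vec))) / temperature
        for label, vec in text_vecs.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    probs = {label: value / total for label, value in exps.items()}
    return max(probs, key=probs.get), probs


captions = {"a photo of a dog": [1.0, 0.1], "a photo of a cat": [0.1, 1.0]}
label, probs = zero_shot_classify([0.9, 0.2], captions)
```

Adding a new class only requires writing another caption, which is why no dataset-specific training is needed.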

#image-recognition, #nlp

Facial recognition technology can expose political orientation from naturalistic facial images

Ubiquitous facial recognition technology can expose individuals’ political orientation, as faces of liberals and conservatives consistently differ. A facial recognition algorithm was applied to naturalistic images of 1,085,795 individuals to predict their political orientation by comparing their similarity to faces of liberal and conservative others. Political orientation was correctly classified in 72% of liberal–conservative face pairs, remarkably better than chance (50%), human accuracy (55%), or one afforded by a 100-item personality questionnaire (66%). Accuracy was similar across countries (the U.S., Canada, and the UK), environments (Facebook and dating websites), and when comparing faces across samples. Accuracy remained high (69%) even when controlling for age, gender, and ethnicity. Given the widespread use of facial recognition, our findings have critical implications for the protection of privacy and civil liberties. Read More
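The 72% figure is a pairwise accuracy, which is why chance is 50%. A minimal sketch of that metric follows, assuming each face receives a single score where higher means "more likely conservative"; the scoring convention and the function name are assumptions, since the abstract does not describe the computation.

```python
# Hypothetical sketch of pairwise classification accuracy.
# Assumes one score per face, with higher meaning "more likely conservative".

def pairwise_accuracy(conservative_scores, liberal_scores):
    """Fraction of conservative-liberal pairs that the scores rank correctly.

    A pair is correct when the conservative face scores higher.
    Ties count as half a correct pair.
    """
    correct = 0.0
    for c in conservative_scores:
        for l in liberal_scores:
            if c > l:
                correct += 1.0
            elif c == l:
                correct += 0.5
    return correct / (len(conservative_scores) * len(liberal_scores))


accuracy = pairwise_accuracy([0.9, 0.6], [0.2, 0.7])
```

A scorer that carries no information ranks about half of all pairs correctly, which matches the 50% chance level quoted above.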

#image-recognition

High-Quality Background Removal Without Green Screens

Human matting is the task of finding any human in a picture and removing the background around them. It is hard to do well because the person or people must be extracted with an accurate contour. … The MODNet background removal technique can extract a person from a single input image in real time, without the need for a green screen. Read More

#image-recognition, #vfx