Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO

By supporting multi-modal retrieval training and evaluation, image captioning datasets have spurred remarkable progress on representation learning. Unfortunately, these datasets have limited cross-modal associations: images are not paired with other images, captions are only paired with other captions of the same image, there are no negative associations, and there are missing positive cross-modal associations. This undermines research into how inter-modality learning impacts intra-modality tasks. We address this gap with Crisscrossed Captions (CxC), an extension of the MS-COCO dataset with human semantic similarity judgments for 267,095 intra- and inter-modality pairs. We report baseline results on CxC for strong existing unimodal and multi-modal models. We also evaluate a multitask dual encoder trained on both image-caption and caption-caption pairs that crucially demonstrates CxC’s value for measuring the influence of intra- and inter-modality learning. Read More
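The abstract doesn’t spell out the training objective, but the multitask dual encoder can be pictured as two in-batch contrastive losses sharing a text tower. The sketch below is illustrative only: the encoder architectures, temperature, and `alpha` weighting are assumptions, not the authors’ exact setup.

```python
# Minimal sketch of a multitask dual-encoder objective of the kind the
# paper describes: in-batch contrastive losses over image-caption and
# caption-caption pairs. Encoders, temperature, and alpha are assumptions.
import torch
import torch.nn.functional as F

def contrastive_loss(a, b, temperature=0.05):
    """In-batch softmax contrastive loss: row i of `a` should match row i of `b`."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

def multitask_loss(image_enc, text_enc, images, captions, co_captions, alpha=0.5):
    img = image_enc(images)       # (B, D) image embeddings
    cap = text_enc(captions)      # (B, D) caption embeddings
    par = text_enc(co_captions)   # (B, D) other captions of the same images
    inter = contrastive_loss(img, cap)   # image <-> caption task
    intra = contrastive_loss(cap, par)   # caption <-> caption task
    return inter + alpha * intra
```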

#image-recognition, #nlp

A company is using artificial intelligence to insert new products and ads into content, including old movies

Mirriad uses AI technology to insert products and ads into new and old content, posing a threat to traditional advertising. The technology identifies places within content where ads or products could be inserted, as well as where the viewer’s attention drifts within any given still. It is the in-content advertising solution for the Chinese giant Tencent and plans to work with streaming platforms, where it could give advertisers a way to generate revenue amid a widespread shift to streaming. Read More

#image-recognition

First ship controlled by artificial intelligence prepares for maiden voyage

The “Mayflower 400”, the world’s first intelligent ship, bobs gently in a light swell as it stops its engines in Plymouth Sound, off England’s southwest coast, before autonomously activating a hydrophone designed to listen to whales.

The 50-foot (15-metre) trimaran, which weighs nine tonnes and navigates with complete autonomy, is preparing for a transatlantic voyage. Read More

#image-recognition

Fingerspelling

An online game for learning sign language. Fingerspelling.xyz combines advanced hand recognition technology with machine learning to teach fingerspelling. Read More
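The site’s pipeline isn’t public, but hand-recognition setups of this kind are commonly prototyped as a landmark detector feeding a letter classifier. A minimal sketch with MediaPipe Hands, purely illustrative and not Fingerspelling.xyz’s actual code:

```python
# Illustrative sketch of hand tracking with MediaPipe Hands; a classifier
# over the detected landmarks could predict which letter is being spelled.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)
cap = cv2.VideoCapture(0)

for _ in range(100):  # sample a few webcam frames
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        # 21 (x, y, z) landmarks per hand; these coordinates are the
        # natural input features for a fingerspelling classifier.
        landmarks = results.multi_hand_landmarks[0].landmark
        print([(lm.x, lm.y) for lm in landmarks[:5]])  # sample output
cap.release()
```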

#image-recognition, #nlp

A.I. reveals the hidden author of a crucial Bible text

Rescued from the dusty interior of the Qumran Caves in 1947, the Dead Sea Scrolls contain the oldest manuscripts of the Old Testament and are a crucial piece of Biblical history that dates back to the 4th century BCE.

But despite these scrolls’ status as an unmovable piece of religious history, there are still many things that scholars don’t really know about their origin. For example, who actually wrote them down?

Using artificial intelligence and pattern recognition, a team of paleographers (scientists who study ancient handwriting) and computer scientists from the University of Groningen have now discovered hidden details in these scrolls that point toward not just one scribe, but two original scribes.

The research was published Wednesday in the journal PLOS One. Read More
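The team’s actual pipeline (neural extraction of letter-shape features plus statistical testing) isn’t reproduced here, but the core idea, checking whether per-column handwriting features split into distinct writer clusters, can be sketched in a few lines. The `column_features.npy` input is a hypothetical placeholder:

```python
# Hedged sketch of the general idea (not the Groningen team's code):
# represent each text column by aggregated letter-shape features, then
# test whether the columns separate into distinct writer clusters.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (n_columns, n_features) array, one row per text column,
# aggregating shape descriptors of that column's characters.
column_features = np.load("column_features.npy")  # placeholder input

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(column_features)
print("cluster per column:", kmeans.labels_)
# A clean split between the scroll's halves would support the
# two-scribe finding reported in the paper.
```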

#nlp, #image-recognition

Democratising deep learning for microscopy with ZeroCostDL4Mic

Deep Learning (DL) methods are powerful analytical tools for microscopy and can outperform conventional image processing pipelines. Despite the enthusiasm and innovations fueled by DL technology, the need to access powerful and compatible resources to train DL networks leads to an accessibility barrier that novice users often find difficult to overcome. Here, we present ZeroCostDL4Mic, an entry-level platform simplifying DL access by leveraging the free, cloud-based computational resources of Google Colab. ZeroCostDL4Mic allows researchers with no coding expertise to train and apply key DL networks to perform tasks including segmentation (using U-Net and StarDist), object detection (using YOLOv2), denoising (using CARE and Noise2Void), super-resolution microscopy (using Deep-STORM), and image-to-image translation (using label-free prediction – fnet, pix2pix and CycleGAN). Importantly, we provide suitable quantitative tools for each network to evaluate model performance, allowing model optimisation. We demonstrate the application of the platform to study multiple biological processes. Read More
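ZeroCostDL4Mic itself ships as Colab notebooks, but the kind of workflow it wraps can be previewed in a few lines. Below is a minimal sketch applying a published pretrained StarDist model to a fluorescence image; the filename is a placeholder, and the notebooks add training, quality control, and export steps on top of this:

```python
# Minimal sketch of one network ZeroCostDL4Mic wraps (StarDist for
# nucleus segmentation), using a published pretrained model.
from stardist.models import StarDist2D
from csbdeep.utils import normalize
from skimage.io import imread

model = StarDist2D.from_pretrained("2D_versatile_fluo")  # pretrained fluorescence model
image = imread("cells.tif")                              # placeholder input image
labels, details = model.predict_instances(normalize(image))
print("segmented", labels.max(), "objects")
```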

#image-recognition

Geoffrey Hinton has a hunch about what’s next for AI

A decade ago, the artificial-intelligence pioneer transformed the field with a major breakthrough. Now he’s working on a new imaginary system named GLOM.

Back in November, the computer scientist and cognitive psychologist Geoffrey Hinton had a hunch. After a half-century’s worth of attempts—some wildly successful—he’d arrived at another promising insight into how the brain works and how to replicate its circuitry in a computer.

“It’s my current best bet about how things fit together,” Hinton says from his home office in Toronto, where he’s been sequestered during the pandemic. If his bet pays off, it might spark the next generation of artificial neural networks—mathematical computing systems, loosely inspired by the brain’s neurons and synapses, that are at the core of today’s artificial intelligence. His “honest motivation,” as he puts it, is curiosity. But the practical motivation—and, ideally, the consequence—is more reliable and more trustworthy AI.

A Google engineering fellow and cofounder of the Vector Institute for Artificial Intelligence, Hinton wrote up his hunch in fits and starts, and at the end of February announced via Twitter that he’d posted a 44-page paper on the arXiv preprint server. He began with a disclaimer: “This paper does not describe a working system,” he wrote. Rather, it presents an “imaginary system.” He named it “GLOM.” The term derives from “agglomerate” and the expression “glom together.” Read More

#human, #image-recognition

Self-Supervised Equivariant Scene Synthesis from Video

We propose a self-supervised framework to learn scene representations from video that are automatically delineated into background, characters, and their animations. Our method capitalizes on moving characters being equivariant with respect to their transformation across frames and the background being constant with respect to that same transformation. After training, we can manipulate image encodings in real time to create unseen combinations of the delineated components. To the best of our knowledge, ours is the first method to perform unsupervised extraction and synthesis of interpretable background, character, and animation. We demonstrate results on three datasets: Moving MNIST with backgrounds, 2D video game sprites, and Fashion Modeling. Read More
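In loss terms, the two assumptions read naturally as an invariance constraint on the background code and an equivariance constraint on the character code. The sketch below uses illustrative names and MSE penalties; the authors’ actual architecture and losses may differ:

```python
# Hedged sketch of the invariance/equivariance constraints the abstract
# describes; names and loss form are illustrative, not the paper's code.
import torch.nn.functional as F

def scene_losses(encoder, frame, transformed_frame, transform):
    """`encoder` returns (background_code, character_code) for a frame;
    `transform` acts on character codes as it acts on pixels."""
    bg1, ch1 = encoder(frame)
    bg2, ch2 = encoder(transformed_frame)
    # Background should be invariant to the character's motion...
    invariance = F.mse_loss(bg1, bg2)
    # ...while the character code should transform equivariantly.
    equivariance = F.mse_loss(transform(ch1), ch2)
    return invariance + equivariance
```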

#image-recognition

A Study of Face Obfuscation in ImageNet

Face obfuscation (blurring, mosaicing, etc.) has been shown to be effective for privacy protection; nevertheless, object recognition research typically assumes access to complete, unobfuscated images. In this paper, we explore the effects of face obfuscation on the popular ImageNet challenge visual recognition benchmark. Most categories in the ImageNet challenge are not people categories; however, many incidental people appear in the images, and their privacy is a concern. We first annotate faces in the dataset. Then we demonstrate that face blurring—a typical obfuscation technique—has minimal impact on the accuracy of recognition models. Concretely, we benchmark multiple deep neural networks on face-blurred images and observe that the overall recognition accuracy drops only slightly (≤0.68%). Further, we experiment with transfer learning to 4 downstream tasks (object recognition, scene recognition, face attribute classification, and object detection) and show that features learned on face-blurred images are equally transferable. Our work demonstrates the feasibility of privacy-aware visual recognition, improves the highly-used ImageNet challenge benchmark, and suggests an important path for future visual datasets. Read More
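The obfuscation step itself is simple once faces are annotated; a minimal sketch with OpenCV (the paper’s exact blur parameters and annotation format are not reproduced here):

```python
# Illustrative sketch of blurring annotated face boxes with OpenCV;
# the paper's annotation pipeline and blur settings may differ.
import cv2

def blur_faces(image, face_boxes, ksize=(51, 51)):
    """Gaussian-blur each annotated face region in place. Boxes are (x, y, w, h)."""
    for (x, y, w, h) in face_boxes:
        roi = image[y:y + h, x:x + w]
        image[y:y + h, x:x + w] = cv2.GaussianBlur(roi, ksize, 0)
    return image

img = cv2.imread("imagenet_sample.jpg")          # placeholder input
blurred = blur_faces(img, [(120, 80, 64, 64)])   # hypothetical face annotation
cv2.imwrite("imagenet_sample_blurred.jpg", blurred)
```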

#accuracy, #image-recognition

Can AI read your emotions? Try it for yourself

Emotion recognition AI is bunk.

Don’t get me wrong, AI that recognizes human sentiment and emotion can be very useful. For example, it can help identify when drivers are falling asleep behind the wheel. But what it cannot do is discern how a human being is actually feeling from the expression on their face.

You don’t have to take my word for it, you can try it yourself here. Read More

#image-recognition