I Dream My Painting and I Paint My Dream

Dutch photographer Bas Uterwijk used artificial intelligence to create a realistic portrait of Vincent van Gogh on van Gogh’s 168th birthday.

#gans, #image-recognition

The neural network reconstructed the faces of people from the past from their surviving portraits.

Read More

#image-recognition, #videos

Big Self-Supervised Models Advance Medical Image Classification

Self-supervised pretraining followed by supervised fine-tuning has seen success in image recognition, especially when labeled examples are scarce, but has received limited attention in medical image analysis. This paper studies the effectiveness of self-supervised learning as a pretraining strategy for medical image classification. We conduct experiments on two distinct tasks: dermatology skin condition classification from digital camera images and multi-label chest X-ray classification, and demonstrate that self-supervised learning on ImageNet, followed by additional self-supervised learning on unlabeled domain-specific medical images, significantly improves the accuracy of medical image classifiers. We introduce a novel Multi-Instance Contrastive Learning (MICLe) method that uses multiple images of the underlying pathology per patient case, when available, to construct more informative positive pairs for self-supervised learning. Combining our contributions, we achieve an improvement of 6.7% in top-1 accuracy and an improvement of 1.1% in mean AUC on dermatology and chest X-ray classification, respectively, outperforming strong supervised baselines pretrained on ImageNet. In addition, we show that big self-supervised models are robust to distribution shift and can learn efficiently with a small number of labeled medical images. Read More
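The key idea behind MICLe's positive pairs can be sketched in a few lines: when a patient case has several images, two *different* images of the same case form a positive pair, instead of two augmentations of one image. This is a minimal illustrative sketch, not the paper's implementation; the function and data names are hypothetical, and the contrastive loss itself is omitted.

```python
import random

def micle_positive_pairs(cases):
    """For each patient case (a list of images), pick two distinct images
    to form a positive pair when the case has more than one image;
    otherwise fall back to using the same image twice, as standard
    augmentation-based contrastive learning would."""
    pairs = []
    for images in cases:
        if len(images) >= 2:
            a, b = random.sample(images, 2)
        else:
            a = b = images[0]
        pairs.append((a, b))
    return pairs

# Toy example: each "image" is just an identifier string.
cases = [["p1_img1", "p1_img2", "p1_img3"], ["p2_img1"]]
pairs = micle_positive_pairs(cases)
```

In the paper, these pairs then feed a SimCLR-style contrastive objective, so the network learns features that are stable across views of the same pathology.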

#image-recognition, #self-supervised

Bottleneck Transformers for Visual Recognition

We present BoTNet, a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection and instance segmentation. By just replacing the spatial convolutions with global self-attention in the final three bottleneck blocks of a ResNet, and making no other changes, our approach improves upon the baselines significantly on instance segmentation and object detection while also reducing the parameters, with minimal overhead in latency. Through the design of BoTNet, we also point out how ResNet bottleneck blocks with self-attention can be viewed as Transformer blocks. Without any bells and whistles, BoTNet achieves 44.4% Mask AP and 49.7% Box AP on the COCO Instance Segmentation benchmark using the Mask R-CNN framework, surpassing the previous best published single-model and single-scale results of ResNeSt [72] evaluated on the COCO validation set. Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 2.33x faster in “compute” time than the popular EfficientNet models on TPU-v3 hardware. We hope our simple and effective approach will serve as a strong baseline for future research in self-attention models for vision. Read More
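The core swap is easy to picture: where a bottleneck block's 3x3 convolution mixes only a local neighborhood, global self-attention lets every spatial position attend to every other. This is a bare numpy sketch of single-head attention over a flattened H×W feature map, assuming made-up dimensions; BoTNet's actual layer is multi-headed and adds relative position encodings, both omitted here.

```python
import numpy as np

def global_self_attention(x, Wq, Wk, Wv):
    """All-to-all self-attention over flattened spatial positions, standing
    in for the 3x3 convolution of a ResNet bottleneck block.
    x: (H*W, C) feature map flattened over space."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (H*W, H*W) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over positions
    return attn @ v                               # weighted mix of all positions

rng = np.random.default_rng(0)
h = w = 4; c = 8                                  # toy feature-map size
x = rng.normal(size=(h * w, c))
Wq, Wk, Wv = (rng.normal(size=(c, c)) for _ in range(3))
out = global_self_attention(x, Wq, Wk, Wv)
```

Because the score matrix is (H*W)×(H*W), this only becomes affordable in the final, low-resolution stage of the ResNet, which is exactly where BoTNet applies it.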

#image-recognition

The hidden fingerprint inside your photos

They say a picture is worth a thousand words. Actually, there’s a great deal more hidden inside the modern digital image, says researcher Jerone Andrews.

… When you take a photo, your smartphone or digital camera stores “metadata” within the image file. This automatically and parasitically burrows itself into every photo you take. It is data about data, providing identifying information such as when and where an image was captured, and what type of camera was used.

…But metadata is not the only thing hidden in your photos. There is also a unique personal identifier linking every image you capture to the specific camera used. Read More
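The per-camera identifier the article refers to is the sensor's photo-response non-uniformity (PRNU): averaging the noise residuals of many photos cancels scene content and leaves the sensor's fixed noise pattern. This is a toy sketch of that idea with a crude box-blur denoiser and synthetic data; real forensic pipelines use wavelet denoising and more careful statistics.

```python
import numpy as np

def noise_residual(img):
    """Image minus a crude denoised version (3x3 box blur): what is left is
    mostly noise, including the sensor's fixed pattern."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    blurred = sum(padded[i:i + h, j:j + w]
                  for i in range(3) for j in range(3)) / 9.0
    return img - blurred

def camera_fingerprint(images):
    """Average the residuals of many photos from one camera: scene content
    averages out, the sensor's noise pattern remains."""
    return np.mean([noise_residual(im) for im in images], axis=0)

def correlate(a, b):
    """Normalized correlation between a residual and a fingerprint."""
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

# Synthetic demo: a fixed "sensor" pattern added to random scenes.
rng = np.random.default_rng(1)
pattern = rng.normal(size=(16, 16))               # hypothetical sensor noise
shots = [rng.normal(size=(16, 16)) + pattern for _ in range(20)]
fp = camera_fingerprint(shots)
same = correlate(fp, noise_residual(rng.normal(size=(16, 16)) + pattern))
other = correlate(fp, noise_residual(rng.normal(size=(16, 16))))
```

A photo from the same "camera" correlates strongly with the fingerprint, while a photo from a different one correlates near zero, which is how an image can be linked back to a specific device.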

#fake, #image-recognition

YouTuber Creates Roadside AI-Powered Camera To Compliment Dogs That Pass By

Every dog is the best dog. It doesn’t matter whose dog they are, what breed, or how well-behaved they are, they are all good dogs. But do you ever wish that you could tell the dogs how amazing they are, constantly?

Fear not, as YouTuber Ryder Calm Down has the answer. Using a megaphone, a camera, and a smart integrated machine-learning system, the nifty technology and comedy commentator created a device that recognizes dogs as they walk down the street and shouts compliments to them. After all, they deserve it.  Read More
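The control loop behind such a device is simple: run a detector on each camera frame, and if a dog appears, play a compliment through the speaker. This is a hedged sketch with a stubbed-out detector and speaker, not Ryder Calm Down's actual code; `detect_labels` stands in for a real pretrained object-detection model.

```python
import random

COMPLIMENTS = ["Good dog!", "What a fine dog!", "You're doing great, buddy!"]

def detect_labels(frame):
    """Stand-in for a real object detector; here it just returns whatever
    labels the toy frame carries."""
    return frame.get("labels", [])

def process_frame(frame, speak):
    """If a dog is detected in the frame, shout a compliment through the
    megaphone (`speak` is any callable that plays audio)."""
    if "dog" in detect_labels(frame):
        speak(random.choice(COMPLIMENTS))
        return True
    return False

said = []
process_frame({"labels": ["person", "dog"]}, said.append)  # compliments
process_frame({"labels": ["car"]}, said.append)            # stays quiet
```

In a real deployment the frame would come from the roadside camera and `speak` would drive a text-to-speech engine and megaphone.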

#image-recognition

Text-to-Image Generation Grounded by Fine-Grained User Attention

Localized Narratives [29] is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases. We propose TRECS, a sequential model that exploits this grounding to generate images. TRECS uses descriptions to retrieve segmentation masks and predict object labels aligned with mouse traces. These alignments are used to select and position masks to generate a fully covered segmentation canvas; the final image is produced by a segmentation-to-image generator using this canvas. This multi-step, retrieval-based approach outperforms existing direct text-to-image generation models on both automatic metrics and human evaluations: overall, its generated images are more photo-realistic and better match descriptions. Read More

#image-recognition, #nlp

Brain2Pix: Fully convolutional naturalistic video reconstruction from brain activity

Reconstructing complex and dynamic visual perception from brain activity remains a major challenge in machine learning applications to neuroscience. Here we present a new method for reconstructing naturalistic images and videos from very large single-participant functional magnetic resonance data that leverages the recent success of image-to-image transformation networks. This is achieved by exploiting spatial information obtained from retinotopic mappings across the visual system. More specifically, we first determine what position each voxel in a particular region of interest would represent in the visual field based on its corresponding receptive field location. Then, the 2D image representation of the brain activity on the visual field is passed to a fully convolutional image-to-image network trained to recover the original stimuli using VGG feature loss with an adversarial regularizer. In our experiments, we show that our method offers a significant improvement over existing video reconstruction techniques. Read More
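The retinotopic-mapping step amounts to projecting voxel activities into image space: each voxel's activity is written at its receptive-field coordinates in the visual field, yielding the 2D input for the image-to-image network. This is a minimal sketch under invented coordinates and grid size, not the Brain2Pix code; real receptive-field locations come from retinotopic mapping experiments.

```python
def brain_to_visual_field(voxels, size=8):
    """Place each voxel's activity at its receptive-field location in the
    visual field. voxels: list of (x, y, activity) with (x, y) being the
    voxel's receptive-field grid coordinates."""
    field = [[0.0] * size for _ in range(size)]
    counts = [[0] * size for _ in range(size)]
    for x, y, a in voxels:
        field[y][x] += a
        counts[y][x] += 1
    # Average where several voxels share a receptive-field location.
    for r in range(size):
        for c in range(size):
            if counts[r][c]:
                field[r][c] /= counts[r][c]
    return field

# Two voxels sharing one location, plus a third elsewhere.
voxels = [(0, 0, 1.0), (0, 0, 3.0), (4, 2, 0.5)]
field = brain_to_visual_field(voxels)
```

The fully convolutional network then only has to learn a spatially local image-to-image mapping, rather than an arbitrary voxel-to-pixel correspondence.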

#human, #image-recognition

AI backpack concept gives audio alerts to blind pedestrians

When Jagadish Mahendran heard about his friend’s daily challenges navigating as a blind person, he immediately thought of his artificial intelligence work.

“For years I had been teaching robots to see things,” he said. Mahendran, a computer vision researcher at the University of Georgia’s Institute for Artificial Intelligence, found it ironic that he had helped develop machines — including a shopping robot that could “see” stocked shelves and a kitchen robot — but nothing for people with low or no vision. 

After exploring existing tech for blind and low vision people like camera-enabled canes or GPS-connected smartphone apps, he came up with a backpack-based AI design that uses cameras to provide instantaneous alerts.  Read More

#image-recognition, #vision

Adobe Photoshop uses AI to quadruple your photo’s size

Super resolution blows up a 12-megapixel smartphone photo into a much larger 48-megapixel shot. It’s coming to Lightroom soon, too.
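Quadrupling the pixel count means doubling each dimension (12 megapixels becomes 48). For contrast with the AI approach, here is the classical baseline it improves on: nearest-neighbor upscaling, which copies pixels rather than predicting new detail. This is an illustrative sketch, not Adobe's method.

```python
def upscale_2x_nearest(img):
    """Nearest-neighbor 2x upscale: each pixel becomes a 2x2 block,
    quadrupling the pixel count without adding any detail. AI super
    resolution instead uses a trained network to predict plausible
    new detail for the extra pixels."""
    out = []
    for row in img:
        doubled = [p for p in row for _ in range(2)]  # double each column
        out.append(doubled)
        out.append(list(doubled))                     # double each row
    return out

img = [[1, 2], [3, 4]]   # toy 2x2 "photo"
big = upscale_2x_nearest(img)
```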

#image-recognition