Daily Archives: March 25, 2021
Text-to-Image Generation Grounded by Fine-Grained User Attention
Localized Narratives [29] is a dataset of detailed natural language descriptions of images paired with mouse traces that provide sparse, fine-grained visual grounding for phrases. We propose TReCS, a sequential model that exploits this grounding to generate images. TReCS uses descriptions to retrieve segmentation masks and predict object labels aligned with mouse traces. These alignments are used to select and position masks to generate a fully covered segmentation canvas; the final image is produced by a segmentation-to-image generator using this canvas. This multi-step, retrieval-based approach outperforms existing direct text-to-image generation models on both automatic metrics and human evaluations: overall, its generated images are more photo-realistic and better match their descriptions.
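The canvas-composition step described above can be sketched as follows. This is a toy illustration only: the mask "library," the labels, and the trace-derived positions are all made up here, whereas TReCS retrieves masks with learned text and mask encoders.

```python
import numpy as np

# Hypothetical mask library keyed by object label (binary masks).
# In TReCS these would be retrieved from a large mask database.
MASK_LIBRARY = {
    "dog": np.ones((8, 8), dtype=np.uint8),
    "tree": np.ones((12, 6), dtype=np.uint8),
}

def compose_canvas(aligned_objects, canvas_size=(64, 64)):
    """Place retrieved masks at trace-aligned positions on a label canvas.

    aligned_objects: list of (label, (row, col)) pairs, where (row, col)
    is the top-left position implied by the mouse trace for that phrase.
    Later objects overwrite earlier ones, mimicking depth ordering.
    """
    canvas = np.zeros(canvas_size, dtype=np.uint8)
    for class_id, (label, (r, c)) in enumerate(aligned_objects, start=1):
        mask = MASK_LIBRARY[label]
        region = canvas[r:r + mask.shape[0], c:c + mask.shape[1]]
        # Write this object's class id wherever its (cropped) mask is on.
        region[mask[:region.shape[0], :region.shape[1]] > 0] = class_id
    return canvas

canvas = compose_canvas([("tree", (10, 20)), ("dog", (30, 5))])
```

The resulting integer canvas plays the role of the "fully covered segmentation canvas" that a segmentation-to-image generator (e.g. a SPADE-style model) would then translate into pixels.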
Inside ‘TALON,’ the Nationwide Network of AI-Enabled Surveillance Cameras
Hundreds of pages of emails obtained by Motherboard show how little-known company Flock has expanded from surveilling individual neighborhoods into a network of smart cameras that spans the United States.
“Give your neighborhood peace of mind,” an advertisement for Flock, a line of smart surveillance cameras, reads. A February promotional video claims that the company’s “mission is to eliminate nonviolent crime across the country. We can only do that by working with every neighborhood and every police department throughout the country.”
Quietly, this seems to be happening.
Brain2Pix: Fully convolutional naturalistic video reconstruction from brain activity
Reconstructing complex and dynamic visual perception from brain activity remains a major challenge in machine learning applications to neuroscience. Here we present a new method for reconstructing naturalistic images and videos from very large single-participant functional magnetic resonance imaging (fMRI) data that leverages the recent success of image-to-image translation networks. This is achieved by exploiting spatial information obtained from retinotopic mappings across the visual system. More specifically, we first determine what position each voxel in a particular region of interest would represent in the visual field based on its corresponding receptive-field location. Then, the 2D image representation of the brain activity on the visual field is passed to a fully convolutional image-to-image network trained to recover the original stimuli using a VGG feature loss with an adversarial regularizer. In our experiments, we show that our method offers a significant improvement over existing video reconstruction techniques.
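The retinotopic remapping step above can be sketched in a few lines. This is a minimal illustration under invented data, assuming each voxel comes with an integer (row, col) receptive-field center; the real method estimates these from retinotopic mapping and feeds the resulting 2D image to the convolutional network.

```python
import numpy as np

def voxels_to_visual_field(activities, rf_coords, field_size=(32, 32)):
    """Scatter voxel activities onto a 2D visual-field grid.

    activities: (n_voxels,) array of fMRI responses.
    rf_coords:  (n_voxels, 2) integer (row, col) receptive-field centers.
    Voxels whose receptive fields map to the same pixel are averaged.
    """
    field = np.zeros(field_size)
    counts = np.zeros(field_size)
    for a, (r, c) in zip(activities, rf_coords):
        field[r, c] += a
        counts[r, c] += 1
    # Average overlapping voxels; leave unvisited pixels at zero.
    return np.divide(field, counts,
                     out=np.zeros_like(field), where=counts > 0)

# Toy example: two voxels share a receptive-field location, one is alone.
acts = np.array([1.0, 3.0, 2.0])
coords = np.array([[5, 5], [5, 5], [10, 12]])
img = voxels_to_visual_field(acts, coords)
```

The grid `img` is the kind of 2D "brain image" that the fully convolutional network would then map back to the stimulus frames.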
#human, #image-recognition