AI backpack concept gives audio alerts to blind pedestrians

When Jagadish Mahendran heard about his friend’s daily challenges navigating as a blind person, he immediately thought of his artificial intelligence work.

“For years I had been teaching robots to see things,” he said. Mahendran, a computer vision researcher at the University of Georgia’s Institute for Artificial Intelligence, found it ironic that he had helped develop machines — including a shopping robot that could “see” stocked shelves and a kitchen robot — but nothing for people with low or no vision. 

After exploring existing tech for blind and low-vision people, such as camera-enabled canes and GPS-connected smartphone apps, he came up with a backpack-based AI design that uses cameras to provide instantaneous alerts. Read More

#image-recognition, #vision

Neuroscientists find a way to make object-recognition models perform better

Computer vision models known as convolutional neural networks can be trained to recognize objects nearly as accurately as humans do. However, these models have one significant flaw: Very small changes to an image, which would be nearly imperceptible to a human viewer, can trick them into making egregious errors such as classifying a cat as a tree.

A team of neuroscientists from MIT, Harvard University, and IBM has developed a way to alleviate this vulnerability by adding to these models a new layer designed to mimic the earliest stage of the brain’s visual processing system. In a new study, they showed that this layer greatly improved the models’ robustness against this type of mistake. Read More
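The flavor of such an approach can be sketched in a few lines: prepend a fixed, non-trainable bank of oriented Gabor filters (a rough stand-in for V1 simple cells) to an otherwise ordinary convolutional classifier. The PyTorch snippet below is only an illustrative assumption of how a biologically constrained front end could look; the layer sizes, filter parameters, and overall architecture are not those of the study.

```python
# A minimal sketch (assumed architecture, not the study's actual model):
# a fixed Gabor-filter front end, loosely mimicking V1 simple cells,
# prepended to an ordinary trainable CNN classifier.
import math
import torch
import torch.nn as nn

def gabor_kernel(size=7, theta=0.0, sigma=2.0, wavelength=4.0):
    """Return a size x size Gabor filter at orientation theta (radians)."""
    half = size // 2
    ys, xs = torch.meshgrid(
        torch.arange(-half, half + 1, dtype=torch.float32),
        torch.arange(-half, half + 1, dtype=torch.float32),
        indexing="ij",
    )
    x_rot = xs * math.cos(theta) + ys * math.sin(theta)
    y_rot = -xs * math.sin(theta) + ys * math.cos(theta)
    envelope = torch.exp(-(x_rot ** 2 + y_rot ** 2) / (2 * sigma ** 2))
    carrier = torch.cos(2 * math.pi * x_rot / wavelength)
    return envelope * carrier

class V1FrontEnd(nn.Module):
    """Fixed (non-trainable) bank of oriented Gabor filters."""
    def __init__(self, n_orientations=8, size=7):
        super().__init__()
        kernels = torch.stack([
            gabor_kernel(size, theta=i * math.pi / n_orientations)
            for i in range(n_orientations)
        ]).unsqueeze(1)                          # (n_orientations, 1, size, size)
        self.conv = nn.Conv2d(1, n_orientations, size, padding=size // 2, bias=False)
        self.conv.weight.data.copy_(kernels)
        self.conv.weight.requires_grad = False   # the front end is never trained

    def forward(self, x):
        return torch.relu(self.conv(x))

model = nn.Sequential(
    V1FrontEnd(),                    # fixed "biological" first stage
    nn.Conv2d(8, 16, 3, padding=1),  # trainable layers follow as usual
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

print(model(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```

In this sketch, only the layers after the Gabor bank are trained; the earliest features are fixed rather than learned, which is the general property such a front end is meant to impose.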

#image-recognition, #vision

Computer Vision software for image and video identification

Computer vision commonly involves detecting and locating objects in digital images and videos. Because living organisms process images with their visual cortex, many researchers have taken the architecture of the mammalian visual cortex as a model for neural networks designed to perform image recognition.

Over the past 20 years, progress in computer vision has been remarkable. Read More
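To make "detecting and locating objects" concrete, here is a minimal sketch using an off-the-shelf pretrained detector from torchvision. The choice of Faster R-CNN, the dummy input, and the printed fields are illustrative assumptions, not software discussed in the article.

```python
# A minimal object detection-and-localization sketch with a pretrained
# torchvision model (illustrative only).
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()  # inference mode

# A dummy RGB image; in practice this would be a real photo scaled to [0, 1].
image = torch.rand(3, 480, 640)

with torch.no_grad():
    predictions = model([image])[0]   # the model accepts a list of images

# Each detection is a bounding box (the object's location), a class label,
# and a confidence score; results come back sorted by score.
for box, label, score in zip(predictions["boxes"][:5],
                             predictions["labels"][:5],
                             predictions["scores"][:5]):
    print(f"class {label.item():3d}  score {score.item():.2f}  "
          f"box {[round(v, 1) for v in box.tolist()]}")
```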

#vision

Computational Needs for Computer Vision (CV) in AI and ML Systems

Computer vision (CV) is a major task for modern Artificial Intelligence (AI) and Machine Learning (ML) systems. It’s accelerating nearly every domain in the tech industry, enabling organizations to revolutionize the way machines and business systems work.

… In this article, we briefly walk through the common challenges a CV system faces when it employs modern ML algorithms. Read More

#image-recognition, #vision

Sign language recognition using deep learning

TL;DR: A dual-camera, first-person-vision translation system using convolutional neural networks is presented. A prototype was developed to recognize 24 gestures. The vision system is composed of a head-mounted camera and a chest-mounted camera, and the machine learning model is composed of two convolutional neural networks, one for each camera. Read More
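As a rough sketch of how such a dual-camera model could be wired up, the PyTorch snippet below gives each camera its own small CNN branch and fuses the two feature vectors for a 24-way gesture classification. The layer sizes and the concatenation-based fusion are assumptions for illustration, not the authors' exact architecture.

```python
# Illustrative dual-camera gesture classifier: one CNN branch per camera,
# features fused by concatenation before a 24-way classifier.
import torch
import torch.nn as nn

class CameraBranch(nn.Module):
    """Per-camera feature extractor."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, x):
        return self.features(x)          # (batch, 32)

class DualCamGestureNet(nn.Module):
    def __init__(self, n_gestures=24):
        super().__init__()
        self.head_cam = CameraBranch()    # head-mounted camera stream
        self.chest_cam = CameraBranch()   # chest-mounted camera stream
        self.classifier = nn.Linear(32 * 2, n_gestures)

    def forward(self, head_img, chest_img):
        fused = torch.cat([self.head_cam(head_img),
                           self.chest_cam(chest_img)], dim=1)
        return self.classifier(fused)

model = DualCamGestureNet()
logits = model(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(logits.shape)  # torch.Size([1, 24])
```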

#image-recognition, #nlp, #vision

Neuroevolution of Self-Interpretable Agents

Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight. It is a consequence of the selective attention in perception that lets us remain focused on important parts of our world without distraction from irrelevant details. Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck. By constraining access to only a small fraction of the visual input, we show that their policies are directly interpretable in pixel space. We find neuroevolution ideal for training self-attention architectures for vision-based reinforcement learning (RL) tasks, allowing us to incorporate modules that can include discrete, non-differentiable operations which are useful for our agent. We argue that self-attention has properties similar to indirect encoding, in the sense that large implicit weight matrices are generated from a small number of key-query parameters, thus enabling our agent to solve challenging vision-based tasks with at least 1000x fewer parameters than existing methods. Since our agent attends only to task-critical visual hints, it is able to generalize to environments where task-irrelevant elements are modified, while conventional methods fail. Read More
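A heavily simplified sketch of the self-attention bottleneck idea follows: the observation is cut into patches, a small key/query projection scores the patches against each other, and only the top-K most-attended patch locations are passed on to the policy. The patch size, projection dimensions, and top-K rule below are illustrative assumptions, not the paper's implementation; note how the only parameters are two small projection matrices, from which the full patch-by-patch attention matrix is generated implicitly.

```python
# Toy self-attention bottleneck over image patches (illustrative assumptions).
import numpy as np

rng = np.random.default_rng(0)

patch_size, d_in, d_key, top_k = 8, 8 * 8 * 3, 4, 10

# The only "learned" parameters: two small key/query projections. The large
# patch-by-patch attention matrix is generated from them (indirect encoding).
W_key = rng.normal(size=(d_in, d_key)) * 0.1
W_query = rng.normal(size=(d_in, d_key)) * 0.1

def attended_patch_centers(image):
    h, w, _ = image.shape
    patches, centers = [], []
    for y in range(0, h, patch_size):
        for x in range(0, w, patch_size):
            patches.append(image[y:y + patch_size, x:x + patch_size].ravel())
            centers.append((y + patch_size // 2, x + patch_size // 2))
    P = np.stack(patches)                      # (n_patches, d_in)
    scores = (P @ W_query) @ (P @ W_key).T     # (n_patches, n_patches)
    scores /= np.sqrt(d_key)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    importance = weights.sum(axis=0)           # attention each patch receives
    keep = np.argsort(importance)[-top_k:]     # bottleneck: keep top-K patches
    return [centers[i] for i in keep]          # the policy sees only these

frame = rng.random((64, 96, 3))                # stand-in for an observation
print(attended_patch_centers(frame))
```

Because the projections are tiny and the selection step is a discrete argsort, parameters like these lend themselves to gradient-free training methods such as neuroevolution, which is the point the abstract makes.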

#image-recognition, #reinforcement-learning, #vision

Faster video recognition for the smartphone era

By one estimate, training a video-recognition model can take up to 50 times more data and eight times more processing power than training an image-classification model. That’s a problem as demand for processing power to train deep learning models continues to rise exponentially and concerns about AI’s massive carbon footprint grow. Running large video-recognition models on low-power mobile devices, where many AI applications are heading, also remains a challenge.

Song Han, an assistant professor at MIT’s Department of Electrical Engineering and Computer Science (EECS), is tackling the problem by designing more efficient deep learning models. Read More

#image-recognition, #vision

Depth-Aware Video Frame Interpolation

Video frame interpolation aims to synthesize non-existent frames in between the original frames. While deep convolutional neural networks have brought significant advances, the quality of interpolation is often reduced by large object motion or occlusion. In this work, we propose to explicitly detect occlusion by exploiting the depth cue in frame interpolation. Specifically, we develop a depth-aware flow projection layer that synthesizes intermediate flows which preferentially sample closer objects over farther ones. In addition, we learn hierarchical features as contextual information. The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels to synthesize the output frame. Our model is compact, efficient, and fully differentiable, allowing all components to be optimized. We conduct extensive experiments to analyze the effect of the depth-aware flow projection layer and hierarchical contextual features. Quantitative and qualitative results demonstrate that the proposed model performs favorably against state-of-the-art frame interpolation methods on a wide variety of datasets. Read More
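The depth-aware flow projection can be illustrated with a toy sketch: when several flow vectors project onto the same intermediate-frame pixel, their contributions are weighted by inverse depth, so closer objects dominate and occlusion is resolved in their favor. The nearest-pixel rounding and the simple weighting below are simplifying assumptions, not the paper's actual layer.

```python
# Toy depth-aware flow projection: project flow from frame 0 to time t,
# weighting colliding samples by inverse depth (closer pixels win).
import numpy as np

def depth_aware_flow_projection(flow_0to1, depth0, t=0.5):
    """Approximate the flow from time t back to frame 0."""
    h, w, _ = flow_0to1.shape
    proj_flow = np.zeros((h, w, 2))
    weight_sum = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            # Where does pixel (y, x) of frame 0 land at time t?
            ty = int(round(y + t * flow_0to1[y, x, 1]))
            tx = int(round(x + t * flow_0to1[y, x, 0]))
            if 0 <= ty < h and 0 <= tx < w:
                wgt = 1.0 / max(depth0[y, x], 1e-6)   # small depth => large weight
                proj_flow[ty, tx] += wgt * (-t) * flow_0to1[y, x]
                weight_sum[ty, tx] += wgt
    mask = weight_sum > 0
    proj_flow[mask] /= weight_sum[mask][:, None]
    return proj_flow

flow = np.random.randn(32, 32, 2)
depth = np.random.uniform(1.0, 10.0, size=(32, 32))
print(depth_aware_flow_projection(flow, depth).shape)   # (32, 32, 2)
```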

#vision

Rhythm and Synchrony in a Cortical Network Model

We studied mechanisms for cortical gamma-band activity in the cerebral cortex and identified neurobiological factors that affect such activity. This was done by analyzing the behavior of a previously developed, data-driven, large-scale network model that simulated many visual functions of monkey V1 cortex (Chariker et al., 2016). Gamma activity was an emergent property of the model. The model’s gamma activity, like that of the real cortex, was (1) episodic, (2) variable in frequency and phase, and (3) graded in power with stimulus variables like orientation. The spike firing of the model’s neuronal population was only partially synchronous during multiple firing events (MFEs) that occurred at gamma rates. Detailed analysis of the model’s MFEs showed that gamma-band activity was multidimensional in its sources. Most spikes were evoked by excitatory inputs. A large fraction of these inputs came from recurrent excitation within the local circuit, but feedforward and feedback excitation also contributed, either through direct pulsing or by raising the overall baseline. Inhibition was responsible for ending MFEs, but disinhibition led directly to only a small minority of the synchronized spikes. As a potential explanation for the wide range of gamma characteristics observed in different parts of cortex, we found that the relative rise times of AMPA and GABA synaptic conductances have a strong effect on the degree of synchrony in gamma. Read More
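To make the final point concrete, the small sketch below models AMPA and GABA conductances as differences of exponentials with separate rise and decay time constants; their relative rise times govern how quickly inhibition can catch up with excitation within a gamma cycle. The time constants are generic textbook-style values, not the parameters of the model described above.

```python
# Illustrative AMPA vs. GABA conductance time courses (assumed generic values,
# not the study's parameters).
import numpy as np

def synaptic_conductance(t, tau_rise, tau_decay):
    """Difference-of-exponentials conductance waveform, normalized to peak 1."""
    g = np.exp(-t / tau_decay) - np.exp(-t / tau_rise)
    return g / g.max()

t = np.linspace(0.0, 20.0, 2001)                                  # time in ms

g_ampa = synaptic_conductance(t, tau_rise=0.5, tau_decay=3.0)     # fast excitation
g_gaba = synaptic_conductance(t, tau_rise=1.0, tau_decay=6.0)     # slower inhibition

# The lag between the excitation peak and the inhibition peak constrains how
# tightly spikes can cluster before inhibition shuts a firing event down.
print("AMPA peaks at %.2f ms" % t[g_ampa.argmax()])
print("GABA peaks at %.2f ms" % t[g_gaba.argmax()])
```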

#human, #vision

Orientation Selectivity from Very Sparse LGN Inputs in a Comprehensive Model of Macaque V1 Cortex

A new computational model of the primary visual cortex (V1) of the macaque monkey was constructed to reconcile the visual functions of V1 with anatomical data on its LGN input, the extreme sparseness of which presented serious challenges to theoretically sound explanations of cortical function. We demonstrate that, even with such sparse input, it is possible to produce robust orientation selectivity, as well as continuity in the orientation map. We went beyond that to find plausible dynamic regimes of our new model that emulate simultaneously experimental data for a wide range of V1 phenomena, beginning with orientation selectivity but also including diversity in neuronal responses, bimodal distributions of the modulation ratio (the simple/complex classification), and dynamic signatures, such as gamma-band oscillations. Intracortical interactions play a major role in all aspects of the visual functions of the model. Read More
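The central claim, that a handful of LGN-like inputs whose receptive-field centers are roughly aligned already yield an orientation-tuned response, can be demonstrated with a toy sketch. The Gaussian receptive fields, the bar stimulus, and all numbers below are illustrative assumptions, not components of the actual model.

```python
# Toy demonstration: four aligned LGN-like inputs produce orientation tuning.
import numpy as np

size = 64
yy, xx = np.mgrid[0:size, 0:size] - size / 2.0

# Four ON-center-like Gaussian receptive fields with centers on a vertical line.
centers = [(-12, 0), (-4, 0), (4, 0), (12, 0)]
sigma = 3.0
lgn_rfs = [np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
           for cy, cx in centers]

def response_to_bar(theta):
    """Summed LGN drive to a thin bright bar at orientation theta (radians)."""
    # Signed distance of every pixel from a bar through the origin at angle theta.
    dist = xx * np.cos(theta) - yy * np.sin(theta)
    bar = (np.abs(dist) < 2.0).astype(float)
    return sum((rf * bar).sum() for rf in lgn_rfs)

for deg in (0, 45, 90):
    print(f"bar at {deg:3d} deg -> drive {response_to_bar(np.radians(deg)):.1f}")
```

The drive is largest when the bar's orientation matches the axis along which the few input centers are arranged, and falls off as the bar rotates away, which is the qualitative effect the model builds on with far richer intracortical interactions.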

#human, #vision