A Mathematical Model Unlocks the Secrets of Vision

This is the great mystery of human vision: Vivid pictures of the world appear before our mind’s eye, yet the brain’s visual system receives very little information from the world itself. Much of what we “see” we conjure in our heads.

“A lot of the things you think you see you’re actually making up,” said Lai-Sang Young, a mathematician at New York University. “You don’t actually see them.” Read More

#human, #vision

New brain map could improve AI algorithms for machine vision

Despite years of research, the brain still contains broad areas of uncharted territory. A team of scientists led by neuroscientists from Cold Spring Harbor Laboratory and the University of Sydney, working with data from marmosets, recently found new evidence that revises the traditional view of how the primate brain’s visual system is organized. This remapping of the brain could serve as a future reference for understanding how the highly complex visual system works, and it could influence the design of artificial neural networks for machine vision. Read More

#human, #vision

Emotion schemas are embedded in the human visual system

Theorists have suggested that emotions are canonical responses to situations ancestrally linked to survival. If so, then emotions may be afforded by features of the sensory environment. However, few computational models describe how combinations of stimulus features evoke different emotions. Here, we develop a convolutional neural network that accurately decodes images into 11 distinct emotion categories. We validate the model using more than 25,000 images and movies and show that image content is sufficient to predict the category and valence of human emotion ratings. In two functional magnetic resonance imaging studies, we demonstrate that patterns of human visual cortex activity encode emotion category–related model output and can decode multiple categories of emotional experience. These results suggest that rich, category-specific visual features can be reliably mapped to distinct emotions, and that they are coded in distributed representations within the human visual system. Read More
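
The core technical claim is that a standard image-classification CNN, retrained on emotion labels, can carry this mapping. As a rough illustration only, here is a minimal PyTorch sketch of that general approach (fine-tuning a pretrained backbone for 11-way classification); the backbone choice and layer indices are assumptions for illustration, not the authors’ published model.

```python
import torch
import torch.nn as nn
from torchvision import models

# Minimal sketch: adapt a pretrained image-classification CNN so its
# output layer scores a fixed set of emotion categories. The backbone
# and layer indices are illustrative assumptions, not the paper's code.
NUM_EMOTION_CATEGORIES = 11

backbone = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
# Swap the final 1000-way ImageNet classifier for an 11-way emotion head;
# the earlier layers keep their pretrained visual features.
backbone.classifier[6] = nn.Linear(
    backbone.classifier[6].in_features, NUM_EMOTION_CATEGORIES
)

# One forward pass yields a score per emotion category; softmax turns
# the scores into a probability distribution over categories.
images = torch.randn(4, 3, 224, 224)  # dummy batch of RGB images
probs = torch.softmax(backbone(images), dim=1)
print(probs.shape)  # torch.Size([4, 11])
```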

#human, #vision

Restoring Vision With Bionic Eyes: No Longer Science Fiction

Bionic vision might sound like science fiction, but Dr. Michael Beyeler is working on just that.

Originally from Switzerland, Dr. Beyeler is wrapping up his postdoctoral fellowship at the University of Washington before moving to the University of California, Santa Barbara this fall to head up the newly formed Bionic Vision Lab in the Departments of Computer Science and Psychological & Brain Sciences.

We spoke with him about his “deep fascination with the brain” and how he hopes his work will eventually be able to restore vision to the blind. Read More

#human, #vision

Fully automatic landing with optically supported navigation for small aircraft

Read More

#videos, #vision

Challenges and opportunities with Computer Vision at the Edge

Read More

#videos, #vision

Neuromorphic Chips and the Future of Your Cell Phone

This article is particularly fun for me since it brings together two developments that I didn’t see coming together: real time computer vision (RTCV) and neuromorphic neural nets (aka spiking neural nets).

We’ve been following neuromorphic nets for a few years now (additional references at the end of this article) and viewed them as the next generation (3rd generation) of neural nets. This was mostly in the context of the pursuit of Artificial General Intelligence (AGI), which is the holy grail (or terrifying terminator) of all we’ve been doing.

Where we got off track was in thinking that neuromorphic nets, which are still in their infancy, were only for AGI. It turns out that they enable a lot of nearer-term capabilities, and among them could be real time computer vision (RTCV). Why that’s true has more to do with how neuromorphics are structured than with what fancy things they may eventually do. Here’s the story. Read More
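
For readers new to the term, the structural difference the author is pointing at is that spiking neurons communicate with discrete events over time rather than dense activations. Here is a minimal sketch of a leaky integrate-and-fire neuron, the standard spiking building block; all constants are illustrative assumptions, not parameters of any particular chip.

```python
import numpy as np

def lif_neuron(input_current, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire: accumulate input over time and emit a
    spike (1) whenever the membrane potential crosses the threshold."""
    v = 0.0
    spikes = []
    for i in input_current:
        v = leak * v + i      # leaky integration of the input
        if v >= threshold:    # threshold crossing -> fire a spike
            spikes.append(1)
            v = 0.0           # reset the potential after firing
        else:
            spikes.append(0)
    return spikes

# A constant drive produces sparse, discrete spikes. Because nothing is
# computed between events, event-driven hardware can sit idle until the
# scene changes, which is what makes it attractive for always-on vision.
print(lif_neuron(np.full(20, 0.3)))
```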

#vision

Real Time Computer Vision is Likely to be the Next Killer App but We’re Going to Need New Chips

Real Time Computer Vision (RTCV), which requires running DNNs on video at the edge, is likely to be the next killer app that powers a renewed love affair with our mobile devices. The problem is that current GPUs won’t cut it, and we have to wait once again for the hardware to catch up.

The entire history of machine learning and artificial intelligence (AI/ML) has been a story about the race between techniques and hardware. There have been times when we had the techniques but the hardware couldn’t keep up. Conversely, there have been times when hardware has outstripped technique. Candidly, though, it’s been mostly about waiting for the hardware to catch up.

You may not have thought about it, but we’re in one of those waiting-for-hardware valleys right now. Sure, there’s lots of cloud-based compute and ever-faster GPU chips to make CNNs and RNNs work. But the barrier we’re up against is latency, particularly in computer vision.

If you want to use computer vision on your cell phone or any other edge device (did you ever think of self-driving cars as edge devices?), then the data has to make the full round trip from your local camera to the cloud compute and back again before anything can happen. Read More
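
To make the latency argument concrete, here is a back-of-envelope sketch; every number below is an illustrative assumption, not a measurement.

```python
# At 30 frames per second, each frame must be fully processed in ~33 ms.
FPS = 30
frame_budget_ms = 1000 / FPS

# Assumed costs (illustrative): a camera-to-cloud-and-back network hop
# plus server-side inference, versus running the DNN on the device.
cloud_round_trip_ms = 60
cloud_inference_ms = 20
edge_inference_ms = 25

cloud_total_ms = cloud_round_trip_ms + cloud_inference_ms
print(f"frame budget:   {frame_budget_ms:.1f} ms")
print(f"cloud pipeline: {cloud_total_ms} ms "
      f"({'misses' if cloud_total_ms > frame_budget_ms else 'meets'} the budget)")
print(f"edge pipeline:  {edge_inference_ms} ms "
      f"({'misses' if edge_inference_ms > frame_budget_ms else 'meets'} the budget)")
```

Under these assumptions the cloud round trip alone blows the per-frame budget, which is why the piece argues inference has to move onto the device, and why today’s edge hardware is the bottleneck.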

#vision

Moving Camera, Moving People: A Deep Learning Approach to Depth Prediction

The human visual system has a remarkable ability to make sense of our 3D world from its 2D projection. Even in complex environments with multiple moving objects, people are able to maintain a feasible interpretation of the objects’ geometry and depth ordering. The field of computer vision has long studied how to achieve similar capabilities by computationally reconstructing a scene’s geometry from 2D image data, but robust reconstruction remains difficult in many cases.

A particularly challenging case occurs when both the camera and the objects in the scene are freely moving. This confuses traditional 3D reconstruction algorithms, which are based on triangulation and assume that the same object can be observed from at least two different viewpoints at the same time. Satisfying this assumption requires either a multi-camera array (like Google’s Jump) or a scene that remains stationary as the single camera moves through it. As a result, most existing methods either filter out moving objects (assigning them “zero” depth values) or ignore them (resulting in incorrect depth values). Read More
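
For reference, in the calibrated stereo case the triangulation assumption reduces to the relation depth = focal_length × baseline / disparity. A minimal sketch of that relation, with illustrative parameter values:

```python
def depth_from_disparity(disparity_px, focal_length_px=1000.0, baseline_m=0.1):
    """Stereo triangulation: depth of a STATIC point seen from two
    viewpoints separated by a known baseline (values are illustrative)."""
    if disparity_px <= 0:
        raise ValueError("the point must shift between the two views")
    return focal_length_px * baseline_m / disparity_px

print(depth_from_disparity(20.0))  # 5.0 m of depth for a 20-pixel shift

# If the point itself moved between the two captures, the observed shift
# mixes camera motion with object motion, and the formula silently returns
# a wrong depth -- exactly the failure case this work addresses.
```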

#vision

Visipedia

Visipedia, short for “Visual Encyclopedia,” is a network of people and machines that is designed to harvest and organize visual information and make it accessible to anyone anywhere. Visipedia machines can learn from experts how to discover and classify animals, plants and objects in images. Communities of scientists and interested citizens may use Visipedia software to share, annotate and organize meaningful content in images. Recent experiments include software that can detect and classify trees from satellite and street-level images, and an app that can recognize North American birds. Visipedia is a joint project between Pietro Perona’s Vision Group at Caltech and Serge Belongie’s Vision Group at Cornell Tech. Read More

#universities, #vision