Segmentation — identifying which image pixels belong to an object — is a core task in computer vision and is used in a broad array of applications, from analyzing scientific imagery to editing photos. But creating an accurate segmentation model for specific tasks typically requires highly specialized work by technical experts with access to AI training infrastructure and large volumes of carefully annotated in-domain data.
Today, we aim to democratize segmentation by introducing the Segment Anything project: a new task, dataset, and model for image segmentation, as we explain in our research paper. We are releasing both our general Segment Anything Model (SAM) and our Segment Anything 1-Billion mask dataset (SA-1B), the largest ever segmentation dataset, to enable a broad set of applications and foster further research into foundation models for computer vision. We are making the SA-1B dataset available for research purposes and the Segment Anything Model is available under a permissive open license (Apache 2.0). Check out the demo to try SAM with your own images. Read More
Tag Archives: Vision
The Future Direction And Vision For AI
This article sets out the journey of Artificial Intelligence (AI) and its interrelationship with the arrival of the “era of Big Data” alongside 3G and 4G telecoms networks. It explores how we arrived at where we are now and where we are going next: an era of even bigger, albeit increasingly decentralised, data in which AI meets the IoT (AIoT) and standalone 5G networks may arrive in the next few years. Read More
New technology gives smart cars ‘x-ray’-like vision
Detects hidden pedestrians, cyclists
Australian researchers have developed a technology that allows autonomous vehicles to track moving pedestrians hidden behind buildings and cyclists obscured by cars, trucks, and buses.
The autonomous vehicle uses game-changing tools that allow it to ‘see’ the world around it using x-ray style vision that penetrates through to pedestrian blind spots.
The technology has been developed as part of a project funded by the iMOVE Cooperative Research Centre in collaboration with the University of Sydney’s Australian Centre for Field Robotics and Australian connected vehicle company Cohda Wireless. iMOVE has today released its new findings in a final report following three years of research and development. Read More
AI backpack concept gives audio alerts to blind pedestrians
When Jagadish Mahendran heard about his friend’s daily challenges navigating as a blind person, he immediately thought of his artificial intelligence work.
“For years I had been teaching robots to see things,” he said. Mahendran, a computer vision researcher at the University of Georgia’s Institute for Artificial Intelligence, found it ironic that he had helped develop machines — including a shopping robot that could “see” stocked shelves and a kitchen robot — but nothing for people with low or no vision.
After exploring existing tech for blind and low-vision people, such as camera-enabled canes or GPS-connected smartphone apps, he came up with a backpack-based AI design that uses cameras to provide instantaneous alerts. Read More
Neuroscientists find a way to make object-recognition models perform better
Computer vision models known as convolutional neural networks can be trained to recognize objects nearly as accurately as humans do. However, these models have one significant flaw: Very small changes to an image, which would be nearly imperceptible to a human viewer, can trick them into making egregious errors such as classifying a cat as a tree.
A team of neuroscientists from MIT, Harvard University, and IBM has developed a way to alleviate this vulnerability by adding to these models a new layer that is designed to mimic the earliest stage of the brain’s visual processing system. In a new study, they showed that this layer greatly improved the models’ robustness against this type of mistake. Read More
Computer Vision software for image and video identification
Computer vision often detects and locates objects in digital images and videos. As living organisms process images with their visual cortex, many researchers have taken the architecture of the mammalian visual cortex as a model for neural networks structured to perform image recognition.
Over the past 20 years, progress in computer vision has been remarkable. Read More
Computational Needs for Computer Vision (CV) in AI and ML Systems
Computer vision (CV) is a major task for modern Artificial Intelligence (AI) and Machine Learning (ML) systems. It’s accelerating nearly every domain in the tech industry, enabling organizations to revolutionize the way machines and business systems work.
… In this article, we briefly show you the common challenges associated with a CV system when it employs modern ML algorithms. Read More
Sign language recognition using deep learning
TL;DR: This article presents a dual-cam first-vision translation system using convolutional neural networks. A prototype was developed to recognize 24 gestures. The vision system is composed of a head-mounted camera and a chest-mounted camera, and the machine learning model is composed of two convolutional neural networks, one for each camera. Read More
Neuroevolution of Self-Interpretable Agents
Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight. It is a consequence of the selective attention in perception that lets us remain focused on important parts of our world without distraction from irrelevant details. Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck. By constraining access to only a small fraction of the visual input, we show that their policies are directly interpretable in pixel space. We find neuroevolution ideal for training self-attention architectures for vision-based reinforcement learning (RL) tasks, allowing us to incorporate modules that can include discrete, non-differentiable operations which are useful for our agent. We argue that self-attention has properties similar to indirect encoding, in the sense that large implicit weight matrices are generated from a small number of key-query parameters, thus enabling our agent to solve challenging vision-based tasks with at least 1000x fewer parameters than existing methods. Since our agent attends to only task-critical visual hints, it is able to generalize to environments where task-irrelevant elements are modified while conventional methods fail. Read More
#image-recognition, #reinforcement-learning, #vision
Faster video recognition for the smartphone era
By one estimate, training a video-recognition model can take up to 50 times more data and eight times more processing power than training an image-classification model. That’s a problem as demand for processing power to train deep learning models continues to rise exponentially and concerns about AI’s massive carbon footprint grow. Running large video-recognition models on low-power mobile devices, where many AI applications are heading, also remains a challenge.
Song Han, an assistant professor at MIT’s Department of Electrical Engineering and Computer Science (EECS), is tackling the problem by designing more efficient deep learning models. Read More