MIT’s sensor-packed glove helps AI identify objects by touch

Researchers have spent years trying to teach robots how to grip different objects without crushing or dropping them. They could be one step closer, thanks to this low-cost, sensor-packed glove. In a paper published in Nature, a team of MIT scientists share how they used the glove to help AI recognize objects through touch alone. That information could help robots better manipulate objects, and it may aid in prosthetics design.

The “scalable tactile glove,” or STAG, is a simple knit glove packed with more than 550 tiny sensors. The researchers wore STAG while handling 26 different objects — including a soda can, scissors, tennis ball, spoon, pen and a mug. As they did, the sensors gathered pressure-signal data, which was interpreted by a neural network. The system identified the objects by touch alone with up to 76 percent accuracy, and it was able to estimate the weight of most objects to within about 60 grams. Read More
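
The paper trains a neural network on the glove’s pressure frames. As a hedged sketch of what such a touch-based classifier could look like (the 32×32 rasterization of the sensor readings, the layer sizes, and the use of PyTorch are assumptions for illustration, not the authors’ model):

```python
import torch
import torch.nn as nn

# Illustrative sketch only: classify one tactile frame from the glove.
# Assumes the ~550 pressure readings are rasterized into a 32x32 map;
# the architecture below is a stand-in, not the model from the paper.
NUM_CLASSES = 26  # soda can, scissors, tennis ball, spoon, pen, mug, ...

class TactileClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 16x16 -> 8x8
        )
        self.head = nn.Linear(32 * 8 * 8, NUM_CLASSES)

    def forward(self, x):                                 # x: (batch, 1, 32, 32)
        return self.head(self.features(x).flatten(1))

model = TactileClassifier()
frame = torch.rand(1, 1, 32, 32)                          # one normalized pressure frame
print(model(frame).argmax(dim=1))                         # predicted object index
```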

#human, #robotics

Human-level performance in 3D multiplayer games with population-based reinforcement learning

Reinforcement learning (RL) has shown great success in increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents. We used a tournament-style evaluation to demonstrate that an agent can achieve human-level performance in a three-dimensional multiplayer first-person video game, Quake III Arena in Capture the Flag mode, using only pixels and game points scored as input. We used a two-tier optimization process in which a population of independent RL agents are trained concurrently from thousands of parallel matches on randomly generated environments. Each agent learns its own internal reward signal and rich representation of the world. These results indicate the great potential of multiagent reinforcement learning for artificial intelligence research. Read More
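
The “two-tier optimization” the abstract mentions pairs an outer, population-level search with each agent’s own inner RL updates. A hedged sketch of that structure follows (not DeepMind’s implementation: the exploit/explore rule, the per-agent reward weights, and the population size are invented for illustration):

```python
import random

# Illustrative population-based training loop: an outer loop evolves per-agent
# settings while each agent keeps doing its own inner RL updates.
# "reward_weights" stands in for an agent's internal reward signal.
POP_SIZE = 8

population = [
    {"reward_weights": [random.random() for _ in range(4)], "fitness": 0.0}
    for _ in range(POP_SIZE)
]

def inner_rl_update(agent):
    """Placeholder for an agent's own RL training on many parallel matches."""
    agent["fitness"] = sum(agent["reward_weights"]) + random.gauss(0, 0.1)

def exploit_and_explore(population):
    """Bottom agents copy a top agent's weights, then perturb them."""
    ranked = sorted(population, key=lambda a: a["fitness"], reverse=True)
    quarter = max(1, POP_SIZE // 4)
    for weak in ranked[-quarter:]:
        strong = random.choice(ranked[:quarter])
        weak["reward_weights"] = [w * random.uniform(0.8, 1.2)
                                  for w in strong["reward_weights"]]

for generation in range(10):
    for agent in population:
        inner_rl_update(agent)        # tier 1: per-agent reinforcement learning
    exploit_and_explore(population)   # tier 2: population-level optimization
```

The structural point is that the outer loop ranks whole agents and reshapes their internal reward signals, while learning within each agent proceeds independently.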

#reinforcement-learning

Human-level performance in first-person multiplayer games with population-based deep RL

Read More

#reinforcement-learning, #videos

How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits

We significantly reduce the cost of factoring integers and computing discrete logarithms over finite fields on a quantum computer by combining techniques from Griffiths-Niu 1996, Zalka 2006, Fowler 2012, Ekerå-Håstad 2017, Ekerå 2017, Ekerå 2018, Gidney-Fowler 2019, Gidney 2019. We estimate the approximate cost of our construction using plausible physical assumptions for large-scale superconducting qubit platforms: a planar grid of qubits with nearest-neighbor connectivity, a characteristic physical gate error rate of 10⁻³, a surface code cycle time of 1 microsecond, and a reaction time of 10 microseconds. We account for factors that are normally ignored such as noise, the need to make repeated attempts, and the spacetime layout of the computation. When factoring 2048-bit RSA integers, our construction’s spacetime volume is a hundredfold less than comparable estimates from earlier works (Fowler et al. 2012, Gheorghiu et al. 2019). In the abstract circuit model (which ignores overheads from distillation, routing, and error correction) our construction uses 3n + 0.002n lg n logical qubits, 0.3n³ + 0.0005n³ lg n Toffolis, and 500n² + n² lg n measurement depth to factor n-bit RSA integers. We quantify the cryptographic implications of our work, both for RSA and for schemes based on the DLP in finite fields. Read More
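
To make the abstract-circuit-model formulas concrete, the short script below simply plugs n = 2048 into them; the formulas come straight from the abstract, while the script itself is only an illustrative calculation:

```python
import math

# Evaluate the abstract-circuit-model cost formulas quoted above for
# factoring an n-bit RSA integer (formulas taken from the paper's abstract).
n = 2048
lg_n = math.log2(n)                        # lg 2048 = 11

logical_qubits    = 3 * n + 0.002 * n * lg_n
toffoli_count     = 0.3 * n**3 + 0.0005 * n**3 * lg_n
measurement_depth = 500 * n**2 + n**2 * lg_n

print(f"logical qubits:    {logical_qubits:,.0f}")       # ~6,189
print(f"Toffoli gates:     {toffoli_count:,.0f}")        # ~2.6 billion
print(f"measurement depth: {measurement_depth:,.0f}")    # ~2.1 billion
```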

#quantum

Real Time Computer Vision is Likely to be the Next Killer App but We’re Going to Need New Chips

Real Time Computer Vision (RTCV), which requires running video DNNs at the edge, is likely to be the next killer app, one that powers a renewed love affair with our mobile devices. The problem is that current GPUs won’t cut it, and we have to wait once again for the hardware to catch up.

The entire history of machine learning and artificial intelligence (AI/ML) has been a story about the race between techniques and hardware. There have been times when we had the techniques but the hardware couldn’t keep up. Conversely, there have been times when hardware has outstripped technique. Candidly, though, it’s been mostly about waiting for the hardware to catch up.

You may not have thought about it, but we’re in one of those wait-for-tech hardware valleys right now. Sure, there’s lots of cloud-based compute and ever-faster GPU chips to make CNNs and RNNs work. But the barrier we’re up against is latency, particularly in computer vision.

If you want to use computer vision on your cell phone or any other edge device (did you ever think of self-driving cars as edge devices?), then the data has to make the full round trip from your local camera to the cloud compute and back again before anything can happen. Read More

#vision

Moving Camera, Moving People: A Deep Learning Approach to Depth Prediction

The human visual system has a remarkable ability to make sense of our 3D world from its 2D projection. Even in complex environments with multiple moving objects, people are able to maintain a feasible interpretation of the objects’ geometry and depth ordering. The field of computer vision has long studied how to achieve similar capabilities by computationally reconstructing a scene’s geometry from 2D image data, but robust reconstruction remains difficult in many cases.

A particularly challenging case occurs when both the camera and the objects in the scene are freely moving. This confuses traditional 3D reconstruction algorithms, which are based on triangulation and therefore assume that the same object can be observed from at least two different viewpoints at the same time. Satisfying this assumption requires either a multi-camera array (like Google’s Jump) or a scene that remains stationary as the single camera moves through it. As a result, most existing methods either filter out moving objects (assigning them “zero” depth values) or ignore them (resulting in incorrect depth values). Read More
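
To see why the static-scene assumption matters, here is a hedged illustration of classic two-view triangulation. The cameras, the 3D point, and the motion below are invented for the example; the point is recovered correctly only because it is assumed not to move between the two exposures, and the same equations quietly return a wrong position once the object shifts:

```python
import numpy as np

# Illustrative two-view (DLT) triangulation; cameras, point, and motion are
# made up for this example. Triangulation assumes the point is STATIC
# between the two exposures.
def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, x1, x2):
    """Linear triangulation of one image correspondence seen in two views."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

K = np.eye(3)                                                      # toy intrinsics
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                  # camera at the origin
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # camera moved 1 m sideways

X_static = np.array([0.2, 0.1, 5.0])                               # a point 5 m in front
print(triangulate(P1, P2, project(P1, X_static), project(P2, X_static)))  # recovers the point

# If the object moves between the two exposures, the same equations still
# return a 3D point, but its depth matches neither instant.
X_moved = X_static + np.array([0.5, 0.0, 0.0])
print(triangulate(P1, P2, project(P1, X_static), project(P2, X_moved)))
```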

#vision

AI in Five, Fifty and Five Hundred Years — Part One

Prediction is a tricky business.

You have to step outside of your own limitations, your own beliefs, your own flawed and fragmented angle on the world and see it from a thousand different perspectives. You have to see giant abstract patterns and filter through human nature, politics, technology, social dynamics, trends, statistics and probability.

It’s so mind-numbingly complex that our tiny little simian brains stand very little chance of getting it right. Even predicting the future five or ten years out is amazingly complicated. Read More

#artificial-intelligence

Speech2Face: Learning the Face Behind a Voice

How much can we infer about a person’s looks from the way they speak? In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. We design and train a deep neural network to perform this task using millions of natural Internet/YouTube videos of people speaking. During training, our model learns voice-face correlations that allow it to produce images that capture various physical attributes of the speakers such as age, gender and ethnicity. This is done in a self-supervised manner, by utilizing the natural co-occurrence of faces and speech in Internet videos, without the need to model attributes explicitly. We evaluate and numerically quantify how—and in what manner—our Speech2Face reconstructions, obtained directly from audio, resemble the true face images of the speakers. Read More
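
A hedged sketch of the self-supervised setup the abstract describes: a voice clip and a face crop from the same video act as a training pair, so a voice encoder can be trained to predict the face’s embedding without any attribute labels. The encoders, dimensions, and loss below are stand-ins, not the authors’ architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative self-supervised pairing: the only supervision is that a voice
# clip and a face crop come from the same video. All modules and sizes here
# are stand-ins for the paper's networks.
EMB_DIM = 128

face_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, EMB_DIM))    # stand-in, typically pretrained and frozen
voice_encoder = nn.Sequential(nn.Flatten(), nn.Linear(1 * 80 * 100, EMB_DIM))  # maps a spectrogram into the same space

optimizer = torch.optim.Adam(voice_encoder.parameters(), lr=1e-4)

def training_step(face_crop, spectrogram):
    """One step: push the voice embedding toward the co-occurring face embedding."""
    with torch.no_grad():
        target = face_encoder(face_crop)            # pseudo-label from the face branch
    pred = voice_encoder(spectrogram)
    loss = 1.0 - F.cosine_similarity(pred, target).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy batch standing in for face crops and audio spectrograms from one video.
faces = torch.rand(8, 3, 64, 64)
audio = torch.rand(8, 1, 80, 100)
print(training_step(faces, audio))
```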

#nlp

Radar-based Road User Classification and Novelty Detection with Recurrent Neural Network Ensembles

Radar-based road user classification is an important yet still challenging task towards autonomous driving applications. The resolution of conventional automotive radar sensors results in a sparse data representation which is tough to recover by subsequent signal processing. In this article, classifier ensembles originating from a one-vs-one binarization paradigm are enriched by one-vs-all correction classifiers. They are utilized to efficiently classify individual traffic participants and also to identify hidden object classes which have not been presented to the classifiers during training. For each classifier of the ensemble, an individual feature set is determined from a total set of 98 features. Thereby, the overall classification performance can be improved when compared to previous methods and, additionally, novel classes can be identified much more accurately. Furthermore, the proposed structure gives new insights into the importance of features for the recognition of individual classes, which is crucial for the development of new algorithms and sensor requirements. Read More
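
A hedged sketch of the ensemble idea: one-vs-one classifiers vote among the known road-user classes, while one-vs-all scores act as a correction and flag samples that no known class claims (potential hidden classes). The data, base model (logistic regression rather than recurrent-network ensembles), and threshold below are illustrative stand-ins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Illustrative stand-in: OvO classifiers vote among known road-user classes,
# OvA probabilities act as a correction that can flag unknown (novel) objects.
rng = np.random.default_rng(0)
y_train = rng.integers(0, 4, size=300)                    # e.g. pedestrian, bike, car, truck
X_train = rng.normal(size=(300, 10)) + y_train[:, None]   # toy stand-in for the 98 radar features

ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X_train, y_train)
ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_train, y_train)

def classify_with_novelty(x, threshold=0.5):
    """Return a known class label, or -1 when no one-vs-all classifier claims x."""
    x = np.asarray(x).reshape(1, -1)
    if ova.predict_proba(x)[0].max() < threshold:         # nothing claims the sample
        return -1                                         # treat as a hidden / novel class
    return int(ovo.predict(x)[0])                         # otherwise trust the OvO vote

print(classify_with_novelty(X_train[0]))                  # a training-like sample
print(classify_with_novelty(rng.normal(size=10) + 1.5))   # ambiguous sample between classes
```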

#recurrent-neural-networks