Self-Taught AI Shows Similarities to How the Brain Works

Self-supervised learning allows a neural network to figure out for itself what matters. The process might be what makes our own brains so successful.

For a decade now, many of the most impressive artificial intelligence systems have been taught using a huge inventory of labeled data. An image might be labeled “tabby cat” or “tiger cat,” for example, to “train” an artificial neural network to correctly distinguish a tabby from a tiger. The strategy has been both spectacularly successful and woefully deficient.

Such “supervised” training requires data laboriously labeled by humans, and the neural networks often take shortcuts, learning to associate the labels with minimal and sometimes superficial information. For example, a neural network might use the presence of grass to recognize a photo of a cow, because cows are typically photographed in fields.

“We are raising a generation of algorithms that are like undergrads [who] didn’t come to class the whole semester and then the night before the final, they’re cramming,” said Alexei Efros, a computer scientist at the University of California, Berkeley. “They don’t really learn the material, but they do well on the test.” Read More

#human, #self-supervised

Transframer: Arbitrary Frame Prediction with Generative Models

We present a general-purpose framework for image modelling and vision tasks based on probabilistic frame prediction. Our approach unifies a broad range of tasks, from image segmentation, to novel view synthesis and video interpolation. We pair this framework with an architecture we term Transframer, which uses U-Net and Transformer components to condition on annotated context frames, and outputs sequences of sparse, compressed image features. Transframer is the state-of-the-art on a variety of video generation benchmarks, is competitive with the strongest models on few-shot view synthesis, and can generate coherent 30 second videos from a single image without any explicit geometric information. A single generalist Transframer simultaneously produces promising results on 8 tasks, including semantic segmentation, image classification and optical flow prediction with no task-specific architectural components, demonstrating that multi-task computer vision can be tackled using probabilistic image models. Our approach can in principle be applied to a wide range of applications that require learning the conditional structure of annotated image-formatted data Read More

#big7, #image-recognition