Going Beyond GAN? New DeepMind VAE Model Generates High Fidelity Human

Generative adversarial networks (GANs) have become AI researchers’ “go-to” technique for generating photo-realistic synthetic images. Now, DeepMind researchers say that there may be a better option.

In a new paper, the Google-owned research company introduces its VQ-VAE-2 model for large-scale image generation. The model is said to yield results competitive with the state-of-the-art generative model BigGAN in synthesizing high-resolution images, while delivering greater diversity and overcoming some inherent shortcomings of GANs. Read More

#deep-learning, #gans, #image-recognition

Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer

Many machine learning models operate on images, but ignore the fact that images are 2D projections formed by 3D geometry interacting with light, in a process called rendering. Enabling ML models to understand image formation might be key for generalization. However, due to an essential rasterization step involving discrete assignment operations, rendering pipelines are non-differentiable and thus largely inaccessible to gradient-based ML techniques. In this paper, we present DIB-R, a differentiable rendering framework which allows gradients to be analytically computed for all pixels in an image. Key to our approach is to view foreground rasterization as a weighted interpolation of local properties and background rasterization as a distance-based aggregation of global geometry. Our approach allows for accurate optimization over vertex positions, colors, normals, light directions and texture coordinates through a variety of lighting models. We showcase our approach in two ML applications: single-image 3D object prediction and 3D textured object generation, both trained exclusively using 2D supervision. Our project website is: https://nv-tlabs.github.io/DIB-R/ Read More
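To make the interpolation idea concrete, here is a minimal pure-Python sketch for a single 2D triangle. It is our own illustration, not code from DIB-R, and it assumes per-vertex RGB colors plus a hypothetical sharpness parameter `sigma`. A pixel inside the triangle gets a barycentric-weighted blend of the vertex colors, which varies smoothly with both the vertex colors and the vertex positions; a pixel outside gets a soft coverage value that decays with its distance to the triangle. DIB-R aggregates a distance term of this kind over all faces of a mesh, and its exact distance function may differ from the one assumed here.

```python
import math

Point = tuple[float, float]
Color = tuple[float, float, float]


def barycentric(p: Point, a: Point, b: Point, c: Point) -> tuple[float, float, float]:
    """Barycentric weights (w_a, w_b, w_c) of point p with respect to triangle abc."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    w_a = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w_b = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return w_a, w_b, 1.0 - w_a - w_b


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def render_pixel(p: Point, tri: tuple[Point, Point, Point],
                 colors: tuple[Color, Color, Color], sigma: float = 0.01):
    """Return (color or None, soft coverage alpha) for the pixel center p."""
    weights = barycentric(p, *tri)
    if min(weights) >= 0.0:
        # Foreground: color is a weighted interpolation of the vertex colors.
        rgb = tuple(sum(w * col[k] for w, col in zip(weights, colors)) for k in range(3))
        return rgb, 1.0
    # Background (assumed form): coverage decays with distance to the nearest edge.
    d = min(point_segment_distance(p, tri[i], tri[(i + 1) % 3]) for i in range(3))
    return None, math.exp(-d * d / sigma)


tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
colors = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
print(render_pixel((0.25, 0.25), tri, colors))  # inside: blended color, alpha 1.0
print(render_pixel((0.5, -0.1), tri, colors))   # just outside: no color, alpha about exp(-1)
```

Because neither branch makes a hard inside/outside jump in its output value, small moves of a vertex change pixel values smoothly, which is what makes gradients with respect to geometry available.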

#image-recognition, #vfx

This is how Facebook’s AI looks for bad stuff

The vast majority of Facebook’s moderation is now done automatically by the company’s machine-learning systems, reducing the amount of harrowing content its moderators have to review. In its latest community standards enforcement report, published earlier this month, the company claimed that 98% of terrorist videos and photos are removed before anyone has the chance to see them, let alone report them. Read More

#cyber, #image-recognition, #nlp

First-ever humanoid robot powered by cloud artificial intelligence

Who needs to fiddle with a tiny, delicate needle when there’s now a robot that can thread it for you? CloudMinds XR-1, a 5G humanoid robot with vision-controlled grasping technology that can handle intricate manual tasks, interacted with guests at the Sprint exhibit at Mobile World Congress 2019 Los Angeles (MWC19).

The XR-1, one of the first robots of its kind, is powered by cloud artificial intelligence (AI), Sprint True Mobile 5G, and proprietary vision-controlled grasping technology. That means it can not only thread a needle but also serve drinks, and it can be programmed for other tasks, including manufacturing. Read More

#image-recognition, #iot, #robotics

We See in 3D – So Should Our CNN Models

Summary: Autonomous vehicles (AVs) and many other systems that need to accurately perceive the world around them will be much better off when image classification moves from 2D to 3D. Here we examine the two leading approaches to 3D classification: point clouds and voxel grids.

One well-known problem in CNN image classification is that, because the classifier sees only a 2D image of an object, it won’t recognize that same object if it’s rotated. The solution so far has been to train on many different orthogonal views of the same object, which vastly increases the amount of training data and training time required. Read More
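As a rough illustration of the voxel-grid approach, the sketch below converts a point cloud into a binary occupancy grid that a 3D CNN could convolve over, and includes a rotation helper of the kind used to generate rotated training copies. It is a minimal pure-Python example under our own assumptions: points live in a known axis-aligned bounding box (`lo`, `hi`), points outside it are dropped, and the function names are hypothetical.

```python
import math

Point3 = tuple[float, float, float]


def rotate_z(points: list[Point3], angle: float) -> list[Point3]:
    """Rotate every point about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def voxelize(points: list[Point3], grid_size: int, lo: Point3, hi: Point3) -> list:
    """Return a grid_size**3 nested list with 1 where at least one point falls, else 0."""
    grid = [[[0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
    for p in points:
        if not all(a <= v <= b for v, a, b in zip(p, lo, hi)):
            continue  # assumption: points outside the bounding box are dropped
        # Map each coordinate to a cell index; clamp so v == hi lands in the last cell.
        x, y, z = (min(int((v - a) / (b - a) * grid_size), grid_size - 1)
                   for v, a, b in zip(p, lo, hi))
        grid[x][y][z] = 1
    return grid


cloud = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (1.0, 1.0, 1.0), (2.0, 0.0, 0.0)]
grid = voxelize(cloud, 2, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
occupied = sum(cell for plane in grid for row in plane for cell in row)
print(occupied)  # 2: two points share the upper corner cell, one point is out of bounds
```

Rotated copies produced by `rotate_z` can be voxelized the same way; each extra orientation is another training example, which is exactly the data and training-time cost described above.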

#human, #image-recognition

These Machine Learning Techniques Make Google Lens A Success

Google introduced Lens a couple of years ago in a move to spearhead the ‘AI first’ product movement. Now, with advances in machine learning, especially in image processing and NLP, Google Lens has scaled to new heights. Here we take a look at a few of the algorithmic solutions that power Google Lens.

Lens uses computer vision, machine learning and Google’s Knowledge Graph to let people turn the things they see in the real world into a visual search box, enabling them to identify objects like plants and animals, or to copy and paste text from the real world into their phone. Read More

#big7, #image-recognition, #nlp

Cutting-edge research promises to imbue AI with contextual knowledge

Viewing scenes and making sense of them is something people do effortlessly every day. Whether it’s sussing out objects’ colors or gauging their distances apart, it doesn’t take much conscious effort to recognize items’ attributes and apply knowledge to answer questions about them.

That’s patently untrue of most AI systems, which tend to reason rather poorly. But emerging techniques in visual recognition, language understanding, and symbolic program execution promise to imbue them with the ability to generalize to new examples, much like humans. Read More
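The “symbolic program execution” step can be sketched in a few lines. In this illustration, a perception module is assumed to have already turned the image into a structured list of objects, and a language module is assumed to have turned the question into a short program. The scene format and the operation names (`filter_color`, `filter_shape`, `count`, `unique`, `query`, `distance_to`) are hypothetical and not taken from any specific system.

```python
import math

# Hypothetical structured scene, standing in for a perception module's output.
SCENE = [
    {"id": 0, "shape": "cube", "color": "red", "pos": (0.0, 0.0)},
    {"id": 1, "shape": "sphere", "color": "blue", "pos": (3.0, 4.0)},
    {"id": 2, "shape": "cylinder", "color": "red", "pos": (1.0, 1.0)},
]


def execute(program, scene):
    """Run (op, arg) steps in order; each step consumes the previous step's result."""
    result = scene
    for op, arg in program:
        if op == "filter_color":
            result = [obj for obj in result if obj["color"] == arg]
        elif op == "filter_shape":
            result = [obj for obj in result if obj["shape"] == arg]
        elif op == "count":
            result = len(result)
        elif op == "unique":
            if len(result) != 1:
                raise ValueError(f"expected exactly one object, got {len(result)}")
            result = result[0]
        elif op == "query":
            result = result[arg]  # read one attribute of a single object
        elif op == "distance_to":
            other = next(obj for obj in scene if obj["id"] == arg)
            result = math.dist(result["pos"], other["pos"])
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# "How many red objects are there?"
print(execute([("filter_color", "red"), ("count", None)], SCENE))  # 2
# "How far is the sphere from object 0?"
print(execute([("filter_shape", "sphere"), ("unique", None), ("distance_to", 0)], SCENE))  # 5.0
```

Because every step is an explicit operation, the same small vocabulary can answer questions about scenes it has never seen, which is the kind of generalization the paragraph above points to.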

#image-recognition, #vfx

Efficient Video Generation on Complex Datasets

Generative models of natural images have progressed towards high-fidelity samples by strongly leveraging scale. We attempt to carry this success to the field of video modeling by showing that large Generative Adversarial Networks trained on the complex Kinetics-600 dataset are able to produce video samples of substantially higher complexity than previous work. Our proposed model, Dual Video Discriminator GAN (DVD-GAN), scales to longer and higher resolution videos by leveraging a computationally efficient decomposition of its discriminator. We evaluate on the related tasks of video synthesis and video prediction, and achieve a new state-of-the-art Fréchet Inception Distance for prediction on Kinetics-600, as well as a state-of-the-art Inception Score for synthesis on the UCF-101 dataset, alongside establishing a strong baseline for synthesis on Kinetics-600. Read More
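As we read the DVD-GAN paper, the discriminator is split in two: a spatial discriminator judges a small random sample of individual frames (k = 8 in the paper) at full resolution, while a temporal discriminator sees every frame, but only after 2×2 average pooling. The sketch below shows only how those two inputs might be prepared from a grayscale clip stored as nested lists; the discriminator networks are omitted, and the function names are our own.

```python
import random

Frame = list[list[float]]


def avg_pool_2x2(frame: Frame) -> Frame:
    """Halve height and width by averaging non-overlapping 2x2 blocks (odd edges are dropped)."""
    h, w = len(frame), len(frame[0])
    return [[(frame[i][j] + frame[i][j + 1] + frame[i + 1][j] + frame[i + 1][j + 1]) / 4.0
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]


def discriminator_inputs(video: list[Frame], k: int = 8, seed: int = 0):
    """Split one clip into (spatial input, temporal input) for the two discriminators."""
    rng = random.Random(seed)
    # Spatial branch: k distinct frames, kept at full resolution, judged individually.
    picked = sorted(rng.sample(range(len(video)), k))
    spatial = [video[t] for t in picked]
    # Temporal branch: every frame, but spatially downsampled.
    temporal = [avg_pool_2x2(frame) for frame in video]
    return spatial, temporal


# Toy 12-frame, 4x4 grayscale clip in which every pixel of frame t has value t.
video = [[[float(t)] * 4 for _ in range(4)] for t in range(12)]
spatial, temporal = discriminator_inputs(video)
print(len(spatial), len(spatial[0]), len(temporal), len(temporal[0]))  # 8 4 12 2
```

The design choice is that full-resolution work is capped at k frames no matter how long the clip is, while the branch that sees all frames works on a quarter of the pixels.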

#gans, #image-recognition

The AI Renaissance portrait generator isn't great at painting people of color

Surprise! Artificial intelligence-generated portraits based on artwork from 15th-century Europe… kind of suck at depicting people of color.

Because we’re apparently always ready to hand over our photos for the sake of a trend, the internet’s current obsession is an AI portrait generator that deconstructs your selfies and rebuilds them as Renaissance and Baroque portraits.

Created by researchers at the MIT-IBM Watson AI Lab, AI Portrait Ars is a fun way to see how you would have been perceived if you lived in another time period.

“Portraits interpret the external beauty, social status, and then go beyond our body and face,” its creators wrote in the site’s “Why” section. “A portrait becomes a psychological analysis and a deep reflection on our existence.”

Unless, apparently, you’re not white. Read More

#bias, #image-recognition

Facing your AI self at the ‘Neural Mirror’ art installation

Italian design studio Ultravioletto has created a mirror that lets you see yourself the way corporations see you: as a collection of data points. At first, the Neural Mirror installation (located in a former church in the Italian city of Spoleto) seems like an ordinary mirror. But after you’ve been duly scanned and processed (with the system estimating your age, sex and emotional state), you’ll quickly see something else: a ghostly vision of a machine’s idea of who you are. Read More

#image-recognition