Generating Animations From Audio With NVIDIA’s Deep Learning Tech

Check out Omniverse Audio2Face, a tool in beta that lets you quickly generate expressive facial animation from an audio source.

In case you missed the news, NVIDIA has a tool in beta that lets you quickly and easily generate expressive facial animation from just an audio source using its deep learning-based technology. The Audio2Face tool simplifies the animation of 3D characters for games, films, real-time digital assistants, and other projects. The toolkit lets you run the results live or bake them out. Read More

#audio, #image-recognition

Designing effective traditional and deep learning-based inspection systems for machine vision applications

When best practices are followed, machine vision and deep learning-based imaging systems are capable of effective visual inspection and will improve efficiency, increase throughput, and drive revenue.

For decades, machine vision technology has performed automated inspection tasks—including defect detection, flaw analysis, assembly verification, sorting, and counting—in industrial settings. Recent computer vision software advances and processing techniques have further enhanced the capabilities of these imaging systems in new and expanding uses. The imaging system itself remains a critically important vision component, yet its role and execution can be underestimated or misunderstood.

Without a well-designed and properly installed imaging system, software will struggle to reliably detect defects. For example, although the imaging setup in Figure 1 (left) produces an attractive image of a gear, only the image on the right clearly shows a dent. This article dives into best practices for iterative design and provides a roadmap for success in designing each type of system. Read More

#image-recognition

Google AI Introduces ‘WIT’, A Wikipedia-Based Image Text Dataset For Multimodal Multilingual Machine Learning

Image and text datasets are widely used in many machine learning applications. To model the relationship between images and text, most multimodal visio-linguistic models today rely on large datasets. Historically, these datasets were created by either manually captioning images or crawling the web and extracting the alt-text as the caption. While the former method produces higher-quality data, the intensive manual annotation process limits the amount of data produced. The automated extraction method can result in larger datasets, but it requires either heuristics and careful filtering to ensure data quality or scaling up models to achieve robust performance.

To overcome these limitations, the Google research team created a high-quality, large, multilingual dataset called the Wikipedia-Based Image Text (WIT) Dataset. It is created by extracting multiple text selections associated with an image from Wikipedia articles and Wikimedia image links. Read More
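
To make the extraction idea concrete, here is a minimal sketch of the alt-text crawl baseline the first paragraph contrasts WIT against: fetch a page and pair each image with its alt attribute. The URL is a placeholder, and this is not the WIT pipeline itself, which draws on richer Wikipedia annotations such as captions and surrounding text.

```python
# Minimal sketch: pair images with alt-text from a web page.
# Not the WIT pipeline; just the web-crawl baseline described above.
import requests
from bs4 import BeautifulSoup

def image_alt_pairs(url):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for img in soup.find_all("img"):
        src, alt = img.get("src"), (img.get("alt") or "").strip()
        if src and alt:  # keep only images that actually carry a caption
            pairs.append((src, alt))
    return pairs

# Example (any article URL works; this one is a placeholder):
# print(image_alt_pairs("https://en.wikipedia.org/wiki/Machine_learning")[:3])
```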

#big7, #image-recognition

GANs N’ Roses: Stable, Controllable, Diverse Image to Image Translation (works for videos too!)

We show how to learn a map that takes a content code, derived from a face image, and a randomly chosen style code to an anime image. We derive an adversarial loss from our simple and effective definitions of style and content. This adversarial loss guarantees the map is diverse: a very wide range of anime can be produced from a single content code. Under plausible assumptions, the map is not just diverse, but also correctly represents the probability of an anime, conditioned on an input face. In contrast, current multimodal generation procedures cannot capture the complex styles that appear in anime. Extensive quantitative experiments support the idea that the map is correct. Extensive qualitative results show that the method can generate a much more diverse range of styles than SOTA comparisons. Finally, we show that our formalization of content and style allows us to perform video-to-video translation without ever training on videos. Read More
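
The core interface is easy to picture: encode a face to a content code, sample a style code, and decode the pair to an anime image. The sketch below shows that interface only; the module shapes and names are illustrative placeholders, not the authors' architecture.

```python
# Illustrative interface sketch (placeholder shapes, not the paper's network).
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))

    def forward(self, face):          # face: (B, 3, 64, 64)
        return self.net(face)         # content code: (B, 256)

class Decoder(nn.Module):
    def __init__(self, style_dim=8):
        super().__init__()
        self.net = nn.Linear(256 + style_dim, 3 * 64 * 64)

    def forward(self, content, style):
        x = self.net(torch.cat([content, style], dim=1))
        return torch.tanh(x).view(-1, 3, 64, 64)   # anime image

enc, dec = ContentEncoder(), Decoder()
content = enc(torch.randn(4, 3, 64, 64))
# Diversity: one content code, many random styles -> many anime renderings.
anime = dec(content, torch.randn(4, 8))
```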

#gans, #image-recognition

Toshiba Claims To Have Created World’s Most Accurate Visual Question Answering AI

Toshiba Corporation claims to have developed the world’s most accurate and highly versatile Visual Question Answering (VQA) AI that can recognise not only people and objects but also colours, shapes, appearances and background details in images.

The AI overcomes the difficulty of answering questions on the positioning and appearance of people and objects and possesses the ability to learn the information required to handle a wide range of questions and answers.

Toshiba presented the technology at ICANN 2021, the International Conference on Artificial Neural Networks, on 14 September. Read More

#image-recognition, #nlp

A way to spot computer-generated faces

A small team of researchers from the State University of New York at Albany, the State University of New York at Buffalo, and Keya Medical has found a common flaw in computer-generated faces by which they can be identified. The group has written a paper describing its findings and uploaded it to the arXiv preprint server.

…The researchers note that in many cases, users can simply zoom in on the eyes of a person they suspect may not be real to spot the pupil irregularities. They also note that it would not be difficult to write software to spot such errors and for social media sites to use it to remove such content. Unfortunately, they also note that now that such irregularities have been identified, the people creating the fake pictures can simply add a feature to ensure the roundness of pupils. Read More
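
The check is straightforward to automate. Assuming you already have a cropped grayscale eye image where the pupil is the darkest region (segmentation and the threshold value here are simplifying assumptions, not the paper's method), you can fit an ellipse to the pupil contour and measure how far the contour deviates from it; GAN-generated pupils tend to be irregular:

```python
# Hedged sketch of a pupil-roundness check (assumes a pre-cropped eye image;
# the fixed threshold and simple segmentation are illustrative shortcuts).
import cv2
import numpy as np

def pupil_irregularity(eye_gray):
    # crude pupil mask: darkest pixels in the eye crop
    _, mask = cv2.threshold(eye_gray, 50, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    pupil = max(contours, key=cv2.contourArea)
    if len(pupil) < 5:               # fitEllipse needs at least 5 points
        return None
    (cx, cy), (w, h), angle = cv2.fitEllipse(pupil)
    ellipse_area = np.pi * (w / 2) * (h / 2)
    # 0 = perfectly elliptical pupil; larger values = more irregular
    return abs(ellipse_area - cv2.contourArea(pupil)) / max(ellipse_area, 1e-6)
```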

#fake, #gans, #image-recognition

VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture, which often lack a clear justification or interpretation. In this paper, we introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually. VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks. In addition, we show that incorporating our new variance term into other methods helps stabilize the training and leads to performance improvements. Read More
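
The three terms are simple to write down. Below is a minimal PyTorch sketch of a VICReg-style loss on two batches of embeddings; the target standard deviation of 1 and the 25/25/1 weights follow the paper's description, but treat the details as a reading of the abstract rather than the reference implementation.

```python
# Minimal sketch of a VICReg-style loss (not the official reference code).
import torch
import torch.nn.functional as F

def off_diagonal(m):
    # all entries of a square matrix except the diagonal
    return m - torch.diag(torch.diag(m))

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    n, d = z_a.shape
    # invariance: embeddings of two views of the same image should agree
    sim = F.mse_loss(z_a, z_b)
    # variance: hinge keeps each dimension's std above 1, preventing collapse
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var = F.relu(1.0 - std_a).mean() + F.relu(1.0 - std_b).mean()
    # covariance: penalize off-diagonal covariance to decorrelate dimensions
    z_a_c, z_b_c = z_a - z_a.mean(0), z_b - z_b.mean(0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)
    cov = off_diagonal(cov_a).pow(2).sum() / d + off_diagonal(cov_b).pow(2).sum() / d
    return sim_w * sim + var_w * var + cov_w * cov

# usage: z_a, z_b = encoder(view_a), encoder(view_b); loss = vicreg_loss(z_a, z_b)
```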

#image-recognition, #self-supervised

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

The Super-Resolution Generative Adversarial Network (SRGAN) [1] is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied by unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN (network architecture, adversarial loss, and perceptual loss) and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN [2] to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won first place in the PIRM2018-SR Challenge [3]. Read More
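
The relativistic piece is the easiest to show in code. Instead of classifying each image as real or fake in absolute terms, the discriminator scores how much more real an image looks than the average of the opposite batch. A minimal PyTorch sketch of that loss, following the relativistic average GAN formulation the abstract cites:

```python
# Sketch of relativistic average GAN losses (RaGAN-style, as in ESRGAN).
import torch
import torch.nn.functional as F

def d_loss_relativistic(real_logits, fake_logits):
    # real images should score higher than the average fake, and vice versa
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    return (F.binary_cross_entropy_with_logits(real_rel, torch.ones_like(real_rel))
            + F.binary_cross_entropy_with_logits(fake_rel, torch.zeros_like(fake_rel))) / 2

def g_loss_relativistic(real_logits, fake_logits):
    # the generator tries to flip both relative judgments
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    return (F.binary_cross_entropy_with_logits(real_rel, torch.zeros_like(real_rel))
            + F.binary_cross_entropy_with_logits(fake_rel, torch.ones_like(fake_rel))) / 2
```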

#gans, #image-recognition

Warner Bros. ‘Reminiscence’ promo uses deepfake tech to put you in the trailer

If you want to see yourself on screen with Hugh Jackman, this is your chance. The promo for Warner Bros.' upcoming Reminiscence movie uses deepfake technology to turn a photo of your face (or anybody's face, really) into a short video sequence with the star. According to Protocol, a media startup called D-ID created the promo for the film. D-ID reportedly started out wanting to develop technology that can protect consumers against facial recognition, but then realized that its tech could also be used to optimize deepfakes.

For this particular project, the firm created a website for the experience, where you'll be asked for your name and a photo. You can upload a photo of anybody you want, and the experience will then conjure up an animation of the face in it. The animation isn't perfect by any means, and the face can look distorted at times, but it's still not bad considering the technology created it from a single picture. Read More

#gans, #image-recognition, #fake

Researchers Create ‘Master Faces’ to Bypass Facial Recognition

Researchers have demonstrated a method to create “master faces,” computer-generated faces that act like master keys for facial recognition systems and can impersonate several identities with what the researchers claim is a high probability of success.

In their paper, researchers at the Blavatnik School of Computer Science and the School of Electrical Engineering in Tel Aviv detail how they successfully created nine “master key” faces that are able to impersonate almost half the faces in a dataset, as evaluated against three leading face recognition systems. The researchers say their results show these master faces can successfully impersonate over 40 percent of the population in these systems without any additional information or data on the person being identified. Read More
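
The headline number is a coverage statistic: the fraction of a gallery whose embeddings a single candidate face matches at the system's acceptance threshold. A hedged sketch of that measurement (the embedding source, threshold, and cosine-similarity matching are illustrative assumptions, not the paper's exact protocol):

```python
# Sketch: how much of a gallery a single "master face" covers.
# The embeddings and threshold are placeholders, not the paper's setup.
import numpy as np

def coverage(master_emb, gallery_embs, threshold=0.6):
    """Fraction of gallery identities matched by one face embedding."""
    m = master_emb / np.linalg.norm(master_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ m                      # cosine similarity to every identity
    return float((sims >= threshold).mean())

# A set of master faces covers the union of each face's match set,
# which is how nine candidates can together reach ~40% of a gallery.
```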

#image-recognition, #fake, #gans