Toshiba Claims To Have Created World’s Most Accurate Visual Question Answering AI

Toshiba Corporation claims to have developed the world’s most accurate and highly versatile Visual Question Answering (VQA) AI that can recognise not only people and objects but also colours, shapes, appearances and background details in images.

According to Toshiba, the AI overcomes the difficulty of answering questions about the position and appearance of people and objects, and it can learn the information required to handle a wide range of questions and answers.

Toshiba presented the technology at ICANN 2021, the International Conference on Artificial Neural Networks, on 14 September. Read More

#image-recognition, #nlp

A way to spot computer-generated faces

A small team of researchers from the State University of New York at Albany, the State University of New York at Buffalo and Keya Medical has found a common flaw in computer-generated faces that can be used to identify them: irregularly shaped pupils. The group has described its findings in a paper uploaded to the arXiv preprint server.

…The researchers note that in many cases, users can simply zoom in on the eyes of a person they suspect may not be real to spot the pupil irregularities. They also note that it would not be difficult to write software to spot such errors, which social media sites could then use to remove such content. Unfortunately, they add, now that these irregularities have been identified, the people creating the fake pictures can simply add a feature to ensure that pupils are round. Read More
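
The software check the researchers mention could take many forms. As a hypothetical sketch, assuming the pupil boundary has already been extracted as a list of (x, y) points (the segmentation step is out of scope here), one simple roundness measure is the isoperimetric circularity 4πA/P², which is 1 for a perfect circle and falls for irregular outlines. The function names and the 0.9 threshold are illustrative assumptions, and the paper itself may measure pupil regularity differently.

```python
import math

def polygon_area(points):
    """Absolute area of a closed polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(points):
    """Length of the closed boundary through all points."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))

def circularity(points):
    """4*pi*A / P**2: close to 1.0 for a circle, smaller for irregular shapes."""
    perimeter = polygon_perimeter(points)
    if perimeter == 0:
        return 0.0
    return 4 * math.pi * polygon_area(points) / perimeter ** 2

def looks_irregular(points, threshold=0.9):
    # Hypothetical threshold; a real detector would need calibration on real eyes.
    return circularity(points) < threshold

# Usage: a sampled circle vs. a lobed, non-round outline.
angles = [2 * math.pi * k / 64 for k in range(64)]
circle = [(math.cos(t), math.sin(t)) for t in angles]
blob = [((1 + 0.4 * math.cos(5 * t)) * math.cos(t),
         (1 + 0.4 * math.cos(5 * t)) * math.sin(t)) for t in angles]
print(round(circularity(circle), 3), looks_irregular(circle))
print(round(circularity(blob), 3), looks_irregular(blob))
```

One caveat of this choice: an ellipse also scores below 1, so a pupil seen at an angle could be flagged; fitting an ellipse and measuring deviation from it would be a more tolerant variant.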

#fake, #gans, #image-recognition

VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture, which often lack a clear justification or interpretation. In this paper, we introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually. VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks. In addition, we show that incorporating our new variance term into other methods helps stabilize the training and leads to performance improvements. Read More
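
To make the three terms concrete, here is a minimal pure-Python sketch of a VICReg-style objective: an invariance term (mean squared distance between paired embeddings of the two views), a hinge that keeps each dimension's standard deviation above a target, and a penalty on off-diagonal covariance. The default weights (λ = μ = 25, ν = 1, γ = 1, ε = 1e-4) are assumptions based on the values reported in the paper; a real implementation would work on tensors produced by an encoder and projector.

```python
import math

def _column_means(z):
    n = len(z)
    return [sum(row[j] for row in z) / n for j in range(len(z[0]))]

def invariance(z_a, z_b):
    """Average over the batch of squared Euclidean distances between paired embeddings."""
    n = len(z_a)
    return sum(sum((a - b) ** 2 for a, b in zip(ra, rb)) for ra, rb in zip(z_a, z_b)) / n

def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge on the per-dimension standard deviation (batch size must be >= 2)."""
    n, d = len(z), len(z[0])
    means = _column_means(z)
    total = 0.0
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in z) / (n - 1)
        total += max(0.0, gamma - math.sqrt(var + eps))
    return total / d

def covariance_term(z):
    """Sum of squared off-diagonal covariances, divided by the dimension d."""
    n, d = len(z), len(z[0])
    means = _column_means(z)
    centered = [[row[j] - means[j] for j in range(d)] for row in z]
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov = sum(r[i] * r[j] for r in centered) / (n - 1)
                total += cov ** 2
    return total / d

def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0):
    return (lam * invariance(z_a, z_b)
            + mu * (variance_term(z_a) + variance_term(z_b))
            + nu * (covariance_term(z_a) + covariance_term(z_b)))

# Usage: a collapsed batch (constant vectors) vs. a well-spread, decorrelated one.
collapsed = [[0.5, 0.5]] * 4
spread = [[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
print(vicreg_loss(collapsed, collapsed))  # about 49.5: the variance hinge is fully active
print(vicreg_loss(spread, spread))        # 0.0: no penalty
```

The collapsed batch is the "trivial solution" from the abstract: its invariance term is zero, so only the variance hinge exposes it.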

#image-recognition, #self-supervised

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

The Super-Resolution Generative Adversarial Network (SRGAN) [1] is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied by unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN (network architecture, adversarial loss and perceptual loss) and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea of the relativistic GAN [2] to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won first place in the PIRM2018-SR Challenge [3]. Read More
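
"Relative realness" is easiest to see in the losses. Below is a minimal pure-Python sketch of the relativistic average discriminator formulation, as we understand its use in ESRGAN, operating on raw (pre-sigmoid) critic scores for a batch of real and generated images. The function name and list-based inputs are illustrative; in practice these would be tensors, and the generator's adversarial loss would be combined with the perceptual loss.

```python
import math

def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def _mean(xs):
    return sum(xs) / len(xs)

def relativistic_losses(real_logits, fake_logits, eps=1e-12):
    """Relativistic average GAN losses from raw critic outputs C(x).

    D_Ra(x_r, x_f) = sigmoid(C(x_r) - mean C(x_f)): how much more realistic a real
    image looks than the average fake, and symmetrically for fakes vs. reals.
    """
    mean_real = _mean(real_logits)
    mean_fake = _mean(fake_logits)
    d_real = [_sigmoid(c - mean_fake) for c in real_logits]  # D_Ra(x_r, x_f)
    d_fake = [_sigmoid(c - mean_real) for c in fake_logits]  # D_Ra(x_f, x_r)
    # Discriminator: reals should look more realistic than the average fake.
    loss_d = (-_mean([math.log(p + eps) for p in d_real])
              - _mean([math.log(1 - p + eps) for p in d_fake]))
    # Generator: the mirrored objective, so both real and fake scores carry gradient.
    loss_g = (-_mean([math.log(1 - p + eps) for p in d_real])
              - _mean([math.log(p + eps) for p in d_fake]))
    return loss_d, loss_g

# Usage: a discriminator that clearly separates reals from fakes.
print(relativistic_losses([5.0, 5.0], [-5.0, -5.0]))
```

When real and fake scores are identical, every D_Ra value is 0.5 and both losses equal 2 ln 2, the point where the discriminator cannot tell which set is more realistic.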

#gans, #image-recognition

Warner Bros. ‘Reminiscence’ promo uses deepfake tech to put you in the trailer

If you want to see yourself on screen with Hugh Jackman, this is your chance. The promo for Warner Bros.’ upcoming Reminiscence movie uses deepfake technology to turn a photo of your face — or anybody’s face, really — into a short video sequence with the star. According to Protocol, a media startup called D-ID created the promo for the film. D-ID reportedly started out wanting to develop technology that can protect consumers against facial recognition, but then it realized that its tech could also be used to optimize deepfakes.

For this particular project, the firm created a website for the experience, where you’ll be asked for your name and for a photo. You can upload the photo of anybody you want, and the experience will then conjure up an animation for the face in it. The animation isn’t perfect by any means, and the face can look distorted at times, but it’s still not bad, considering the technology created it from a single picture. Read More

#gans, #image-recognition, #fake

Researchers Create ‘Master Faces’ to Bypass Facial Recognition

Researchers have demonstrated a method to create “master faces,” computer-generated faces that act like master keys for facial recognition systems and can impersonate several identities with what the researchers claim is a high probability of success.

In their paper, researchers at the Blavatnik School of Computer Science and the School of Electrical Engineering at Tel Aviv University detail how they created nine “master key” faces that can impersonate almost half the faces in a dataset, as tested against three leading face recognition systems. The researchers say their results show these master faces can successfully impersonate over 40 percent of the population in these systems without any additional information or data about the people being impersonated. Read More
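
The "over 40 percent" figure is a coverage statistic: the share of enrolled identities that at least one master face would be accepted as. A hypothetical sketch of that bookkeeping, assuming every face is already an embedding vector and that "accepted" means cosine similarity at or above a fixed threshold, together with a greedy pick of candidates that each cover the most not-yet-covered identities. The embeddings, threshold and greedy selection are illustrative assumptions; the researchers' own procedure for generating master faces is not reproduced here.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def matched(face, gallery, threshold):
    """Indices of gallery identities that `face` would be accepted as."""
    return {i for i, g in enumerate(gallery) if cosine(face, g) >= threshold}

def coverage(master_faces, gallery, threshold):
    """Fraction of gallery identities matched by at least one master face."""
    covered = set()
    for face in master_faces:
        covered |= matched(face, gallery, threshold)
    return len(covered) / len(gallery)

def greedy_master_set(candidates, gallery, threshold, k):
    """Pick up to k candidates, each time taking the one that adds the most identities."""
    chosen, covered, pool = [], set(), list(candidates)
    for _ in range(k):
        if not pool:
            break
        best = max(pool, key=lambda f: len(matched(f, gallery, threshold) - covered))
        gain = matched(best, gallery, threshold) - covered
        if not gain:
            break  # no remaining candidate adds coverage
        chosen.append(best)
        covered |= gain
        pool.remove(best)
    return chosen

# Usage with toy 2-D "embeddings" (hypothetical data).
gallery = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (-1.0, 0.0)]
candidates = [(1.0, 0.05), (0.05, 1.0), (-1.0, 0.0)]
chosen = greedy_master_set(candidates, gallery, threshold=0.95, k=2)
print(chosen, coverage(chosen, gallery, 0.95))
```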

#image-recognition, #fake, #gans

Why CAPTCHA Pictures Are So Unbearably Depressing

I hate doing Google’s CAPTCHAs.

Part of it is the sheer hassle of repeatedly identifying objects — traffic lights, staircases, palm trees and buses — just so I can finish a web search. I also don’t like being forced to donate free labor to AI companies to help train their visual-recognition systems.

But a while ago, while numbly clicking on grainy images of fire hydrants, I was struck by another reason:

The images are deeply, overwhelmingly depressing. Read More

#image-recognition

NeRF-VAE: A Geometry Aware 3D Scene Generative Model

We propose NeRF-VAE, a 3D scene generative model that incorporates geometric structure via Neural Radiance Fields (NeRF) and differentiable volume rendering. In contrast to NeRF, our model takes into account shared structure across scenes, and is able to infer the structure of a novel scene, without the need to retrain, using amortized inference. NeRF-VAE’s explicit 3D rendering process further contrasts with previous generative models that use convolution-based rendering, which lacks geometric structure. Our model is a VAE that learns a distribution over radiance fields by conditioning them on a latent scene representation. We show that, once trained, NeRF-VAE is able to infer and render geometrically consistent scenes from previously unseen 3D environments using very few input images. We further demonstrate that NeRF-VAE generalizes well to out-of-distribution cameras, while convolutional models do not. Finally, we introduce and study an attention-based conditioning mechanism of NeRF-VAE’s decoder, which improves model performance. Read More
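
The differentiable volume rendering that NeRF-VAE inherits from NeRF composites colour along each camera ray as C ≈ Σ_i T_i (1 − exp(−σ_i δ_i)) c_i, where T_i = exp(−Σ_{j<i} σ_j δ_j) is the transmittance. A minimal sketch of that standard quadrature for one ray, assuming the densities σ_i, colours c_i and sample spacings δ_i have already been produced by the radiance field (in NeRF-VAE, a decoder conditioned on the latent scene representation, which is omitted here):

```python
import math

def render_ray(densities, colors, deltas):
    """Alpha-composite RGB along one ray, NeRF-style.

    densities[i]: sigma_i >= 0 at sample i
    colors[i]:    (r, g, b) emitted at sample i
    deltas[i]:    distance from sample i to sample i + 1
    Returns (rgb, accumulated_opacity).
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i: probability the ray reaches sample i unoccluded
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha  # multiply by exp(-sigma * delta)
    return rgb, 1.0 - transmittance

# Usage: empty space followed by an opaque red surface.
print(render_ray([0.0, 1000.0], [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)], [0.5, 0.5]))
```

Every operation here is smooth in σ and c, which is what lets gradients from rendered pixels flow back into the scene representation.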

#image-recognition

Pre-trained deep learning imagery models update (July 2021)

The amount of imagery that’s collected and disseminated has increased by orders of magnitude over the past couple of years. Deep learning has been instrumental in efficiently extracting and deriving meaningful insights from these massive amounts of imagery. Last October, we released pre-trained geospatial deep learning models, making deep learning more approachable and accessible to a wide spectrum of users.

These models have been pre-trained by Esri on large volumes of data, and can be used as-is or further fine-tuned to your local geography, objects of interest or type of imagery. You no longer need huge volumes of training data and imagery, massive compute resources, or the expertise to train such models yourself. With the pre-trained models, you can bring in the raw data or imagery and extract geographical features at the click of a button. Read More

#image-recognition

Scientists adopt deep learning for multi-object tracking

Their novel framework achieves state-of-the-art performance without sacrificing efficiency in public surveillance tasks

Implementing algorithms that can simultaneously track multiple objects is essential to unlock many applications, from autonomous driving to advanced public surveillance. However, it is difficult for computers to discriminate between detected objects based on their appearance. Now, researchers at the Gwangju Institute of Science and Technology (GIST) have adapted deep learning techniques in a multi-object tracking framework, overcoming short-term occlusion and achieving remarkable performance without sacrificing computational speed. Read More

Read the Paper
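
As a generic illustration of appearance-based association (not the GIST framework itself, whose details are in the paper), each existing track and each new detection can be described by an appearance embedding, and each detection is assigned to its most similar still-unmatched track if the similarity clears a threshold. This sketch uses greedy matching on cosine similarity for brevity; many trackers use the Hungarian algorithm for an optimal assignment instead. The embeddings and the 0.5 threshold are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def associate(track_feats, det_feats, threshold=0.5):
    """Greedy one-to-one matching of detections to tracks by appearance.

    Returns (matches, unmatched_tracks, unmatched_detections); matches holds
    (track_index, detection_index) pairs.
    """
    pairs = sorted(
        ((cosine(t, d), ti, di)
         for ti, t in enumerate(track_feats)
         for di, d in enumerate(det_feats)),
        reverse=True,  # most similar pairs first
    )
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < threshold:
            break  # all remaining pairs are even less similar
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched_t = [i for i in range(len(track_feats)) if i not in used_t]
    unmatched_d = [i for i in range(len(det_feats)) if i not in used_d]
    return sorted(matches), unmatched_t, unmatched_d

# Usage: two known tracks, three detections (the third is a newcomer).
tracks = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
dets = [(0.0, 0.9, 0.1), (0.95, 0.05, 0.0), (0.0, 0.0, 1.0)]
print(associate(tracks, dets))
```

Unmatched detections would typically start new tracks, while unmatched tracks are kept alive for a few frames, which is one common way trackers ride out short-term occlusion.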

#image-recognition, #deep-learning