GAN Objective Functions: GANs and Their Variations

There are hundreds of types of GANs. How does the choice of objective function shape what a GAN looks like and how it behaves?

If you haven’t already, you should definitely read my previous post about what a GAN is (especially if you don’t know what I mean when I say GAN!). That post gives you a starting point for diving into the world of GANs and how they work, and it’s a solid primer for this article, where we discuss the objective functions of GANs and the variants out there that put twists on those objectives to achieve different results. Read More
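
As a quick refresher, the canonical objective from the original GAN paper (Goodfellow et al., 2014) is the minimax game that most of the variants discussed here tweak in one way or another:

```latex
\min_G \max_D \, V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] +
  \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D is the discriminator, G the generator, and p_z the prior over the generator’s input noise; most GAN variants keep this adversarial structure and change what D and G are asked to optimize.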

#gans

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

The Super-Resolution Generative Adversarial Network (SRGAN) [1] is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied by unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN – network architecture, adversarial loss and perceptual loss – and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN [2] to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won first place in the PIRM2018-SR Challenge [3]. Read More
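
To make the relativistic piece concrete, here is a minimal PyTorch sketch of a relativistic average discriminator loss in the style ESRGAN borrows from [2]; the function name is ours, and we assume `real_logits`/`fake_logits` are the raw, pre-sigmoid discriminator outputs C(x):

```python
import torch
import torch.nn.functional as F

def relativistic_d_loss(real_logits: torch.Tensor,
                        fake_logits: torch.Tensor) -> torch.Tensor:
    """Relativistic average discriminator loss (RaGAN-style sketch).

    Instead of judging whether an image is real in absolute terms, the
    discriminator predicts whether a real image is more realistic than
    the *average* fake image, and vice versa.
    """
    # D_Ra(x_r, x_f) = sigmoid(C(x_r) - E[C(x_f)])
    d_real = real_logits - fake_logits.mean()
    d_fake = fake_logits - real_logits.mean()
    loss_real = F.binary_cross_entropy_with_logits(
        d_real, torch.ones_like(d_real))   # real should beat the average fake
    loss_fake = F.binary_cross_entropy_with_logits(
        d_fake, torch.zeros_like(d_fake))  # fake should lose to the average real
    return loss_real + loss_fake
```

The generator’s adversarial loss mirrors this with the targets swapped, which gives it gradients from both real and fake images.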

#gans, #image-recognition

Warner Bros. ‘Reminiscence’ promo uses deepfake tech to put you in the trailer

If you want to see yourself on screen with Hugh Jackman, this is your chance. The promo for Warner Bros.’ upcoming Reminiscence movie uses deepfake technology to turn a photo of your face — or anybody’s face, really — into a short video sequence with the star. According to Protocol, a media startup called D-ID created the promo for the film. D-ID reportedly started out wanting to develop technology that can protect consumers against facial recognition, but then it realized that its tech could also be used to optimize deepfakes.

For this particular project, the firm created a website for the experience, where you’ll be asked for your name and a photo. You can upload a photo of anybody you want, and the experience will then conjure up an animation for the face in it. The animation isn’t perfect by any means, and the face can look distorted at times, but it’s still not bad considering the technology created it from a single picture. Read More

#gans, #image-recognition, #fake

Researchers Create ‘Master Faces’ to Bypass Facial Recognition

Researchers have demonstrated a method to create “master faces,” computer-generated faces that act like master keys for facial recognition systems and can impersonate several identities with what the researchers claim is a high probability of success.

In their paper, researchers at the Blavatnik School of Computer Science and the School of Electrical Engineering in Tel Aviv detail how they successfully created nine “master key” faces that are able to impersonate almost half the faces in a test dataset against three leading face recognition systems. The researchers say their results show these master faces can successfully impersonate over 40 percent of the population in these systems without any additional information or data on the person being identified. Read More

#image-recognition, #fake, #gans

Generating Master Faces for Dictionary Attacks with a Network-Assisted Latent Space Evolution

A master face is a face image that passes face-based identity authentication for a large portion of the population. These faces can be used to impersonate, with a high probability of success, any user, without access to any user information. We optimize these faces using an evolutionary algorithm in the latent embedding space of the StyleGAN face generator. Multiple evolutionary strategies are compared, and we propose a novel approach that employs a neural network to direct the search toward promising samples without adding fitness evaluations. The results we present demonstrate that it is possible to obtain high coverage of the population (over 40%) with fewer than 10 master faces, for three leading deep face recognition systems. Read More
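
As a rough illustration of the search loop (a toy sketch, not the authors’ implementation): a simple (mu, lambda) evolution strategy over generator latents, with the StyleGAN generator and the coverage fitness left as hypothetical stubs:

```python
import numpy as np

# Both stubs are assumptions for illustration: in the paper the candidate is
# a StyleGAN latent, and fitness is the fraction of a face dataset that a
# candidate face matches under a given face recognition system.
def generate_face(z: np.ndarray):
    raise NotImplementedError  # StyleGAN generator stand-in

def coverage(face) -> float:
    raise NotImplementedError  # fraction of enrolled faces matched

def evolve_master_face(dim=512, pop=32, elite=8, sigma=0.3, steps=200, seed=0):
    """Toy (mu, lambda) evolution strategy in a generator's latent space."""
    rng = np.random.default_rng(seed)
    parents = rng.standard_normal((elite, dim))
    for _ in range(steps):
        # Each child is a Gaussian mutation of a randomly chosen parent.
        idx = rng.integers(0, elite, size=pop)
        children = parents[idx] + sigma * rng.standard_normal((pop, dim))
        scores = np.array([coverage(generate_face(z)) for z in children])
        parents = children[np.argsort(scores)[-elite:]]  # keep the fittest
    return parents[-1]  # highest-scoring latent from the final generation
```

The paper’s network-assisted twist (a neural predictor that steers mutations toward promising latents without spending extra fitness evaluations) is not shown here.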

#fake, #gans, #cyber

Alien Dreams: An Emerging Art Scene

In recent months there has been a bit of an explosion in the AI-generated art scene.

Ever since OpenAI released the weights and code for their CLIP model, various hackers, artists, researchers, and deep learning enthusiasts have figured out how to utilize CLIP as an effective “natural language steering wheel” for various generative models, allowing artists to create all sorts of interesting visual art merely by inputting some text – a caption, a poem, a lyric, a word – into one of these models.

For instance, inputting “a cityscape at night” produces this cool, abstract-looking depiction of some city lights. Read More
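
Mechanically, the “steering wheel” is usually gradient ascent on CLIP similarity. Here is a minimal sketch using OpenAI’s released clip package; the generator is a hypothetical stub (any differentiable image generator works), and the latent size, learning rate, and step count are arbitrary choices of ours:

```python
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Hypothetical stub: any differentiable generator (a GAN, a SIREN, ...)
# mapping a latent z to an image batch. Real pipelines also apply CLIP's
# resize/normalize preprocessing to the image before encoding it.
def generate(z: torch.Tensor) -> torch.Tensor:
    raise NotImplementedError  # -> (1, 3, 224, 224) image tensor

text = clip.tokenize(["a cityscape at night"]).to(device)
with torch.no_grad():
    text_emb = model.encode_text(text)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

z = torch.randn(1, 256, device=device, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(300):
    img_emb = model.encode_image(generate(z))
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    loss = -(img_emb * text_emb).sum()  # maximize cosine similarity
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The text is encoded once and only the generator’s latent is updated, so the same recipe works for any prompt: a caption, a poem, a lyric, a word.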

#image-recognition, #nlp, #gans

Zero-Shot Detection via Vision and Language Knowledge Distillation

Zero-shot image classification has made promising progress by training aligned image and text encoders. The goal of this work is to advance zero-shot object detection, which aims to detect novel objects without bounding box or mask annotations. We propose ViLD, a training method via Vision and Language knowledge Distillation. We distill the knowledge from a pre-trained zero-shot image classification model (e.g., CLIP [33]) into a two-stage detector (e.g., Mask R-CNN [17]). Our method aligns the region embeddings in the detector to the text and image embeddings inferred by the pre-trained model. We use the text embeddings as the detection classifier, obtained by feeding category names into the pre-trained text encoder. We then minimize the distance between the region embeddings and image embeddings, obtained by feeding region proposals into the pre-trained image encoder. During inference, we include text embeddings of novel categories into the detection classifier for zero-shot detection. We benchmark the performance on the LVIS dataset [15] by holding out all rare categories as novel categories. ViLD obtains 16.1 mask APr with a Mask R-CNN (ResNet-50 FPN) for zero-shot detection, outperforming the supervised counterpart by 3.8. The model can directly transfer to other datasets, achieving 72.2 AP50, 36.6 AP and 11.8 AP on PASCAL VOC, COCO and Objects365, respectively. Read More
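
The core inference trick is easy to sketch: category names become classifier weights via the frozen text encoder, and region embeddings are scored against them. A toy version, with random tensors standing in for the detector’s distilled region embeddings and an assumed temperature:

```python
import torch
import clip  # https://github.com/openai/CLIP

model, _ = clip.load("ViT-B/32", device="cpu")

# Category names (including novel ones never seen with box annotations)
# become the detection classifier via the frozen text encoder.
categories = ["cat", "dog", "traffic cone"]
prompts = clip.tokenize([f"a photo of a {c}" for c in categories])
with torch.no_grad():
    text_emb = model.encode_text(prompts).float()
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# Stand-in for the detector's region embeddings, which ViLD trains (via the
# distillation loss) to live in the same space as CLIP's image embeddings.
region_emb = torch.randn(100, text_emb.shape[-1])
region_emb = region_emb / region_emb.norm(dim=-1, keepdim=True)

temperature = 0.01  # assumed value, not taken from the paper
logits = region_emb @ text_emb.T / temperature  # (regions, categories)
pred = logits.argmax(dim=-1)                    # per-region class prediction
```

Adding a new category at inference time is just one more row in `text_emb`, which is what makes the zero-shot transfer possible.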

#image-recognition, #nlp, #gans

VideoGPT: Video Generation using VQ-VAE and Transformers

We present VideoGPT: a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos. VideoGPT uses a VQ-VAE that learns downsampled discrete latent representations of a raw video by employing 3D convolutions and axial self-attention. A simple GPT-like architecture is then used to autoregressively model the discrete latents using spatio-temporal position encodings. Despite the simplicity in formulation and ease of training, our architecture is able to generate samples competitive with state-of-the-art GAN models for video generation on the BAIR Robot dataset, and generate high-fidelity natural videos from UCF-101 and the Tumblr GIF dataset (TGIF). We hope our proposed architecture serves as a reproducible reference for a minimalistic implementation of transformer-based video generation models. Read More
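
A small sketch of stage one, nearest-neighbor vector quantization against a learned codebook; the features here are random placeholders rather than the paper’s 3D-conv and axial-attention encoder outputs, and all shapes are toy values:

```python
import torch

def vq_quantize(features: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map encoder features (N, d) to the ids of their nearest codebook entries."""
    dists = torch.cdist(features, codebook)  # (N, K) L2 distances
    return dists.argmin(dim=-1)              # (N,) discrete token ids

# Toy shapes: 1024 latent positions, 64-dim features, a 512-entry codebook.
features = torch.randn(1024, 64)
codebook = torch.randn(512, 64)
tokens = vq_quantize(features, codebook)

# Stage 2 (not shown): flatten `tokens` into a sequence, add spatio-temporal
# position encodings, and train a GPT-style transformer to model the sequence
# autoregressively; sampled token sequences are decoded back into video
# frames through the VQ-VAE decoder.
```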

#gans, #image-recognition

NVIDIA’s Canvas app turns doodles into AI-generated ‘photos’

NVIDIA has launched a new app you can use to paint lifelike landscape images — even if you have zero artistic skills and a first grader can draw better than you. The new application is called Canvas, and it can turn childlike doodles and sketches into photorealistic landscape images in real time. It’s now available for download as a free beta, though you can only use it if your machine is equipped with an NVIDIA RTX GPU.

Canvas is powered by the GauGAN AI painting tool, which NVIDIA Research developed and trained using 5 million images. Read More

#gans, #image-recognition

Reward is enough

In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence. Read More

#reinforcement-learning