Alien Dreams: An Emerging Art Scene

In recent months there has been a bit of an explosion in the AI generated art scene.

Ever since OpenAI released the weights and code for their CLIP model, hackers, artists, researchers, and deep learning enthusiasts have figured out how to use CLIP as an effective “natural language steering wheel” for various generative models, allowing artists to create all sorts of interesting visual art merely by inputting some text – a caption, a poem, a lyric, a word – to one of these models.

For instance, inputting “a cityscape at night” produces this cool, abstract-looking depiction of some city lights. Read More
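The “steering wheel” idea can be sketched in a few lines: score an image against a text prompt with CLIP, then nudge the generator’s latent to raise that score. Below is a minimal toy sketch in numpy, where small random linear maps stand in for CLIP’s encoders (an assumption for illustration – the real encoders are large neural networks, and real pipelines backpropagate through them).

```python
import numpy as np

# Toy stand-ins for CLIP's encoders (assumption: the real ones are deep nets).
rng = np.random.default_rng(0)
W_img = rng.normal(size=(512, 64))     # "image encoder": 64-d latent -> CLIP space

text_embedding = rng.normal(size=512)  # pretend this encodes "a cityscape at night"
text_embedding /= np.linalg.norm(text_embedding)

def image_embedding(z):
    e = W_img @ z
    return e / np.linalg.norm(e)

def clip_score(z):
    # cosine similarity between the generated "image" and the text prompt
    return float(image_embedding(z) @ text_embedding)

# The steering wheel: ascend the similarity score by nudging the latent z
# (numerical gradient here, which is fine for a 64-d toy example).
z = rng.normal(size=64)
score_before = clip_score(z)
lr, eps = 0.1, 1e-4
for _ in range(100):
    grad = np.zeros_like(z)
    for i in range(len(z)):
        dz = np.zeros_like(z)
        dz[i] = eps
        grad[i] = (clip_score(z + dz) - clip_score(z - dz)) / (2 * eps)
    z += lr * grad

print(clip_score(z) > score_before)  # steering raised the text-image similarity
```

Swap the toy linear map for a real generator (a GAN, VQGAN, or diffusion model) and the toy scorer for CLIP itself, and you have the basic loop behind most of these text-to-art tools.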

#image-recognition, #nlp, #gans

Zero-Shot Detection via Vision and Language Knowledge Distillation

Zero-shot image classification has made promising progress by training aligned image and text encoders. The goal of this work is to advance zero-shot object detection, which aims to detect novel objects without bounding box or mask annotations. We propose ViLD, a training method via Vision and Language knowledge Distillation. We distill the knowledge from a pre-trained zero-shot image classification model (e.g., CLIP [33]) into a two-stage detector (e.g., Mask R-CNN [17]). Our method aligns the region embeddings in the detector to the text and image embeddings inferred by the pre-trained model. We use the text embeddings as the detection classifier, obtained by feeding category names into the pre-trained text encoder. We then minimize the distance between the region embeddings and image embeddings, obtained by feeding region proposals into the pre-trained image encoder. During inference, we include text embeddings of novel categories in the detection classifier for zero-shot detection. We benchmark the performance on the LVIS dataset [15] by holding out all rare categories as novel categories. ViLD obtains 16.1 mask APr with a Mask R-CNN (ResNet-50 FPN) for zero-shot detection, outperforming the supervised counterpart by 3.8. The model can directly transfer to other datasets, achieving 72.2 AP50, 36.6 AP and 11.8 AP on PASCAL VOC, COCO and Objects365, respectively. Read More
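The two alignment losses described above can be made concrete with a small numpy sketch. This is not the paper’s code – the dimensions, random embeddings, and temperature are illustrative assumptions – but it shows the two pieces: classifying a region embedding against text embeddings, and distilling from the pre-trained image encoder with a distance loss.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 512  # embedding dim shared with the pre-trained model (assumed)

def unit(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Hypothetical text embeddings for 3 base categories, as if category names
# were fed through the pre-trained text encoder.
base_text = unit(rng.normal(size=(3, D)))

# A region embedding produced by the detector head for one proposal.
region = unit(rng.normal(size=D))

# Piece 1: use text embeddings as the detection classifier
# (softmax over cosine similarities, with an assumed temperature).
def classify(region, text_embeddings, temperature=0.01):
    logits = text_embeddings @ region / temperature
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Piece 2: distill by minimizing the distance between the region embedding
# and the pre-trained image embedding of the cropped proposal.
clip_image_emb = unit(rng.normal(size=D))
distill_loss = np.abs(region - clip_image_emb).sum()

# Zero-shot inference: simply append novel-category text embeddings
# to the classifier -- no box or mask annotations needed for them.
novel_text = unit(rng.normal(size=(2, D)))
all_probs = classify(region, np.vstack([base_text, novel_text]))
print(all_probs.shape)  # (5,)
```

The key design point is that the classifier is just a set of text embeddings, so extending the detector to new categories is a matter of encoding new category names rather than retraining.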

#image-recognition, #nlp, #gans

Holly Herndon’s AI Deepfake “Twin” Holly+ Transforms Any Song Into a Holly Herndon Song

“Vocal deepfakes are here to stay. A balance needs to be found between protecting artists, and encouraging people to experiment with a new and exciting technology.”

Holly Herndon, a prominent voice at the intersection of AI and the music industry who has long used AI in her own music, has released a new voice instrument: her AI deepfake “twin,” Holly+. It’s a website where you can upload any polyphonic audio and have it transformed into a download of music sung in Herndon’s voice. Give it a try here and read more details on how it works here. Read More

#fake, #nlp