Generative Adversarial Transformers

We introduce the GANsformer, a novel and efficient type of transformer, and explore it for the task of visual generative modeling. The network employs a bipartite structure that enables long-range interactions across the image, while maintaining linearly efficient computation that can readily scale to high-resolution synthesis. It iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of each in light of the other and encourage the emergence of compositional representations of objects and scenes. In contrast to the classic transformer architecture, it utilizes multiplicative integration that allows flexible region-based modulation, and can thus be seen as a generalization of the successful StyleGAN network. We demonstrate the model’s strength and robustness through a careful evaluation over a range of datasets, from simulated multi-object environments to rich real-world indoor and outdoor scenes, showing it achieves state-of-the-art results in terms of image quality and diversity, while enjoying fast learning and better data efficiency. Further qualitative and quantitative experiments offer insight into the model’s inner workings, revealing improved interpretability and stronger disentanglement, and illustrating the benefits and efficacy of our approach. Read More
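The core idea, latents and image features attending to each other across a bipartite graph, can be sketched in a few lines. This is a simplified numpy illustration, not the paper's actual duplex attention (which adds learned projections and multiplicative style modulation); the function and variable names are my own. Note the cost is O(n·m) in the number of features n and latents m, hence linear in the image size:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bipartite_attention(latents, features):
    """One round of bipartite attention: n image features attend to
    m latents (cost O(n*m)), then the latents attend back over the image."""
    d = latents.shape[1]
    # features -> latents: each feature gathers information from all latents
    attn = softmax(features @ latents.T / np.sqrt(d))   # shape (n, m)
    features = features + attn @ latents
    # latents -> features: each latent aggregates over the whole image
    attn = softmax(latents @ features.T / np.sqrt(d))   # shape (m, n)
    latents = latents + attn @ features
    return latents, features

rng = np.random.default_rng(0)
latents = rng.normal(size=(16, 64))     # m = 16 latent variables
features = rng.normal(size=(4096, 64))  # n = 64x64 feature grid, flattened
latents, features = bipartite_attention(latents, features)
```

Because each side only ever attends to the other (never features-to-features), the quadratic cost of full self-attention is avoided, which is what lets the mechanism scale to high-resolution synthesis.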

#gans

How Audio Pros ‘Upmix’ Vintage Tracks and Give Them New Life

Experts are using AI to pick apart classic recordings from the 50s and 60s, isolate the instruments, and stitch them back together in crisp, bold ways.

When James Clarke went to work at London’s legendary Abbey Road Studios in late 2009, he wasn’t an audio engineer. He’d been hired to work as a software programmer. One day not long after he started, he was having lunch with several studio veterans of the 1960s and ’70s, the pre-computer era of music recording when songs were captured on a single piece of tape. To make conversation, Clarke asked a seemingly innocent question: Could you take a tape from the days before multitrack recording and isolate the individual instruments? Could you pull it apart?

The engineers shot him down. It turned into “several hours of the ins and outs of why it’s not possible,” Clarke remembers. You could perform a bit of sonic trickery to transform a song from one-channel mono to two-channel stereo, but that didn’t interest him. Clarke was seeking something more exacting: a way to pick apart a song so a listener could hear just one element at a time. Maybe just the guitar, maybe the drums, maybe the singer.

“I kept saying to them that if the human ear can do it, we can write software to do it as well,” he says. To him, this was a challenge. “I’m from New Zealand. We love proving people wrong.” Read More

#human

A Study of Face Obfuscation in ImageNet

Face obfuscation (blurring, mosaicing, etc.) has been shown to be effective for privacy protection; nevertheless, object recognition research typically assumes access to complete, unobfuscated images. In this paper, we explore the effects of face obfuscation on the popular ImageNet challenge visual recognition benchmark. Most categories in the ImageNet challenge are not people categories; however, many incidental people appear in the images, and their privacy is a concern. We first annotate faces in the dataset. Then we demonstrate that face blurring—a typical obfuscation technique—has minimal impact on the accuracy of recognition models. Concretely, we benchmark multiple deep neural networks on face-blurred images and observe that the overall recognition accuracy drops only slightly (≤0.68%). Further, we experiment with transfer learning to 4 downstream tasks (object recognition, scene recognition, face attribute classification, and object detection) and show that features learned on face-blurred images are equally transferable. Our work demonstrates the feasibility of privacy-aware visual recognition, improves the highly-used ImageNet challenge benchmark, and suggests an important path for future visual datasets. Read More
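The obfuscation step itself is simple: given an annotated face bounding box, blur just that region and leave the rest of the image untouched. Below is a minimal numpy sketch using a box filter as a crude stand-in for the Gaussian blurring the paper applies; the function name, kernel size, and box format are my own assumptions, not from the paper:

```python
import numpy as np

def box_blur_region(image, box, k=9):
    """Blur one rectangular (face) region of a grayscale image in place
    with a k-by-k box filter, leaving all other pixels untouched."""
    x0, y0, x1, y1 = box
    region = image[y0:y1, x0:x1].astype(float)
    pad = k // 2
    # Replicate edge pixels so the filter stays inside the padded region
    padded = np.pad(region, ((pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(region)
    for dy in range(k):            # accumulate the k*k shifted copies
        for dx in range(k):
            out += padded[dy:dy + region.shape[0], dx:dx + region.shape[1]]
    image[y0:y1, x0:x1] = (out / (k * k)).astype(image.dtype)
    return image

# Toy example: blur a 40x40 "face" box inside a 100x100 image
img = (np.arange(100 * 100) % 256).reshape(100, 100).astype(np.uint8)
blurred = box_blur_region(img.copy(), (20, 20, 60, 60), k=9)
```

The point of the paper is that this localized destruction of detail barely moves classification accuracy, since the object categories rarely depend on facial identity.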

#accuracy, #image-recognition

IEEE 802.11bf: Toward Ubiquitous Wi-Fi Sensing

Wi-Fi is among the most successful wireless technologies ever invented. As Wi-Fi becomes more and more present in public and private spaces, it becomes natural to leverage its ubiquity to implement ground-breaking wireless sensing applications such as human presence detection, activity recognition, and object tracking, just to name a few. This paper reports ongoing efforts by the IEEE 802.11bf Task Group (TGbf), which is defining the appropriate modifications to existing Wi-Fi standards to enhance sensing capabilities through 802.11-compliant waveforms. We summarize the objectives and timeline of TGbf, and discuss some of the most interesting technical features proposed so far. We also introduce a roadmap of research challenges pertaining to Wi-Fi sensing and its integration with future Wi-Fi technologies and emerging spectrum bands, hoping to elicit further activities by both the research community and TGbf. Read More

#surveillance, #wifi

Unitree Robotics: Justice!

Read More

#robotics, #videos

Your Local Police Department Might Have Used This Facial Recognition Tool To Surveil You. Find Out Here.

Search through BuzzFeed News’ database to find out if the police department in your community is among the hundreds of taxpayer-funded entities that used Clearview AI’s facial recognition.

Clearview AI has created a powerful facial recognition tool and marketed it to police departments and government agencies. The company has never disclosed the entities that have used its facial recognition software, but a confidential source provided BuzzFeed News with data that appeared to be a list of agencies and companies whose employees have tried or used its technology. Read More

#surveillance

Can AI read your emotions? Try it for yourself

Emotion recognition AI is bunk.

Don’t get me wrong, AI that recognizes human sentiment and emotion can be very useful. For example, it can help identify when drivers are falling asleep behind the wheel. But what it cannot do is discern how a human being is actually feeling from the expression on their face.

You don’t have to take my word for it, you can try it yourself here. Read More

#image-recognition

The Lottery Ticket Hypothesis That Shocked The World

In machine learning, bigger may not always be better. As datasets and machine learning models keep expanding, researchers are racing to set new state-of-the-art benchmarks. However, larger models can be detrimental to the budget and the environment.

Over time, researchers have developed several ways to shrink deep learning models while optimizing training datasets. In particular, three techniques (pruning, quantization, and transfer learning) have been instrumental in making models run faster and more accurately with less compute.

In a 2019 study, The Lottery Ticket Hypothesis, MIT researchers showed it was possible to remove a large fraction of the connections in a neural network and still achieve comparable or even better accuracy. Read More
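Magnitude pruning, the basic operation behind these experiments, can be sketched in a few lines. This is a simplified illustration in numpy; the function name and the 90% sparsity level are my own choices, not values from the paper:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights, returning the pruned
    weights and the binary mask that survived the cut."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    # k-th smallest magnitude: everything below it gets removed
    threshold = np.partition(flat, k)[k]
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))           # a toy weight matrix
pruned, mask = magnitude_prune(w, sparsity=0.9)
```

The lottery-ticket twist is what happens next: the surviving mask is kept, the remaining weights are rewound to their original initialization, and the sparse subnetwork is retrained from there, often matching or beating the dense network.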

#accuracy, #performance

ACLU Files AI FOIA Request

The American Civil Liberties Union (ACLU) has filed a Freedom of Information Act (FOIA) request to find out how America’s national security and intelligence agencies are using artificial intelligence (AI).

The ACLU said it made the request out of concern that AI was being used in ways that could violate Americans’ civil rights.

The request follows the March 1 release of a 16-chapter report containing recommendations on how AI, machine learning, and associated technologies should be used by the Biden administration.  Read More

#ic, #surveillance

Sofia Robot Has a Third Sister! AGAIN

Read More

#robotics, #videos