Best of arXiv — February 2022

A monthly selection of ML papers by Zeta Alpha: Reinforcement Learning, Multimodality, Language Models as a service, Computer Vision, Information Retrieval and more. Read More

#machine-learning

A replay of life: What happens in our brain when we die?

Neuroscientists have recorded the activity of a dying human brain and discovered rhythmic brain wave patterns around the time of death that are similar to those occurring during dreaming, memory recall, and meditation. A study published in Frontiers in Aging Neuroscience brings new insight into a possible organizational role of the brain during death and suggests an explanation for the vivid life recall reported in near-death experiences.

Imagine reliving your entire life in the space of seconds. Like a flash of lightning, you are outside of your body, watching memorable moments you lived through. This process, known as ‘life recall’, resembles what many people report during near-death experiences. What happens inside your brain during these experiences and after death are questions that have puzzled neuroscientists for centuries. Now, a new study published in Frontiers in Aging Neuroscience suggests that your brain may remain active and coordinated during and even after the transition to death, and may be programmed to orchestrate the whole ordeal.

When an 87-year-old patient developed epilepsy, Dr Raul Vicente of the University of Tartu, Estonia, and colleagues used continuous electroencephalography (EEG) to detect the seizures and treat the patient. During these recordings, the patient had a heart attack and passed away. This unexpected event allowed the scientists to record the activity of a dying human brain for the first time ever. Read More

#human

People Trust Deepfake Faces Generated by AI More Than Real Ones, Study Finds

The proliferation of deepfake technology is raising concerns that AI could start to warp our sense of shared reality. New research suggests AI-synthesized faces don’t simply dupe us into thinking they’re real people; we actually trust them more than our fellow humans.

In 2018, Nvidia wowed the world with an AI that could churn out ultra-realistic photos of people who don’t exist. Its researchers relied on a type of algorithm known as a generative adversarial network (GAN), which pits two neural networks against each other, one trying to spot fakes and the other trying to generate more convincing ones. Given enough time, GANs can generate remarkably good counterfeits.
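The adversarial objective behind that competition can be sketched numerically. Below is a minimal, illustrative NumPy snippet (not Nvidia's implementation) showing the standard binary cross-entropy losses the two networks optimize against each other; the `d_real`/`d_fake` probabilities are hypothetical discriminator outputs, not values from any real model.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(real) toward 1 and D(fake) toward 0."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: push D(fake) toward 1."""
    return -np.mean(np.log(d_fake))

# Hypothetical discriminator outputs (probabilities that an input is real).
d_real = np.array([0.9, 0.8, 0.95])   # confident on real photos
d_fake = np.array([0.1, 0.2, 0.05])   # confident the fakes are fake

print(discriminator_loss(d_real, d_fake))  # low: the discriminator is winning
print(generator_loss(d_fake))              # high: the generator must improve
```

Training alternates gradient steps on these two losses; the generator "wins" when the discriminator can no longer tell real from fake.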

Since then, capabilities have improved considerably, with some worrying implications: enabling scammers to trick people, making it possible to splice people into porn movies without their consent, and undermining trust in online media. While it’s possible to use AI itself to spot deepfakes, tech companies’ failures to effectively moderate much less complicated material suggest this won’t be a silver bullet. Read More

#fake, #image-recognition

ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification

Encrypted traffic classification requires discriminative and robust traffic representations captured from content-invisible and imbalanced traffic data, which is challenging but indispensable for network security and network management. The major limitation of existing solutions is that they rely heavily on deep features, which are overly dependent on data size and hard to generalize to unseen data. How to leverage open-domain unlabeled traffic data to learn representations with strong generalization ability remains a key challenge. In this paper, we propose a new traffic representation model called Encrypted Traffic Bidirectional Encoder Representations from Transformer (ET-BERT), which pre-trains deep contextualized datagram-level representations from large-scale unlabeled data. The pre-trained model can be fine-tuned on a small amount of task-specific labeled data and achieves state-of-the-art performance across five encrypted traffic classification tasks, remarkably pushing the F1 of ISCX-VPN-Service to 98.9% (5.2%↑), Cross-Platform (Android) to 92.5% (5.4%↑), and CSTNET-TLS 1.3 to 97.4% (10.0%↑). Notably, we provide an explanation of the empirically powerful pre-training model by analyzing the randomness of ciphers, which gives insight into the boundary of classification ability over encrypted traffic. The code is available at: https://github.com/linwhitehat/ET-BERT Read More
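A key ingredient here is turning opaque encrypted payloads into token sequences a BERT-style model can consume. The snippet below is a minimal sketch of one plausible preprocessing step, encoding raw datagram bytes as hex and slicing them into overlapping two-byte "tokens"; the function name and windowing scheme are illustrative assumptions, not the paper's exact pipeline.

```python
def datagram_to_tokens(payload: bytes):
    """Encode raw datagram bytes as hex and slide a two-byte
    (four-hex-character) window one byte at a time, yielding a
    BERT-style token sequence from otherwise opaque ciphertext."""
    hex_str = payload.hex()
    return [hex_str[i:i + 4] for i in range(0, len(hex_str) - 2, 2)]

# First bytes of a TLS record header, as an example payload.
tokens = datagram_to_tokens(bytes([0x16, 0x03, 0x01, 0x02]))
print(tokens)  # ['1603', '0301', '0102']
```

Such token sequences can then be fed to masked-token pre-training on unlabeled captures before fine-tuning on the small labeled sets the abstract mentions.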

#nlp

FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding

The few-shot natural language understanding (NLU) task has attracted much recent attention. However, prior methods have been evaluated under disparate sets of protocols, which hinders fair comparison and measurement of the field's progress. To address this issue, we introduce an evaluation framework that improves previous evaluation procedures in three key aspects, i.e., test performance, dev-test correlation, and stability. Under this new evaluation framework, we re-evaluate several state-of-the-art few-shot methods for NLU tasks. Our framework reveals new insights: (1) both the absolute performance and relative gap of the methods were not accurately estimated in prior literature; (2) no single method dominates most tasks with consistent performance; (3) improvements of some methods diminish with a larger pretrained model; and (4) gains from different methods are often complementary and the best combined model performs close to a strong fully-supervised baseline. We open-source our toolkit, FewNLU, which implements our evaluation framework along with a number of state-of-the-art methods. Read More
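One of the framework's three criteria, dev-test correlation, asks whether the dev-set ranking of runs predicts their test-set ranking. As a hedged illustration (not the FewNLU toolkit's code), the pure-Python sketch below computes a Spearman rank correlation over hypothetical dev/test accuracies for five few-shot runs; ties are not handled in this toy version.

```python
def rank(xs):
    """1-based ranks of the values in xs (ties not handled here)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation via the difference-of-ranks formula."""
    n = len(xs)
    rx, ry = rank(xs), rank(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical dev/test accuracies for five few-shot runs.
dev  = [61.2, 63.5, 58.9, 64.1, 60.0]
test = [60.8, 62.9, 59.5, 63.7, 60.2]
print(spearman(dev, test))  # 1.0: dev ranking perfectly predicts test ranking
```

A correlation near 1 means model selection on the dev set is trustworthy; values near 0 would suggest dev-set results are a poor guide to test behavior, which is exactly the failure mode a standardized protocol is meant to expose.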

#nlp, #performance