TensorFlow 2 implementation of our high-quality frame interpolation neural network. We present a unified single-network approach that doesn’t use additional pre-trained networks, like optical flow or depth, and yet achieves state-of-the-art results. We use a multi-scale feature extractor that shares the same convolution weights across the scales. Our model is trainable from frame triplets alone. Read More
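The weight-sharing idea can be sketched in a few lines of plain Python: build an image pyramid and apply the same convolution kernel at every level. The function names, kernel, and pyramid depth below are illustrative, not taken from the paper's code.

```python
# Minimal sketch of a shared-weight multi-scale feature extractor.
# Kernel values and pyramid depth are illustrative assumptions.

def downsample2x(img):
    """Average-pool a 2D grid (list of lists) by a factor of 2."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*i][2*j] + img[2*i][2*j+1] +
              img[2*i+1][2*j] + img[2*i+1][2*j+1]) / 4.0
             for j in range(w)] for i in range(h)]

def conv3x3(img, kernel):
    """'Valid' 2D convolution with a single 3x3 kernel."""
    h, w = len(img) - 2, len(img[0]) - 2
    return [[sum(kernel[a][b] * img[i+a][j+b]
                 for a in range(3) for b in range(3))
             for j in range(w)] for i in range(h)]

def multiscale_features(img, kernel, num_scales=3):
    """Apply the SAME kernel at every pyramid level (weight sharing)."""
    feats = []
    for _ in range(num_scales):
        feats.append(conv3x3(img, kernel))
        img = downsample2x(img)
    return feats
```

Because the kernel is reused at every scale, the extractor's parameter count is independent of the number of pyramid levels, which is the appeal of this design.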
Fake It Till You Make It
We demonstrate that it is possible to perform face-related computer vision in the wild using synthetic data alone.
The community has long enjoyed the benefits of synthesizing training data with graphics, but the domain gap between real and synthetic data has remained a problem, especially for human faces. Researchers have tried to bridge this gap with data mixing, domain adaptation, and domain-adversarial training, but we show that it is possible to synthesize data with minimal domain gap, so that models trained on synthetic data generalize to real in-the-wild datasets.
We describe how to combine a procedurally-generated parametric 3D face model with a comprehensive library of hand-crafted assets to render training images with unprecedented realism and diversity. We train machine learning systems for face-related tasks such as landmark localization and face parsing, showing that synthetic data can both match real data in accuracy and open up new approaches where manual labelling would be impossible. Read More
Dataset
Corsight’s Upcoming DNA to FACE: ‘Terrifying’ Warns Privacy Expert
Corsight plans to release a new product that combines DNA and face recognition technology and could have significant law enforcement and privacy implications.
In this report, we examine Corsight’s product roadmap for “DNA to FACE,” presented at the 2021 Imperial Capital Investors Conference, possible use cases for the technology, and warnings from a privacy expert.
IPVM collaborated with MIT Technology Review on this report; see the MIT Technology Review article: This company says it’s developing a system that can recognize your face from just your DNA Read More
TransformerFusion
Monocular RGB Scene Reconstruction using Transformers
We introduce TransformerFusion, a transformer-based 3D scene reconstruction approach. From an input monocular RGB video, the video frames are processed by a transformer network that fuses the observations into a volumetric feature grid representing the scene; this feature grid is then decoded into an implicit 3D scene representation. Key to our approach is the transformer architecture that enables the network to learn to attend to the most relevant image frames for each 3D location in the scene, supervised only by the scene reconstruction task. Features are fused in a coarse-to-fine fashion, storing fine-level features only where needed, requiring lower memory storage and enabling fusion at interactive rates. The feature grid is then decoded to a higher-resolution scene reconstruction, using an MLP-based surface occupancy prediction from interpolated coarse-to-fine 3D features. Our approach results in an accurate surface reconstruction, outperforming state-of-the-art multi-view stereo depth estimation methods, fully-convolutional 3D reconstruction approaches, and approaches using LSTM- or GRU-based recurrent networks for video sequence fusion. Read More
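The sparse coarse-to-fine storage described above can be illustrated with a small sketch: keep a dense-by-default coarse dictionary of features, and store fine-level features only at cells flagged as near the surface. The class and method names are hypothetical, not the paper's data structure.

```python
# Illustrative sketch: fine features are stored only near the surface,
# so memory grows with surface area rather than scene volume.

class CoarseToFineGrid:
    def __init__(self):
        self.coarse = {}   # (x, y, z) cell -> coarse feature vector
        self.fine = {}     # (x, y, z) cell -> fine feature vector (sparse)

    def update_coarse(self, cell, feat):
        self.coarse[cell] = feat

    def refine(self, cell, feat, near_surface):
        # The fine pass writes a feature only where the coarse pass
        # indicated the surface is nearby; elsewhere stays coarse-only.
        if near_surface:
            self.fine[cell] = feat

    def feature(self, cell):
        # Prefer the fine feature where it exists, else fall back.
        return self.fine.get(cell, self.coarse.get(cell))
```

In the real system an MLP would then decode these interpolated features into surface occupancy; here the point is only that fine storage is conditional on the coarse prediction.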
Mobile-Former: Bridging MobileNet and Transformer
We present Mobile-Former, a parallel design of MobileNet and transformer with a two-way bridge in between. This structure leverages the advantages of MobileNet at local processing and of the transformer at global interaction, and the bridge enables bidirectional fusion of local and global features. Unlike recent work on vision transformers, the transformer in Mobile-Former contains very few tokens (e.g. 6 or fewer) that are randomly initialized to learn global priors, resulting in low computational cost. Combined with the proposed lightweight cross-attention that models the bridge, Mobile-Former is not only computationally efficient but also has more representation power. It outperforms MobileNetV3 in the low-FLOP regime from 25M to 500M FLOPs on ImageNet classification. For instance, Mobile-Former achieves 77.9% top-1 accuracy at 294M FLOPs, gaining 1.3% over MobileNetV3 while saving 17% of computations. When transferred to object detection, Mobile-Former outperforms MobileNetV3 by 8.6 AP in the RetinaNet framework. Furthermore, we build an efficient end-to-end detector by replacing the backbone, encoder and decoder in DETR with Mobile-Former, which outperforms DETR by 1.1 AP while saving 52% of computational cost and 36% of parameters. Read More
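Why a handful of global tokens is cheap can be seen in a small sketch of the Mobile-to-Former direction of the bridge: each global token gathers context from all local positions via dot-product cross-attention, so cost scales with (tokens × positions) rather than (positions × positions). This is a simplified illustration under assumed shapes, not the paper's exact layer (which also includes projections and normalization).

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def mobile_to_former(tokens, local_feats):
    """Each global token attends over all local feature positions
    (Mobile -> Former direction of the two-way bridge).
    tokens: list of token vectors; local_feats: list of per-position vectors."""
    updated = []
    for tok in tokens:
        scores = [sum(t * f for t, f in zip(tok, feat)) for feat in local_feats]
        weights = softmax(scores)
        dim = len(tok)
        ctx = [sum(w * feat[d] for w, feat in zip(weights, local_feats))
               for d in range(dim)]
        # Residual update keeps the token's own content.
        updated.append([t + c for t, c in zip(tok, ctx)])
    return updated
```

With only 6 tokens, the attention map has 6 × HW entries instead of the (HW)² of full self-attention, which is where the FLOP savings come from.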
Nvidia’s upgraded AI art tool turned my obscure squiggles into a masterpiece
It’s incredible, the things we can do with AI nowadays. For artists looking to integrate artificial intelligence into their workflow, there are ever more advanced tools popping up all over the net. One such tool is Nvidia Canvas, which has just been updated with the more powerful GauGAN2 AI, to replace the original GauGAN model, along with loads of new features.
The Nvidia Canvas software is available for free to anyone with an Nvidia RTX graphics card. This is because the software uses the tensor cores present in your GPU to let the AI do its job. Read More
Seoul Robotics Announces LiDAR Enabled Autonomous Logistics Platform
The task of transporting cars from the end of the assembly line to their final destinations is currently a manual and expensive logistics problem. It includes loading and unloading vehicles from the factory floor onto trucks, ships and rail, with interim stops at parking lots. Seoul Robotics aims to change this. The company has just launched the Level 5 Control Tower (LV5 CTRL TWR) system, which BMW is leveraging to automate last-mile fleet logistics at its manufacturing facility in Munich.
The system uses SENSR™, proprietary perception software powered by artificial intelligence (AI) algorithms. SENSR™ works in conjunction with a mesh network of computers and LiDAR sensors mounted on fixed infrastructure (light poles, roof overhangs, etc.) that guides vehicles autonomously over a 5G communications network. Read More
OpenAI’s GLIDE Overtakes DALL-E
In a field evolving this quickly, artificial intelligence news is taking up an ever-larger share of my attention bandwidth, and breaking stories are what I watch for most.
OpenAI researchers this week presented GLIDE (Guided Language-to-Image Diffusion for Generation and Editing), a diffusion model that achieves performance competitive with DALL-E while using less than one-third of the parameters.
Text-to-image generation has been one of the most active and exciting AI fields of 2021. In January, OpenAI introduced DALL-E, a 12-billion parameter version of the company’s GPT-3 transformer language model designed to generate photorealistic images using text captions as prompts.
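GLIDE's strongest samples come from classifier-free guidance, where the model's unconditional and text-conditional noise predictions are combined at each diffusion step. The one-line combination rule can be sketched as follows; the function name and the toy one-element vectors are illustrative, not from OpenAI's code.

```python
def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Combine unconditional and text-conditional noise predictions.

    scale = 1.0 recovers the plain conditional prediction;
    scale > 1.0 pushes samples further toward the text prompt,
    trading diversity for fidelity."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

The same model produces both predictions (the unconditional one by dropping the caption), so guidance needs no separate classifier network.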
The GitHub repository for GLIDE went live on December 22nd, 2021. Sometimes breaking news in AI is actually worth talking about, and I consider this such an occasion. Read More
Microsoft’s AI Understands Humans…But It Had Never Seen One!
AI-generated characters for supporting personalized learning and well-being
Advancements in machine learning have recently enabled the hyper-realistic synthesis of prose, images, audio and video data, in what is referred to as artificial intelligence (AI)-generated media. These techniques offer novel opportunities for creating interactions with digital portrayals of individuals that can inspire and intrigue us. AI-generated portrayals of characters can feature synthesized faces, bodies and voices of anyone, from a fictional character to a historical figure, or even a deceased family member. Although negative use cases of this technology have dominated the conversation so far, in this Perspective we highlight emerging positive use cases of AI-generated characters, specifically in supporting learning and well-being. We demonstrate an easy-to-use AI character generation pipeline to enable such outcomes and discuss ethical implications as well as the need for including traceability to help maintain trust in the generated media. As we look towards the future, we foresee generative media as a crucial part of the ever-growing landscape of human–AI interaction. Read More