We present Mobile-Former, a parallel design of MobileNet and transformer with a two-way bridge in between. This structure leverages the advantages of MobileNet for local processing and of the transformer for global interaction, while the bridge enables bidirectional fusion of local and global features. Unlike recent work on vision transformers, the transformer in Mobile-Former contains very few tokens (e.g. six or fewer) that are randomly initialized to learn global priors, resulting in low computational cost. Combined with the proposed light-weight cross attention that models the bridge, Mobile-Former is not only computationally efficient but also has greater representational power. It outperforms MobileNetV3 in the low-FLOP regime from 25M to 500M FLOPs on ImageNet classification. For instance, Mobile-Former achieves 77.9% top-1 accuracy at 294M FLOPs, gaining 1.3% over MobileNetV3 while saving 17% of the computation. When transferred to object detection, Mobile-Former outperforms MobileNetV3 by 8.6 AP in the RetinaNet framework. Furthermore, we build an efficient end-to-end detector by replacing the backbone, encoder and decoder in DETR with Mobile-Former; it outperforms DETR by 1.1 AP while saving 52% of the computational cost and 36% of the parameters.
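The two-way bridge described in the abstract can be illustrated with a single cross-attention primitive. The sketch below is a minimal NumPy illustration, not the paper's implementation: it assumes one attention head, drops the key/value projections on the context side (the kind of "light-weight" simplification the abstract alludes to), and uses random weights in place of learned ones.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_out):
    # Light-weight cross attention: keys and values are the raw context
    # features (no key/value projections); only queries are projected.
    q = queries @ w_q
    scores = q @ context.T / np.sqrt(context.shape[1])
    attn = softmax(scores, axis=-1)      # each query attends over the context
    return (attn @ context) @ w_out      # aggregate context, project back

rng = np.random.default_rng(0)
d = 16                                   # embedding dim (illustrative)
tokens = rng.standard_normal((6, d))     # 6 randomly initialized global tokens
pixels = rng.standard_normal((49, d))    # a flattened 7x7 local feature map

w_q = rng.standard_normal((d, d)) / np.sqrt(d)
w_out = rng.standard_normal((d, d)) / np.sqrt(d)

# Mobile -> Former: tokens gather global context from local features.
tokens = tokens + cross_attention(tokens, pixels, w_q, w_out)
# Former -> Mobile: local features query the updated global tokens.
pixels = pixels + cross_attention(pixels, tokens, w_q, w_out)
print(tokens.shape, pixels.shape)  # (6, 16) (49, 16)
```

Because there are only six tokens, both attention maps are tiny (6×49 and 49×6), which is where the low computational cost of the bridge comes from.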
Tag Archives: Image Recognition
Nvidia’s upgraded AI art tool turned my obscure squiggles into a masterpiece
It’s incredible, the things we can do with AI nowadays. For artists looking to integrate artificial intelligence into their workflow, there are ever more advanced tools popping up all over the net. One such tool is Nvidia Canvas, which has just been updated with the more powerful GauGAN2 AI, to replace the original GauGAN model, along with loads of new features.
The Nvidia Canvas software is available for free to anyone with an Nvidia RTX graphics card. This is because the software uses the Tensor cores in your GPU to let the AI do its job.
Seoul Robotics Announces LiDAR Enabled Autonomous Logistics Platform
The task of transporting cars from the end of the assembly line to their final destinations is currently a manual and expensive logistics problem. It includes loading and unloading vehicles from the factory floor onto trucks, ships and rail, with interim stops at parking lots. Seoul Robotics aims to change this. The company has just launched the Level 5 Control Tower (LV5 CTRL TWR) system, which BMW is leveraging to automate last-mile fleet logistics at its manufacturing facility in Munich.
The system uses SENSR™, proprietary perception software powered by artificial intelligence (AI) algorithms. SENSR™ works in conjunction with a mesh network of computers and LiDAR sensors mounted on fixed infrastructure (light poles, roof overhangs, etc.) that guides vehicles autonomously over a 5G communications network.
OpenAI’s GLIDE Overtakes DALL-E
In a field of constant evolution, artificial intelligence news is starting to take a bigger share of my attention bandwidth. I'm really into breaking news.
OpenAI researchers this week presented GLIDE (Guided Language-to-Image Diffusion for Generation and Editing), a diffusion model that achieves performance competitive with DALL-E while using less than one-third of the parameters.
Text-to-image generation has been one of the most active and exciting AI fields of 2021. In January, OpenAI introduced DALL-E, a 12-billion parameter version of the company’s GPT-3 transformer language model designed to generate photorealistic images using text captions as prompts.
GLIDE's GitHub repository went live on December 22nd, 2021. Sometimes breaking news in AI is actually worth talking about, and I consider this such an occasion.
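The "guided" in GLIDE refers to steering the diffusion sampler toward the text prompt. One strategy the paper explores is classifier-free guidance, in which the model's noise prediction given the caption is extrapolated away from its prediction given an empty caption. A minimal numeric sketch (the arrays and guidance scale below are illustrative, not values from the paper):

```python
import numpy as np

def classifier_free_guidance(eps_cond, eps_uncond, scale):
    # Extrapolate from the unconditional prediction toward the
    # text-conditioned one; scale > 1 strengthens prompt adherence.
    return eps_uncond + scale * (eps_cond - eps_uncond)

eps_c = np.array([0.2, -0.1, 0.4])  # noise predicted with the caption
eps_u = np.array([0.1,  0.0, 0.1])  # noise predicted with an empty caption
guided = classifier_free_guidance(eps_c, eps_u, scale=3.0)
print(guided)  # approximately [0.4, -0.3, 1.0]
```

At each denoising step the guided prediction replaces the plain conditional one, trading some sample diversity for images that match the caption more closely.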
Microsoft’s AI Understands Humans…But It Had Never Seen One!
AI-generated characters for supporting personalized learning and well-being
Advancements in machine learning have recently enabled the hyper-realistic synthesis of prose, images, audio and video data, in what is referred to as artificial intelligence (AI)-generated media. These techniques offer novel opportunities for creating interactions with digital portrayals of individuals that can inspire and intrigue us. AI-generated portrayals of characters can feature synthesized faces, bodies and voices of anyone, from a fictional character to a historical figure, or even a deceased family member. Although negative use cases of this technology have dominated the conversation so far, in this Perspective we highlight emerging positive use cases of AI-generated characters, specifically in supporting learning and well-being. We demonstrate an easy-to-use AI character generation pipeline to enable such outcomes and discuss ethical implications as well as the need for including traceability to help maintain trust in the generated media. As we look towards the future, we foresee generative media as a crucial part of the ever growing landscape of human–AI interaction.
DeepRoute.ai Offers a Production-Ready L4 Autonomous Driving System at a Cool $10,000
Autonomous driving is considered to be the holy grail of the automotive industry and has been promised to us for quite a long time already. If I recall the slides from a 2013 Bosch presentation correctly, we should've all been passengers in our cars a year ago. Back then, seven years seemed like a reasonable time frame, but, health crisis aside, we are nowhere near fully autonomous driving, or Level 5 (L5) autonomy as the industry calls it.
Sure, Tesla calls its assistance suite “Autopilot” or even “Full Self-Driving,” but it’s just a deceptive trade name for a system that is only capable of L2 autonomy. This means the car cannot be trusted with your life, and Tesla does not assume responsibility for whatever mischief the car might get up to.
Synthesia raises $50M to leverage synthetic avatars for corporate training and more
Because every doc should be a presentation, and every presentation should be a video?
Synthesia, a startup using AI to create synthetic videos, is walking a fine, but thus far prosperous, line between being creepy and being pretty freakin’ cool.
…Synthesia allows anyone to turn text or a slide deck presentation into a video, complete with a talking avatar. Customers can leverage existing avatars, created from the performance of actors, or create their own in minutes by uploading some video. Users also can upload a recording of their voice, which can be transformed to say just about anything under the sun.
Artificial intelligence that understands object relationships
A new machine-learning model could enable robots to understand interactions in the world in the way humans do.
MIT researchers have developed a machine learning model that understands the underlying relationships between objects in a scene and can generate accurate images of scenes from text descriptions.
‘Paint Me a Picture’: NVIDIA Research Shows GauGAN AI Art Demo Now Responds to Words
GauGAN2 uses a deep learning model that turns a simple written phrase, or sentence, into a photorealistic masterpiece.
A picture worth a thousand words now takes just three or four words to create, thanks to GauGAN2, the latest version of NVIDIA Research’s wildly popular AI painting demo.
The deep learning model behind GauGAN allows anyone to channel their imagination into photorealistic masterpieces — and it’s easier than ever. Simply type a phrase like “sunset at a beach” and AI generates the scene in real time. Add an additional adjective like “sunset at a rocky beach,” or swap “sunset” to “afternoon” or “rainy day” and the model, based on generative adversarial networks, instantly modifies the picture.
With the press of a button, users can generate a segmentation map, a high-level outline that shows the location of objects in the scene. From there, they can switch to drawing, tweaking the scene with rough sketches using labels like sky, tree, rock and river, allowing the smart paintbrush to incorporate these doodles into stunning images.