Seoul Robotics Announces LiDAR Enabled Autonomous Logistics Platform

Transporting cars from the end of the assembly line to their final destinations is currently a manual and expensive logistics problem. It involves loading and unloading vehicles between the factory floor and trucks, ships, and rail, with interim stops at parking lots. Seoul Robotics aims to change this. The company has just launched its Level 5 Control Tower (LV5 CTRL TWR) system, which BMW is leveraging to automate last-mile fleet logistics at its manufacturing facility in Munich.

The system uses SENSR™, proprietary perception software powered by artificial intelligence (AI) algorithms. SENSR™ works in conjunction with a mesh network of computers and LiDAR sensors mounted on fixed infrastructure (light poles, roof overhangs, etc.) that guides vehicles autonomously over a 5G communications network. Read More
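To make that architecture concrete, here is a minimal, purely illustrative sketch of infrastructure-mounted perception guiding a vehicle. Every class and function name below is hypothetical, not Seoul Robotics’ API.

```python
# Purely illustrative: fixed LiDAR nodes feed a site-wide world model,
# and a central "control tower" streams waypoints to each vehicle over
# the network. All names here are invented for the sketch.
from dataclasses import dataclass

@dataclass
class Detection:
    x: float      # metres, in the shared site coordinate frame
    y: float
    label: str    # e.g. "vehicle", "pedestrian"

class LidarNode:
    """A LiDAR sensor fixed to infrastructure (light pole, roof overhang)."""
    def read_detections(self) -> list[Detection]:
        # In the real system, perception software (e.g. SENSR) would turn
        # the raw point cloud into labelled objects here.
        return []

class ControlTower:
    """Fuses every node's view and guides vehicles through the site."""
    def __init__(self, nodes: list[LidarNode]):
        self.nodes = nodes

    def world_model(self) -> list[Detection]:
        # Merge all fixed-sensor views into one site-wide picture.
        return [d for node in self.nodes for d in node.read_detections()]

    def next_waypoint(self, vehicle_id: str) -> tuple[float, float]:
        obstacles = self.world_model()
        # Placeholder: a real planner computes a collision-free path
        # around `obstacles` and streams it to the car over 5G.
        return (0.0, 0.0)
```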

#image-recognition, #robotics

OpenAI’s GLIDE Overtakes DALL-E

In a constantly evolving field, artificial intelligence news is starting to claim a bigger share of my attention bandwidth, and I’m really into breaking news.

OpenAI researchers this week presented GLIDE (Guided Language-to-Image Diffusion for Generation and Editing), a diffusion model that achieves performance competitive with DALL-E while using less than one-third of the parameters.
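For those wondering what the “guided” in GLIDE refers to: the model’s denoising steps are steered toward the text prompt. Below is a minimal sketch of classifier-free guidance, the conditioning strategy the GLIDE paper reports works best; the `model` interface is an assumption for illustration, not GLIDE’s actual code.

```python
import torch

def guided_eps(model, x_t, t, text_emb, empty_emb, guidance_scale=3.0):
    """Blend conditional and unconditional noise predictions."""
    # `model` is assumed to predict diffusion noise for a noisy image
    # x_t at timestep t, given a (possibly empty) text embedding.
    eps_cond = model(x_t, t, text_emb)     # prompt-conditioned
    eps_uncond = model(x_t, t, empty_emb)  # empty-prompt baseline
    # Push each denoising step toward images matching the caption.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```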

Text-to-image generation has been one of the most active and exciting AI fields of 2021. In January, OpenAI introduced DALL-E, a 12-billion parameter version of the company’s GPT-3 transformer language model designed to generate photorealistic images using text captions as prompts.

The GitHub repository for GLIDE went live on December 22, 2021. Sometimes breaking news in AI is actually worth talking about, and I consider this such an occasion. Read More

#image-recognition

Microsoft’s AI Understands Humans…But It Had Never Seen One!

Read More
#fake, #image-recognition, #videos

AI-generated characters for supporting personalized learning and well-being

Advancements in machine learning have recently enabled the hyper-realistic synthesis of prose, images, audio and video data, in what is referred to as artificial intelligence (AI)-generated media. These techniques offer novel opportunities for creating interactions with digital portrayals of individuals that can inspire and intrigue us. AI-generated portrayals of characters can feature synthesized faces, bodies and voices of anyone, from a fictional character to a historical figure, or even a deceased family member. Although negative use cases of this technology have dominated the conversation so far, in this Perspective we highlight emerging positive use cases of AI-generated characters, specifically in supporting learning and well-being. We demonstrate an easy-to-use AI character generation pipeline to enable such outcomes and discuss ethical implications as well as the need to include traceability to help maintain trust in the generated media. As we look towards the future, we foresee generative media as a crucial part of the ever-growing landscape of human–AI interaction. Read More

#image-recognition

DeepRoute.ai Offers a Production-Ready L4 Autonomous Driving System at a Cool $10,000

Autonomous driving is considered the holy grail of the automotive industry and has been promised to us for quite a long time already. If I recall the slides from a 2013 Bosch presentation correctly, we should all have been passengers in our own cars a year ago. Back then, seven years seemed like a reasonable time frame, but, health crisis aside, we are nowhere near fully autonomous driving, or Level 5 (L5) autonomy as the industry calls it.

Sure, Tesla calls its assistance suite “Autopilot” or even “Full Self-Driving,” but those are just deceptive trade names for a system that is only capable of L2 autonomy. This means the car cannot be trusted with your life, and Tesla does not assume responsibility for whatever mischief the car might get up to. Read More

#image-recognition, #robotics, #videos

Synthesia raises $50M to leverage synthetic avatars for corporate training and more

Because every doc should be a presentation, and every presentation should be a video?

Synthesia, a startup using AI to create synthetic videos, is walking a fine, but thus far prosperous, line between being creepy and being pretty freakin’ cool.

…Synthesia allows anyone to turn text or a slide deck presentation into a video, complete with a talking avatar. Customers can leverage existing avatars, created from the performance of actors, or create their own in minutes by uploading some video. Users can also upload a recording of their voice, which can be transformed to say just about anything under the sun. Read More

#image-recognition, #vfx

Artificial intelligence that understands object relationships

A new machine-learning model could enable robots to understand interactions in the world in the way humans do.

MIT researchers have developed a machine-learning model that understands the underlying relationships between objects in a scene and can generate accurate images of scenes from text descriptions. Read More
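For a rough intuition of how relational descriptions can be composed, here is a hedged sketch in the spirit of the work: score a candidate image under a separate model for each relation clause, then descend the combined energy. All names are hypothetical; this is not the MIT implementation.

```python
import torch

def scene_energy(image, relations, relation_model):
    """Lower total energy = the image better satisfies every relation."""
    # e.g. relations = ["a mug left of a table", "a lamp on the table"]
    return sum(relation_model(image, r) for r in relations)

def generate(relations, relation_model, steps=200, lr=0.01):
    # Start from noise and follow the combined energy downhill
    # (Langevin-style), so every relation shapes the final image.
    x = torch.randn(1, 3, 64, 64, requires_grad=True)
    for _ in range(steps):
        energy = scene_energy(x, relations, relation_model)
        grad, = torch.autograd.grad(energy, x)
        x = (x - lr * grad + 0.005 * torch.randn_like(x)).detach()
        x.requires_grad_(True)
    return x.detach()
```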

#image-recognition

‘Paint Me a Picture’: NVIDIA Research Shows GauGAN AI Art Demo Now Responds to Words

GauGAN2 uses a deep learning model that turns a simple written phrase, or sentence, into a photorealistic masterpiece.

A picture worth a thousand words now takes just three or four words to create, thanks to GauGAN2, the latest version of NVIDIA Research’s wildly popular AI painting demo.

The deep learning model behind GauGAN allows anyone to channel their imagination into photorealistic masterpieces — and it’s easier than ever. Simply type a phrase like “sunset at a beach” and AI generates the scene in real time. Add an adjective, as in “sunset at a rocky beach,” or swap “sunset” for “afternoon” or “rainy day,” and the model, based on generative adversarial networks, instantly modifies the picture.

With the press of a button, users can generate a segmentation map, a high-level outline that shows the location of objects in the scene. From there, they can switch to drawing, tweaking the scene with rough sketches using labels like sky, tree, rock and river, allowing the smart paintbrush to incorporate these doodles into stunning images. Read More
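As a toy illustration of how a GauGAN-style generator might consume both a text prompt and a segmentation map, consider the sketch below. The architecture is invented for illustration only; NVIDIA’s actual SPADE-based model is far more sophisticated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextAndLayoutGenerator(nn.Module):
    """Toy conditional generator: noise + text embedding + label map."""
    def __init__(self, n_labels=182, text_dim=512, z_dim=64):
        super().__init__()
        # Fuse latent noise with an embedding of the typed phrase.
        self.fuse = nn.Linear(z_dim + text_dim, 16 * 16 * 64)
        # Decode to RGB; a real model would condition every layer on the
        # segmentation map (e.g. via SPADE normalization).
        self.to_rgb = nn.Sequential(
            nn.Upsample(scale_factor=4),
            nn.Conv2d(64 + n_labels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, z, text_emb, seg_onehot):
        h = self.fuse(torch.cat([z, text_emb], dim=1)).view(-1, 64, 16, 16)
        # Resize the user's label map (sky, tree, rock, river, ...) and
        # concatenate it so the drawn layout constrains the output.
        seg = F.interpolate(seg_onehot, size=h.shape[-2:])
        return self.to_rgb(torch.cat([h, seg], dim=1))
```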

#gans, #image-recognition, #nvidia

Face Recognition Vendor Test (FRVT) Ongoing

In cooperation with IARPA, the National Institute of Standards and Technology (NIST) is currently running three challenges related to the processing of unconstrained, in-the-wild face images. The Face Recognition Vendor Test (FRVT) is an ongoing evaluation of face recognition algorithms applied to large image databases sequestered at NIST. Algorithms may be submitted to NIST at any time, and results are posted when ready, usually within two weeks. Homepage

#image-recognition

Unsupervised Learning of Visual 3D Keypoints for Control

Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations. Prior work shows that structured latent spaces such as visual keypoints often outperform unstructured representations for robotic control. However, most of these representations, whether structured or unstructured, are learned in 2D space even though control tasks are usually performed in a 3D environment. In this work, we propose a framework to learn such a 3D geometric structure directly from images in an end-to-end unsupervised manner. The input images are embedded into latent 3D keypoints via a differentiable encoder which is trained to optimize both a multi-view consistency loss and a downstream task objective. These discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements in a consistent manner across both time and 3D space. The proposed approach outperforms prior state-of-the-art methods across a variety of reinforcement learning benchmarks. Read More
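A hedged sketch of the multi-view consistency idea from the abstract: each camera view is encoded into K latent 3D keypoints, the per-view predictions are mapped into a shared world frame, and disagreement between views is penalized. Shapes and names are illustrative assumptions, not the authors’ code.

```python
import torch

def multiview_consistency_loss(encoder, views, cam_to_world):
    """
    views:        one image batch per camera, each of shape (B, 3, H, W)
    cam_to_world: one (4, 4) extrinsics matrix per camera
    encoder(img) -> (B, K, 3) keypoints in that camera's frame (assumed)
    """
    world_kps = []
    for img, T in zip(views, cam_to_world):
        kps = encoder(img)                          # (B, K, 3)
        ones = torch.ones_like(kps[..., :1])
        homo = torch.cat([kps, ones], dim=-1)       # (B, K, 4) homogeneous
        world_kps.append(homo @ T.transpose(0, 1))  # map into world frame
    mean = torch.stack(world_kps).mean(dim=0)
    # Every view should predict the same world-frame keypoints; the full
    # objective adds the downstream (RL) task term, per the abstract.
    return sum(((k - mean) ** 2).mean() for k in world_kps)
```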

#image-recognition, #robotics