QuestSim: Human Motion Tracking from Sparse Sensors with Simulated Avatars

Real-time tracking of human body motion is crucial for interactive and immersive experiences in AR/VR. However, very limited sensor data about the body is available from standalone wearable devices such as HMDs (Head Mounted Devices) or AR glasses. In this work, we present a reinforcement learning framework that takes in sparse signals from an HMD and two controllers, and simulates plausible and physically valid full body motions. Using high quality full body motion as dense supervision during training, a simple policy network can learn to output appropriate torques for the character to balance, walk, and jog, while closely following the input signals. Our results demonstrate surprisingly similar leg motions to ground truth without any observations of the lower body, even when the input is only the 6D transformations of the HMD. We also show that a single policy can be robust to diverse locomotion styles, different body sizes, and novel environments. Read More
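For intuition, here is a minimal PyTorch sketch of a policy with this shape: sparse tracker signals in, per-joint torques out. The dimensions, layer sizes, and the omission of the simulated character state from the observation are illustrative simplifications, not the paper’s actual design.

```python
# Hypothetical sketch of the kind of policy described: a small MLP mapping
# sparse tracker signals (HMD + two controllers) to per-joint torques.
# All sizes are illustrative assumptions, not taken from the paper; the real
# policy would also observe the simulated character's state.
import torch
import torch.nn as nn

NUM_TRACKERS = 3   # HMD + left/right controller
TRACKER_DIM = 9    # 3D position + 6D rotation representation per tracker
NUM_JOINTS = 33    # example humanoid actuated-DoF count (assumption)

class SparseTrackingPolicy(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_TRACKERS * TRACKER_DIM, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, NUM_JOINTS),  # one torque per actuated joint
        )

    def forward(self, tracker_obs):
        # tracker_obs: (batch, NUM_TRACKERS * TRACKER_DIM)
        return self.net(tracker_obs)

policy = SparseTrackingPolicy()
torques = policy(torch.randn(1, NUM_TRACKERS * TRACKER_DIM))
```

In the paper’s setup, a reward that compares the simulated character against high-quality full-body motion capture is what teaches a network like this to produce balanced, plausible lower-body movement the inputs never observe.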

#image-recognition

Artist receives first known US copyright registration for latent diffusion AI art

Registration of AI-assisted comic comes amid fierce online debate about AI art ethics.

In what might be a first, a New York-based artist named Kris Kashtanova has received US copyright registration on their graphic novel that features AI-generated artwork created by latent diffusion AI, according to their Instagram feed and confirmed through a public records search by Ars Technica.

The registration, effective September 15, applies to a comic book called Zarya of the Dawn. Kashtanova created the artwork for Zarya using Midjourney, a commercial image synthesis service. Read More

#image-recognition, #nlp

Turn anyone into a pokémon with this AI art model

A fun little AI art widget named Text-to-Pokémon lets you plug in any name or description you like and (you guessed it) generate a pokémon matching your prompt.

The model’s output isn’t flawless, but it’s incredibly entertaining all the same. You can try punching in the names of celebrities or politicians (see “Boris Johnson” and “Vladimir Putin” in the image above) or just general descriptions of the sort of pokémon that would tickle your personal fancy (the one below is my “skeleton priest”). Read More
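The widget appears to be built on a Stable Diffusion fine-tune. A minimal sketch using Hugging Face diffusers and the publicly shared lambdalabs/sd-pokemon-diffusers checkpoint — an assumption; it may not be the exact model behind this widget — looks like this:

```python
# Minimal sketch: prompting a Pokemon-style Stable Diffusion fine-tune.
# The checkpoint name is an assumption about the underlying model.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "lambdalabs/sd-pokemon-diffusers", torch_dtype=torch.float16
).to("cuda")

# Any name or description works as the prompt, as with the widget.
image = pipe("skeleton priest", guidance_scale=7.5).images[0]
image.save("skeleton_priest_pokemon.png")
```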

#image-recognition, #nlp

PP-Matting: High-Accuracy Natural Image Matting

Natural image matting is a fundamental and challenging computer vision task. It has many applications in image editing and composition. Recently, deep learning-based approaches have achieved great improvements in image matting. However, most of them require a user-supplied trimap as an auxiliary input, which limits real-world matting applications. Although some trimap-free approaches have been proposed, their matting quality is still unsatisfactory compared to trimap-based ones. Without trimap guidance, matting models easily suffer from foreground-background ambiguity and generate blurry details in the transition area. In this work, we propose PP-Matting, a trimap-free architecture that can achieve high-accuracy natural image matting. Our method applies a high-resolution detail branch (HRDB) that extracts fine-grained details of the foreground while keeping the feature resolution unchanged. We also propose a semantic context branch (SCB) that adopts a semantic segmentation subtask, preventing local ambiguity in the detail prediction caused by missing semantic context. We conduct extensive experiments on two well-known benchmarks: Composition-1k and Distinctions-646. The results demonstrate the superiority of PP-Matting over previous methods. Furthermore, a qualitative evaluation on human matting shows its outstanding performance in practical applications. Read More
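To make the two-branch idea concrete, here is an illustrative PyTorch sketch: a detail branch that never downsamples, and a semantic branch that predicts coarse foreground/background/transition maps to guide the fusion. The layers are placeholders, not the actual PP-Matting architecture.

```python
# Illustrative two-branch layout echoing the paper's description; layer
# choices are placeholders, not PP-Matting's real architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchMattingSketch(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # Detail branch (HRDB-like): stride-1 convs, resolution never drops
        self.hrdb = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        # Semantic branch (SCB-like): downsample for context, then predict
        # coarse foreground / background / transition classes
        self.scb = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 1),
        )
        self.fuse = nn.Conv2d(ch + 3, 1, 3, padding=1)

    def forward(self, img):
        detail = self.hrdb(img)
        semantics = F.interpolate(self.scb(img), size=img.shape[-2:],
                                  mode="bilinear", align_corners=False)
        # Semantic context steers the detail features away from local
        # foreground-background ambiguity before the alpha prediction
        alpha = torch.sigmoid(self.fuse(torch.cat([detail, semantics], 1)))
        return alpha, semantics

alpha, sem = TwoBranchMattingSketch()(torch.randn(1, 3, 256, 256))
```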

#image-recognition, #vfx

OpenAI begins allowing users to edit faces using DALL-E 2

After initially disabling the capability, OpenAI today announced that customers with access to DALL-E 2 can upload people’s faces to edit them using the AI-powered image-generating system. Previously, OpenAI prevented users from working with and sharing photorealistic faces and banned the uploading of any photo that might depict a real person, including photos of prominent celebrities and public figures.

OpenAI claims that improvements to its safety system made the face-editing feature possible by “minimizing the potential of harm” from deepfakes as well as attempts to create sexual, political and violent content. In an email to customers, the company wrote:

Many of you have told us that you miss using DALL-E to dream up outfits and hairstyles on yourselves and edit the backgrounds of family photos. A reconstructive surgeon told us that he’d been using DALL-E to help his patients visualize results. And filmmakers have told us that they want to be able to edit images of scenes with people to help speed up their creative processes … [We] built new detection and response techniques to stop misuse.

Read More
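The kind of edit described above runs through the images edit endpoint, where a transparent mask marks the region DALL-E 2 repaints. A minimal sketch using the 2022-era openai Python client (file names and prompt are made up for illustration):

```python
# Minimal sketch of an inpainting-style edit via the OpenAI Images edit
# endpoint (2022-era Python client). File names and prompt are invented.
import openai

openai.api_key = "sk-..."  # your API key

response = openai.Image.create_edit(
    image=open("family_photo.png", "rb"),    # original square PNG
    mask=open("background_mask.png", "rb"),  # transparent where edits go
    prompt="the same family on a sunny beach at golden hour",
    n=1,
    size="1024x1024",
)
print(response["data"][0]["url"])  # URL of the edited image
```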

#image-recognition, #nlp

D-ID, the company behind Deep Nostalgia, lets you create AI-generated videos from a single image

Israeli AI company D-ID, which provided technology for projects like Deep Nostalgia, is launching a new platform where users can upload a single image and text to generate a video. With this new site, called Creative Reality Studio, the company is targeting sectors like corporate training and education, internal and external corporate communication, and product marketing and sales.

The platform is pretty simple to use: users can upload an image of a presenter or select one of the pre-created presenters to start the video creation process. Paid users can access premium presenters, which are more “expressive,” with better facial expressions and hand movements than the default ones. After that, users can either type in a script or simply upload an audio clip of someone’s speech. Users can then select a language (the platform supports 119), a voice, and a style such as cheerful, sad, excited, or friendly.

The company’s AI-based algorithms will generate a video based on these parameters, which users can then distribute anywhere. The firm claims that the algorithm takes only half the video’s duration to generate a clip, but in our tests, it took a couple of minutes to generate a one-minute video. This could vary depending on the type of presenter and language you select. Read More
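The image-plus-script flow maps naturally onto a REST call. The sketch below is hypothetical: the endpoint, field names, and auth scheme are invented for illustration and are not D-ID’s documented API.

```python
# Hypothetical sketch of the described flow (image + script -> talking-head
# video). Endpoint, fields, and auth are invented, not D-ID's real API.
import requests

API = "https://api.example-d-id.com/talks"  # placeholder endpoint
headers = {"Authorization": "Bearer YOUR_TOKEN"}

payload = {
    "source_url": "https://example.com/presenter.jpg",  # single input image
    "script": {
        "type": "text",
        "input": "Welcome to this quarter's onboarding session.",
        "language": "en",     # the studio reportedly supports 119 languages
        "style": "cheerful",  # e.g. cheerful, sad, excited, friendly
    },
}

job = requests.post(API, json=payload, headers=headers, timeout=30).json()
print(job.get("result_url"))  # poll until the generated video is ready
```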

#fake, #image-recognition

Midjourney AI Art vs. Artist – Testing AI art to see if it can replicate my artwork

Read More

#image-recognition, #nlp

Transframer: Arbitrary Frame Prediction with Generative Models

We present a general-purpose framework for image modelling and vision tasks based on probabilistic frame prediction. Our approach unifies a broad range of tasks, from image segmentation to novel view synthesis and video interpolation. We pair this framework with an architecture we term Transframer, which uses U-Net and Transformer components to condition on annotated context frames, and outputs sequences of sparse, compressed image features. Transframer is the state-of-the-art on a variety of video generation benchmarks, is competitive with the strongest models on few-shot view synthesis, and can generate coherent 30-second videos from a single image without any explicit geometric information. A single generalist Transframer simultaneously produces promising results on 8 tasks, including semantic segmentation, image classification and optical flow prediction, with no task-specific architectural components, demonstrating that multi-task computer vision can be tackled using probabilistic image models. Our approach can in principle be applied to a wide range of applications that require learning the conditional structure of annotated image-formatted data. Read More
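A loose structural sketch of that pairing — convolutional encodings of context frames feeding a Transformer decoder that autoregressively emits compressed image-feature tokens — might look like the following. Sizes and wiring are illustrative assumptions, not the published architecture.

```python
# Loose structural sketch: encoded context frames condition a Transformer
# decoder over compressed feature tokens. Illustrative only.
import torch
import torch.nn as nn

class TransframerSketch(nn.Module):
    def __init__(self, d_model=256, vocab=1024):
        super().__init__()
        # Stand-in for the U-Net component: downsampling convs that turn
        # each context frame into a grid of feature vectors
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=4), nn.ReLU(),
            nn.Conv2d(64, d_model, 4, stride=4),
        )
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.token_emb = nn.Embedding(vocab, d_model)
        self.head = nn.Linear(d_model, vocab)  # logits over feature codes

    def forward(self, context_frames, target_tokens):
        # context_frames: (batch, n_frames, 3, H, W)
        b, n, c, h, w = context_frames.shape
        feats = self.encoder(context_frames.flatten(0, 1))
        # Flatten every frame's spatial grid into one conditioning sequence
        memory = feats.flatten(2).transpose(1, 2).reshape(b, -1, feats.shape[1])
        x = self.decoder(self.token_emb(target_tokens), memory)
        return self.head(x)  # next-token logits for the target frame

model = TransframerSketch()
logits = model(torch.randn(2, 3, 3, 64, 64), torch.randint(0, 1024, (2, 16)))
```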

#big7, #image-recognition

Upcoming AI image generator will run on an RTX 3080

An announcement from Stability.ai brings great news for anyone caught up in the AI image generation hype: Stable Diffusion, an image generation model that runs on consumer-level hardware, will soon be going public. Read More
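When the weights go public, running the model on a single consumer card via Hugging Face diffusers should look roughly like this; the model ID is assumed from the eventual public release, and half precision plus attention slicing are the usual tricks for fitting a VRAM budget like a 3080’s.

```python
# Minimal sketch: Stable Diffusion on one consumer GPU via diffusers.
# Model ID assumes the eventually released public checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
pipe.enable_attention_slicing()  # trades a little speed for lower VRAM use

image = pipe("a watercolor lighthouse at dawn",
             num_inference_steps=30).images[0]
image.save("lighthouse.png")
```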

#image-recognition, #nlp

You can (sort of) generate art like Dall-E with TikTok’s latest filter

If you’re still on the waiting list to try out DALL-E and you just want a quick peek at the kind of technology that powers it, you might want to open up TikTok.

TikTok’s latest filter may have been around for a few days now, but we first noticed its new A.I. text-to-image generator on Sunday. It’s called AI Greenscreen, and it lets you generate painterly-style images based on words you input. The images you generate can then become the background of your TikTok videos, like a green screen. Read More

#image-recognition, #vfx