Kling, the AI video generator rival to Sora that’s wowing creators

If you follow any AI influencers or creators on social media, there’s a good chance you may have seen them more excited than usual lately about a new AI video generation model called “Kling.”

The videos it generates from pure text prompts, plus some configurable in-app buttons and settings, look incredibly realistic, on par with Sora, OpenAI’s still non-public, invitation-only, closed-beta AI model, which the company has shared with a small group of artists and filmmakers while it tests the model and its adversarial (read: risky, objectionable) uses.

[W]here did Kling come from? What does it offer? And how can you get your hands on it? Read on to find out. — Read More

#china-ai, #image-recognition

Ukraine is riddled with land mines. Drones and AI can help

Early on a June morning in 2023, my colleagues and I drove down a bumpy dirt road north of Kyiv in Ukraine. The Ukrainian Armed Forces were conducting training exercises nearby, and mortar shells arced through the sky. We arrived at a vast field for a technology demonstration set up by the United Nations. Across the 25-hectare field (about the size of 62 American football fields), the U.N. workers had scattered 50 to 100 inert mines and other ordnance. Our task was to fly our drone over the area and use our machine learning software to detect as many as possible. And we had to turn in our results within 72 hours.

The scale was daunting: The area was 10 times as large as anything we’d attempted before with our drone demining startup, Safe Pro AI. My cofounder Gabriel Steinberg and I used flight-planning software to program a drone to cover the whole area with some overlap, taking photographs the whole time. It ended up taking the drone 5 hours to complete its task, and it came away with more than 15,000 images. Then we raced back to the hotel with the data it had collected and began an all-night coding session.

We were happy to see that our custom machine learning model took only about 2 hours to crunch through all the visual data and identify potential mines and ordnance. But constructing a map for the full area that included the specific coordinates of all the detected mines in under 72 hours was simply not possible with any reasonable computational resources. The following day (which happened to coincide with the short-lived Wagner Group rebellion), we rewrote our algorithms so that our system mapped only the locations where suspected land mines were identified—a more scalable solution for our future work. — Read More
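One way to picture mapping only the suspected mine locations, rather than building a map of the whole field, is to georeference each detection directly from the photo it appears in. The Python sketch below is an illustration under stated assumptions, not the team's actual method: it assumes flat terrain, a camera pointing straight down, the top of each image facing north, and a known ground sample distance. The `Photo` class, its fields, and `detection_to_latlon` are hypothetical names introduced here.

```python
import math
from dataclasses import dataclass

# Rough meters per degree of latitude; good enough for a sketch, not for survey work.
METERS_PER_DEGREE_LAT = 111_320.0


@dataclass(frozen=True)
class Photo:
    """Hypothetical per-image metadata for a straight-down drone photo."""
    lat: float        # camera latitude at capture, in degrees
    lon: float        # camera longitude at capture, in degrees
    width_px: int     # image width in pixels
    height_px: int    # image height in pixels
    gsd_m: float      # ground sample distance: meters covered by one pixel


def detection_to_latlon(photo: Photo, px: float, py: float) -> tuple[float, float]:
    """Approximate ground coordinates of pixel (px, py).

    Assumes flat terrain, a nadir camera, and the image top pointing north.
    """
    # Offset of the pixel from the image center, converted to meters.
    east_m = (px - photo.width_px / 2) * photo.gsd_m
    north_m = (photo.height_px / 2 - py) * photo.gsd_m  # pixel rows grow downward

    # Convert the metric offset to degrees; longitude degrees shrink with latitude.
    lat = photo.lat + north_m / METERS_PER_DEGREE_LAT
    lon = photo.lon + east_m / (
        METERS_PER_DEGREE_LAT * math.cos(math.radians(photo.lat))
    )
    return lat, lon


if __name__ == "__main__":
    # Made-up values for illustration only.
    photo = Photo(lat=50.0, lon=30.0, width_px=4000, height_px=3000, gsd_m=0.01)
    print(detection_to_latlon(photo, px=2000, py=0))
```

Under these assumptions the work grows with the number of detections rather than with the area flown, which is the property the excerpt describes as more scalable. A real pipeline would also have to handle camera tilt, heading, terrain, and duplicate detections in overlapping photos.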

#dod, #image-recognition

Microsoft AI creates scary real talkie videos from a single photo

Microsoft Research Asia has revealed an AI model that can generate frighteningly realistic deepfake videos from a single still image and an audio track. How will we be able to trust what we see and hear online from here on in?

… After training the [VASA-1] model on footage of around 6,000 real-life talking faces from the VoxCeleb2 dataset, the technology is able to generate scary real video where the newly animated subject is not only able to accurately lip-sync to a supplied voice audio track, but also sports varied facial expressions and natural head movements – all from a single static headshot photo. — Read More

#image-recognition

Stability AI Announces Stable Diffusion 3: All We Know So Far

Stability AI announced an early preview of Stable Diffusion 3, their text-to-image generative AI model. Unlike last week’s Sora text-to-video announcement from OpenAI, there were limited demonstrations of the model’s new capabilities, but some details were provided. Here, we explore what the announcement means, how the new model works, and some implications for the advancement of image generation. — Read More

#image-recognition

Latte: Latent Diffusion Transformer for Video Generation

We propose a novel Latent Diffusion Transformer, namely Latte, for video generation. Latte first extracts spatio-temporal tokens from input videos and then adopts a series of Transformer blocks to model video distribution in the latent space. In order to model a substantial number of tokens extracted from videos, four efficient variants are introduced from the perspective of decomposing the spatial and temporal dimensions of input videos. To improve the quality of generated videos, we determine the best practices of Latte through rigorous experimental analysis, including video clip patch embedding, model variants, timestep-class information injection, temporal positional embedding, and learning strategies. Our comprehensive evaluation demonstrates that Latte achieves state-of-the-art performance across four standard video generation datasets, i.e., FaceForensics, SkyTimelapse, UCF101, and Taichi-HD. In addition, we extend Latte to text-to-video generation (T2V) task, where Latte achieves comparable results compared to recent T2V models. We strongly believe that Latte provides valuable insights for future research on incorporating Transformers into diffusion models for video generation. — Read More
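The abstract's point about decomposing the spatial and temporal dimensions can be made concrete with a little counting. The Python sketch below is not code from the Latte paper. It groups tokens two ways, by frame (spatial attention) and by spatial location (temporal attention), and compares the number of attended token pairs with full attention over every token. The clip size, patch size, and all function names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenGrid:
    frames: int            # number of latent frames (T)
    tokens_per_frame: int  # spatial patches per frame (N)

    @property
    def total_tokens(self) -> int:
        return self.frames * self.tokens_per_frame


def tokenize_shape(frames: int, height: int, width: int, patch: int) -> TokenGrid:
    """Count the tokens produced by cutting each frame into patch x patch squares."""
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by the patch size")
    return TokenGrid(frames, (height // patch) * (width // patch))


def spatial_groups(grid: TokenGrid) -> list[list[int]]:
    """Token indices grouped by frame: attention within a group mixes space only."""
    n = grid.tokens_per_frame
    return [[t * n + s for s in range(n)] for t in range(grid.frames)]


def temporal_groups(grid: TokenGrid) -> list[list[int]]:
    """Token indices grouped by location: attention within a group mixes time only."""
    n = grid.tokens_per_frame
    return [[t * n + s for t in range(grid.frames)] for s in range(n)]


def attention_pairs(groups: list[list[int]]) -> int:
    """Query-key pairs when every token attends to every token in its own group."""
    return sum(len(group) ** 2 for group in groups)


if __name__ == "__main__":
    # Assumed example: 16 latent frames of 32 x 32, cut into 2 x 2 patches.
    grid = tokenize_shape(frames=16, height=32, width=32, patch=2)
    full = grid.total_tokens ** 2
    factorized = attention_pairs(spatial_groups(grid)) + attention_pairs(
        temporal_groups(grid)
    )
    print(grid.total_tokens, full, factorized)
```

With these assumed sizes, one spatial pass plus one temporal pass touches roughly a fifteenth of the pairs that joint attention over all tokens would, which is why this kind of decomposition matters when a video yields thousands of tokens.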

#nlp, #image-recognition

OpenAI introduces Sora, its text-to-video AI model

OpenAI’s latest model takes text prompts and turns them into ‘complex scenes with multiple characters, specific types of motion,’ and more.

OpenAI is launching a new video-generation model, and it’s called Sora. The AI company says Sora “can create realistic and imaginative scenes from text instructions.” The text-to-video model allows users to create photorealistic videos up to a minute long — all based on prompts they’ve written.

Sora is capable of creating “complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” according to OpenAI’s introductory blog post. The company also notes that the model can understand how objects “exist in the physical world,” as well as “accurately interpret props and generate compelling characters that express vibrant emotions.” — Read More

#chatbots, #image-recognition

Keyframer: Empowering Animation Design using Large Language Models

Large language models (LLMs) have the potential to impact a wide range of creative domains, but the application of LLMs to animation is underexplored and presents novel challenges such as how users might effectively describe motion in natural language. In this paper, we present Keyframer, a design tool for animating static images (SVGs) with natural language. Informed by interviews with professional animation designers and engineers, Keyframer supports exploration and refinement of animations through the combination of prompting and direct editing of generated output. The system also enables users to request design variants, supporting comparison and ideation. Through a user study with 13 participants, we contribute a characterization of user prompting strategies, including a taxonomy of semantic prompt types for describing motion and a ‘decomposed’ prompting style where users continually adapt their goals in response to generated output. We share how direct editing along with prompting enables iteration beyond one-shot prompting interfaces common in generative tools today. Through this work, we propose how LLMs might empower a range of audiences to engage with animation creation. – Read More

#image-recognition

OpenAI joins Meta in labeling AI generated images

Not to be outdone by a rival, OpenAI today announced it is updating its marquee app ChatGPT and the AI image generator model integrated within it, DALL-E 3, to include new metadata tagging that will allow the company, and theoretically any user or other organization across the web, to identify the imagery as having been made with AI tools.

The move came just hours after Meta announced a similar measure to label AI images generated through its separate AI image generator Imagine and available on Instagram, Facebook, and Threads (and, also, trained on user-submitted imagery from some of those social platforms). – Read More

#fake, #image-recognition

AI helps scholars read scroll buried when Vesuvius erupted in AD79

Researchers used AI to read letters on papyrus scroll damaged by the blast of heat, ash and pumice that destroyed Pompeii.

Scholars of antiquity believe they are on the brink of a new era of understanding after researchers armed with artificial intelligence read the hidden text of a charred scroll that was buried when Mount Vesuvius erupted nearly 2,000 years ago. – Read More

#image-recognition

Google’s latest AI video generator can render cute animals in implausible situations

On Tuesday, Google announced Lumiere, an AI video generator that it calls “a space-time diffusion model for realistic video generation” in the accompanying preprint paper. But let’s not kid ourselves: It does a great job of creating videos of cute animals in ridiculous scenarios, such as using roller skates, driving a car, or playing a piano. Sure, it can do more, but it is perhaps the most advanced text-to-animal AI video generator yet demonstrated. – Read More

#image-recognition