Humans are good at analyzing things. Machines are even better. Machines can analyze a set of data and find patterns in it for a multitude of use cases, whether it’s fraud or spam detection, forecasting the ETA of your delivery or predicting which TikTok video to show you next. They are getting smarter at these tasks. This is called “Analytical AI,” or traditional AI.
But humans are not only good at analyzing things—we are also good at creating. We write poetry, design products, make games and crank out code. Up until recently, machines had no chance of competing with humans at creative work—they were relegated to analysis and rote cognitive labor. But machines are just starting to get good at creating sensical and beautiful things. This new category is called “Generative AI,” meaning the machine is generating something new rather than analyzing something that already exists.
Generative AI is well on the way to becoming not just faster and cheaper, but better in some cases than what humans create by hand. Every industry that requires humans to create original work—from social media to gaming, advertising to architecture, coding to graphic design, product design to law, marketing to sales—is up for reinvention. Certain functions may be completely replaced by generative AI, while others are more likely to thrive from a tight iterative creative cycle between human and machine—but generative AI should unlock better, faster and cheaper creation across a wide range of end markets. The dream is that generative AI brings the marginal cost of creation and knowledge work down towards zero, generating vast labor productivity and economic value—and commensurate market cap. Read More
AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability
Yesterday, Meta’s AI Research Team announced Make-A-Video, a “state-of-the-art AI system that generates videos from text.”
Like he did for the Stable Diffusion data, Simon Willison created a Datasette browser to explore WebVid-10M, one of the two datasets used to train the video generation model, and quickly learned that all 10.7 million video clips were scraped from Shutterstock, watermarks and all.
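For anyone who wants to poke at the data the same way, here's a minimal sketch of the kind of query Willison's Datasette browser makes easy, assuming you have the WebVid-10M metadata as a local CSV; the file name is a placeholder, though the "contentUrl" column does hold each clip's source URL in the released metadata.

```python
import pandas as pd

# Load the WebVid-10M metadata (path is a placeholder).
df = pd.read_csv("webvid10m_metadata.csv")

# Count how many clip URLs point at Shutterstock.
shutterstock = df["contentUrl"].str.contains("shutterstock", case=False, na=False)
print(f"{shutterstock.sum():,} of {len(df):,} clips reference Shutterstock")

# Peek at a few source URLs to see the provenance directly.
print(df.loc[shutterstock, "contentUrl"].head())
```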
In addition to the Shutterstock clips, Meta also used 10 million video clips from the HD-VILA-100M dataset from Microsoft Research Asia. It’s not mentioned on their GitHub, but if you dig into the paper, you learn that every clip came from over 3 million YouTube videos.
So, in addition to a massive chunk of Shutterstock’s video collection, Meta is also using millions of YouTube videos collected by Microsoft to make its text-to-video AI. Read More
Cryogeomorphic Characterization of Shadowed Regions in the Artemis Exploration Zone
The Artemis program will send crew to explore the south polar region of the Moon, preceded by and integrated with robotic missions. One of the main scientific goals of future exploration is the characterization of polar volatiles, which are concentrated in and near regions of permanent shadow. The meter-scale cryogeomorphology of shadowed regions remains unknown, posing a potential risk to missions that plan to traverse or land in them. Here, we deploy a physics-based, deep learning-driven post-processing tool to produce high-signal and high-resolution Lunar Reconnaissance Orbiter Narrow Angle Camera images of 44 shadowed regions larger than ∼40 m across in the Artemis exploration zone around potential landing sites 001 and 004. We use these images to map previously unknown, shadowed meter-scale (cryo)geomorphic features, assign relative shadowed region ages, and recommend promising sites for future exploration. We freely release our data and a detailed catalog of all shadowed regions studied. Read More
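To make the pipeline concrete, here is a minimal sketch of what such a post-processing (inference) step could look like, assuming a pre-trained denoising network saved as a TorchScript file; the model file, tile path, and normalization are hypothetical stand-ins, not the authors' released tooling.

```python
import numpy as np
import torch

# Hypothetical pre-trained denoiser (the real tool is physics-based + learned).
model = torch.jit.load("denoiser.pt").eval()

tile = np.load("shadowed_region_tile.npy").astype(np.float32)  # (H, W) raw NAC counts
x = torch.from_numpy(tile)[None, None]                          # (1, 1, H, W)
x = (x - x.mean()) / (x.std() + 1e-8)                           # simple normalization

with torch.no_grad():
    enhanced = model(x)  # high-signal estimate of the shadowed terrain

np.save("shadowed_region_enhanced.npy", enhanced[0, 0].numpy())
```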
Google’s newest AI generator creates HD video from text prompts
Not to be outdone by Meta, Google’s AI generator can output 1280×768 HD video at 24 fps.
Today, Google announced the development of Imagen Video, a text-to-video AI model capable of producing 1280×768 videos at 24 frames per second from a written prompt. Currently, it’s in a research phase, but its appearance five months after Google Imagen points to the rapid development of video synthesis models.
Only six months after the launch of OpenAI’s DALL-E 2 text-to-image generator, progress in the field of AI diffusion models has been heating up rapidly. Google’s Imagen Video announcement comes less than a week after Meta unveiled its text-to-video AI tool, Make-A-Video.
According to Google’s research paper, Imagen Video includes several notable stylistic abilities, such as generating videos based on the work of famous painters (the paintings of Vincent van Gogh, for example), generating 3D rotating objects while preserving object structure, and rendering text in a variety of animation styles. Google is hopeful that general-purpose video synthesis models can “significantly decrease the difficulty of high-quality content generation.” Read More
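The paper describes generation as a cascade: a base text-to-video diffusion model produces a short, low-resolution clip, and alternating temporal and spatial super-resolution models grow it to the final 1280×768, 24 fps output. The toy schematic below illustrates that idea; the stage functions are placeholders standing in for diffusion models (Google has not released code), and the dimensions are chosen only to land on the published output size.

```python
def base_model(prompt):
    # Base diffusion model: a short, low-resolution clip conditioned on text.
    return {"prompt": prompt, "frames": 16, "height": 24, "width": 40}

def temporal_sr(video, factor):
    # TSR stage: synthesize intermediate frames to raise the frame count.
    video["frames"] *= factor
    return video

def spatial_sr(video, factor):
    # SSR stage: upsample each frame to a higher resolution.
    video["height"] *= factor
    video["width"] *= factor
    return video

video = base_model("a teddy bear washing dishes")
video = temporal_sr(video, 2)
video = spatial_sr(video, 4)
video = temporal_sr(video, 4)
video = spatial_sr(video, 8)
print(video)  # {'prompt': ..., 'frames': 128, 'height': 768, 'width': 1280}
```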
Meta Introduces Make-A-Video: An AI system that generates videos from text
Today, we’re announcing Make-A-Video, a new AI system that lets people turn text prompts into brief, high-quality video clips. Make-A-Video builds on Meta AI’s recent progress in generative technology research and has the potential to open new opportunities for creators and artists. The system learns what the world looks like from paired text-image data and how the world moves from video footage with no associated text. As part of our continued commitment to open science, we’re sharing details in a research paper and plan to release a demo experience. Read More
QuestSim: Human Motion Tracking from Sparse Sensors with Simulated Avatars
Real-time tracking of human body motion is crucial for interactive and immersive experiences in AR/VR. However, very limited sensor data about the body is available from standalone wearable devices such as HMDs (Head Mounted Devices) or AR glasses. In this work, we present a reinforcement learning framework that takes in sparse signals from an HMD and two controllers, and simulates plausible and physically valid full body motions. Using high quality full body motion as dense supervision during training, a simple policy network can learn to output appropriate torques for the character to balance, walk, and jog, while closely following the input signals. Our results demonstrate surprisingly similar leg motions to ground truth without any observations of the lower body, even when the input is only the 6D transformations of the HMD. We also show that a single policy can be robust to diverse locomotion styles, different body sizes, and novel environments. Read More
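As a rough illustration of the interface the abstract describes, here is a minimal policy-network sketch mapping sparse device poses to per-joint torques; the sizes and architecture are assumptions for illustration (the real policy also observes the simulated character's state and is trained with reinforcement learning), not the paper's configuration.

```python
import torch
import torch.nn as nn

NUM_DEVICES = 3   # HMD + two hand controllers
POSE_DIM = 9      # e.g. 3D position + 6D rotation representation per device
NUM_JOINTS = 33   # actuated joints of the simulated avatar (assumed)

policy = nn.Sequential(
    nn.Linear(NUM_DEVICES * POSE_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_JOINTS),  # one torque target per actuated joint
)

sensor_input = torch.randn(1, NUM_DEVICES * POSE_DIM)  # one tracking step
torques = policy(sensor_input)                          # fed to the physics simulator
print(torques.shape)                                    # torch.Size([1, 33])
```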
Artist receives first known US copyright registration for latent diffusion AI art
Registration of AI-assisted comic comes amid fierce online debate about AI art ethics.
In what might be a first, a New York-based artist named Kris Kashtanova has received US copyright registration on their graphic novel that features AI-generated artwork created by latent diffusion AI, according to their Instagram feed and confirmed through a public records search by Ars Technica.
The registration, effective September 15, applies to a comic book called Zarya of the Dawn. Kashtanova created the artwork for Zarya using Midjourney, a commercial image synthesis service. Read More
Turn anyone into a Pokémon with this AI art model
A fun little AI art widget named Text-to-Pokémon lets you plug in any name or description you like and (you guessed it) generate a Pokémon matching your prompt.
The model’s output isn’t flawless, but it’s incredibly entertaining all the same. You can try punching in the names of celebrities or politicians (see “Boris Johnson” and “Vladimir Putin” in the image above) or just general descriptions of the sort of Pokémon that would tickle your personal fancy (the one below is my “skeleton priest”). Read More
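The widget is hosted on Replicate, so you can also call it programmatically. A hedged sketch follows, assuming the public "lambdal/text-to-pokemon" listing and its "prompt" input; the exact version pin and input fields may differ.

```python
import replicate  # requires REPLICATE_API_TOKEN in the environment

output = replicate.run(
    "lambdal/text-to-pokemon",            # public model listing (assumed identifier)
    input={"prompt": "skeleton priest"},
)
print(output)  # URL(s) of the generated image(s)
```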
PP-Matting: High-Accuracy Natural Image Matting
Natural image matting is a fundamental and challenging computer vision task. It has many applications in image editing and composition. Recently, deep learning-based approaches have achieved great improvements in image matting. However, most of them require a user-supplied trimap as an auxiliary input, which limits real-world matting applications. Although some trimap-free approaches have been proposed, their matting quality is still unsatisfactory compared to trimap-based ones. Without trimap guidance, matting models easily suffer from foreground-background ambiguity and generate blurry details in the transition area. In this work, we propose PP-Matting, a trimap-free architecture that can achieve high-accuracy natural image matting. Our method applies a high-resolution detail branch (HRDB) that extracts fine-grained details of the foreground while keeping the feature resolution unchanged. We also propose a semantic context branch (SCB) that adopts a semantic segmentation subtask; it prevents local ambiguity in the detail prediction caused by missing semantic context. In addition, we conduct extensive experiments on two well-known benchmarks: Composition-1k and Distinctions-646. The results demonstrate the superiority of PP-Matting over previous methods. Furthermore, we provide a qualitative evaluation of our method on human matting, which shows its outstanding performance in practical applications. Read More
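A minimal sketch of the two-branch idea: a full-resolution detail branch for fine alpha structure and a downsampled semantic branch whose coarse trimap-like prediction resolves foreground/background ambiguity, fused into a single alpha matte. Layer sizes here are illustrative guesses, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TwoBranchMatting(nn.Module):
    def __init__(self):
        super().__init__()
        # Detail branch (HRDB-like): stays at full resolution for fine structure.
        self.detail = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )
        # Semantic branch (SCB-like): downsample for context, predict a coarse
        # foreground/background/transition map, upsample back.
        self.context = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )
        self.fuse = nn.Conv2d(4, 1, 1)  # combine detail + semantics into alpha

    def forward(self, image):
        detail = self.detail(image)
        semantics = self.context(image)
        return torch.sigmoid(self.fuse(torch.cat([detail, semantics], dim=1)))

alpha = TwoBranchMatting()(torch.randn(1, 3, 64, 64))
print(alpha.shape)  # torch.Size([1, 1, 64, 64])
```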
OpenAI begins allowing users to edit faces using DALL-E 2
After initially disabling the capability, OpenAI today announced that customers with access to DALL-E 2 can upload people’s faces to edit them using the AI-powered image-generating system. Previously, OpenAI only allowed users to work with and share AI-generated photorealistic faces, and banned the uploading of any photo that might depict a real person, including photos of prominent celebrities and public figures.
OpenAI claims that improvements to its safety system made the face-editing feature possible by “minimizing the potential of harm” from deepfakes as well as attempts to create sexual, political and violent content. In an email to customers, the company wrote:
Many of you have told us that you miss using DALL-E to dream up outfits and hairstyles on yourselves and edit the backgrounds of family photos. A reconstructive surgeon told us that he’d been using DALL-E to help his patients visualize results. And filmmakers have told us that they want to be able to edit images of scenes with people to help speed up their creative processes … [We] built new detection and response techniques to stop misuse. Read More
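In practice, the face-editing flow is DALL-E 2's inpainting: upload a photo plus a mask whose transparent pixels mark the region to redraw, and describe the desired result. Below is a minimal sketch using the openai Python library's image-edit endpoint as it existed at the time; the file names and prompt are placeholders.

```python
import openai

openai.api_key = "sk-..."  # your API key

response = openai.Image.create_edit(
    image=open("portrait.png", "rb"),      # original photo (square PNG)
    mask=open("portrait_mask.png", "rb"),  # transparent pixels = region to redraw
    prompt="the same person with short curly hair",
    n=1,
    size="1024x1024",
)
print(response["data"][0]["url"])
```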