THIS NEURAL NETWORK HAS A REALLY BAD ATTITUDE.
There’s a hot new AI on the block, but instead of generating images, this one analyzes them and spits out crude roasts of anyone they depict.
The AI, known as the CLIP Interrogator and created by a generative artist who goes by the handle Pharmapsychotic, is technically a tool to figure out “what a good prompt might be to create new images like an existing one.” Read More
Adobe’s latest AI prototype gives even the worst dancers some impressive moves
Project Motion Mix converts a still photograph into a dancing animation using machine learning
Adobe will reveal a prototype AI project later today at Adobe Max 2022 that can convert a still image of a person into an animated dancer. Adobe says that all you need to do is load a full-body picture into Project Motion Mix, and the system will turn that individual into an AI-controlled puppet, animating new dance moves.
The system uses a combination of AI-based motion generation and what Adobe is calling “human rendering technologies” to create its animations. The software lets users select from different dance styles, tweak the background, and add multiple dancers into one frame. However, it’s still just a prototype, and Adobe says it isn’t sure if or when the system might be added to its user-facing services. Read More
PHENAKI: Variable Length Video Generation from Open Domain Textual Descriptions
We present Phenaki, a model capable of realistic video synthesis given a sequence of textual prompts. Generating videos from text is particularly challenging due to the computational cost, the limited quantity of high-quality text-video data, and the variable length of videos. To address these issues, we introduce a new model for learning video representations which compresses the video to a small representation of discrete tokens. This tokenizer uses causal attention in time, which allows it to work with variable-length videos. To generate video tokens from text, we use a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video. To address data issues, we demonstrate how joint training on a large corpus of image-text pairs as well as a smaller number of video-text examples can result in generalization beyond what is available in the video datasets. Compared to previous video generation methods, Phenaki can generate arbitrarily long videos conditioned on a sequence of prompts (i.e., time-variable text, or a story) in the open domain. To the best of our knowledge, this is the first time a paper studies generating videos from time-variable prompts. In addition, compared to per-frame baselines, the proposed video encoder-decoder computes fewer tokens per video but results in better spatio-temporal consistency. Read More
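The bidirectional masked transformer in that description can be pictured as MaskGIT-style iterative decoding: start from a fully masked grid of video tokens, predict every masked position in parallel, keep the most confident predictions, and re-mask the rest for the next pass. The sketch below illustrates that loop in PyTorch; the model sizes, the cosine masking schedule, and all names are illustrative assumptions rather than Phenaki's actual implementation, and the de-tokenization back to pixel frames is omitted.

```python
# Minimal sketch of masked-token video generation conditioned on text,
# loosely following the idea of a bidirectional masked transformer.
# All dimensions, names, and the schedule are assumptions for illustration.
import math
import torch
import torch.nn as nn

VOCAB, MASK_ID, SEQ_LEN, DIM = 8192, 8192, 256, 512   # video-token vocabulary; id 8192 is [MASK]

class MaskedVideoTransformer(nn.Module):
    """Bidirectional transformer that predicts video tokens given text features."""
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB + 1, DIM)              # +1 slot for the [MASK] token
        self.pos_emb = nn.Parameter(torch.zeros(SEQ_LEN, DIM))   # learned positions for video tokens
        self.text_proj = nn.Linear(768, DIM)                     # project pre-computed text features
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, video_tokens, text_feats):
        x = self.tok_emb(video_tokens) + self.pos_emb            # (B, SEQ_LEN, DIM)
        ctx = self.text_proj(text_feats)                         # (B, T_text, DIM)
        h = self.encoder(torch.cat([ctx, x], dim=1))             # full bidirectional attention
        return self.head(h[:, ctx.size(1):])                     # logits only for video positions

@torch.no_grad()
def iterative_decode(model, text_feats, steps=8):
    """Start fully masked; each pass commits the most confident predictions."""
    B = text_feats.size(0)
    tokens = torch.full((B, SEQ_LEN), MASK_ID, dtype=torch.long)
    for s in range(steps):
        probs, preds = model(tokens, text_feats).softmax(-1).max(-1)
        still_masked = tokens == MASK_ID
        # cosine schedule: how many positions stay masked after this pass
        keep_masked = int(SEQ_LEN * math.cos(math.pi / 2 * (s + 1) / steps))
        # only positions predicted on this pass compete for re-masking
        conf = probs.masked_fill(~still_masked, float("inf"))
        tokens = torch.where(still_masked, preds, tokens)        # commit everything for now
        if keep_masked > 0:
            remask = conf.topk(keep_masked, dim=1, largest=False).indices
            tokens.scatter_(1, remask, MASK_ID)                  # re-mask the least confident
    return tokens   # discrete video tokens; a separate de-tokenizer turns these into frames

model = MaskedVideoTransformer()
text_feats = torch.randn(1, 20, 768)        # stand-in for pre-computed text tokens
video_tokens = iterative_decode(model, text_feats)
print(video_tokens.shape)                   # torch.Size([1, 256])
```

Because every position attends to every other, a handful of parallel refinement passes stands in for the long chain of sequential steps an autoregressive decoder would need over the same token grid.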
AI-generated imagery is the new clip art as Microsoft adds DALL-E to its Office suite
Microsoft is adding AI-generated art to its suite of Office software with a new app named Microsoft Designer.
The app functions the same way as AI text-to-image models like DALL-E and Stable Diffusion, letting users type prompts to “instantly generate a variety of designs with minimal effort.” Microsoft says Designer can be used to create everything from greeting cards and social media posts to illustrations for PowerPoint presentations and logos for businesses.
Essentially, AI-generated imagery looks set to become the new clip art. Read More
The Illustrated Stable Diffusion
AI image generation is the most recent AI capability blowing people’s minds (mine included). The ability to create striking visuals from text descriptions has a magical quality to it and points clearly to a shift in how humans create art. The release of Stable Diffusion is a clear milestone in this development because it made a high-performance model available to the masses (performance in terms of image quality, as well as speed and relatively low resource/memory requirements).
After experimenting with AI image generation, you may start to wonder how it works.
This is a gentle introduction to how Stable Diffusion works. Read More
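Before digging into the internals, it helps to see how little code it takes to run the model locally, which is part of what "available to the masses" means in practice. The sketch below uses Hugging Face's diffusers library; the model id, precision, and sampler settings are illustrative choices rather than anything prescribed by the article.

```python
# A minimal sketch of generating an image with Stable Diffusion via diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,        # half precision keeps GPU memory use modest
)
pipe = pipe.to("cuda")

prompt = "a watercolor painting of a lighthouse at dawn"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("lighthouse.png")
```

Running in half precision on a single consumer GPU is exactly the "relatively low resource/memory requirements" the excerpt alludes to.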
Generative AI: A Creative New World
Humans are good at analyzing things. Machines are even better. Machines can analyze a set of data and find patterns in it for a multitude of use cases, whether it’s fraud or spam detection, forecasting the ETA of your delivery or predicting which TikTok video to show you next. They are getting smarter at these tasks. This is called “Analytical AI,” or traditional AI.
But humans are not only good at analyzing things—we are also good at creating. We write poetry, design products, make games and crank out code. Up until recently, machines had no chance of competing with humans at creative work—they were relegated to analysis and rote cognitive labor. But machines are just starting to get good at creating sensical and beautiful things. This new category is called “Generative AI,” meaning the machine is generating something new rather than analyzing something that already exists.
Generative AI is well on the way to becoming not just faster and cheaper, but better in some cases than what humans create by hand. Every industry that requires humans to create original work—from social media to gaming, advertising to architecture, coding to graphic design, product design to law, marketing to sales—is up for reinvention. Certain functions may be completely replaced by generative AI, while others are more likely to thrive from a tight iterative creative cycle between human and machine—but generative AI should unlock better, faster and cheaper creation across a wide range of end markets. The dream is that generative AI brings the marginal cost of creation and knowledge work down towards zero, generating vast labor productivity and economic value—and commensurate market cap. Read More
AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability
Yesterday, Meta’s AI Research Team announced Make-A-Video, a “state-of-the-art AI system that generates videos from text.”
Like he did for the Stable Diffusion data, Simon Willison created a Datasette browser to explore WebVid-10M, one of the two datasets used to train the video generation model, and quickly learned that all 10.7 million video clips were scraped from Shutterstock, watermarks and all.
In addition to the Shutterstock clips, Meta also used 10 million video clips from a 100M-clip video dataset from Microsoft Research Asia. It's not mentioned on their GitHub, but if you dig into the paper, you learn that the clips were taken from over 3 million YouTube videos.
So, in addition to a massive chunk of Shutterstock’s video collection, Meta is also using millions of YouTube videos collected by Microsoft to make its text-to-video AI. Read More
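The kind of check Willison's Datasette browser makes easy can also be reproduced with nothing more than a dataset's metadata file. The sketch below is a hypothetical version of it in pandas; the file name and column names are assumptions, not WebVid-10M's actual schema.

```python
# Hypothetical sketch: tally which domains a video-text dataset's clips are hosted on.
from urllib.parse import urlparse
import pandas as pd

df = pd.read_csv("webvid_metadata.csv")          # assumed columns, e.g. videoid, name, contentUrl
domains = (
    df["contentUrl"]                             # assumed column holding each clip's source URL
    .map(lambda u: urlparse(u).netloc)
    .value_counts()
)
print(domains.head(10))
```

A single hosting domain dominating that tally is the sort of signal that pointed straight back to Shutterstock.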
Cryogeomorphic Characterization of Shadowed Regions in the Artemis Exploration Zone
The Artemis program will send crew to explore the south polar region of the Moon, preceded by and integrated with robotic missions. One of the main scientific goals of future exploration is the characterization of polar volatiles, which are concentrated in and near regions of permanent shadow. The meter-scale cryogeomorphology of shadowed regions remains unknown, posing a potential risk to missions that plan to traverse or land in them. Here, we deploy a physics-based, deep learning-driven post-processing tool to produce high-signal and high-resolution Lunar Reconnaissance Orbiter Narrow Angle Camera images of 44 shadowed regions larger than ∼40 m across in the Artemis exploration zone around potential landing sites 001 and 004. We use these images to map previously unknown, shadowed meter-scale (cryo)geomorphic features, assign relative shadowed region ages, and recommend promising sites for future exploration. We freely release our data and a detailed catalog of all shadowed regions studied. Read More
Google’s newest AI generator creates HD video from text prompts
Not to be outdone by Meta, Google has unveiled an AI generator that can output 1280×768 HD video at 24 fps.
Today, Google announced the development of Imagen Video, a text-to-video AI model capable of producing 1280×768 videos at 24 frames per second from a written prompt. Currently, it’s in a research phase, but its appearance five months after Google Imagen points to the rapid development of video synthesis models.
Only six months after the launch of OpenAI’s DALL-E 2 text-to-image generator, progress in the field of AI diffusion models has been heating up rapidly. Google’s Imagen Video announcement comes less than a week after Meta unveiled its text-to-video AI tool, Make-A-Video.
According to Google’s research paper, Imagen Video includes several notable stylistic abilities, such as generating videos based on the work of famous painters (the paintings of Vincent van Gogh, for example), generating 3D rotating objects while preserving object structure, and rendering text in a variety of animation styles. Google is hopeful that general-purpose video synthesis models can “significantly decrease the difficulty of high-quality content generation.” Read More
META Introduces Make-A-Video: An AI system that generates videos from text
Today, we’re announcing Make-A-Video, a new AI system that lets people turn text prompts into brief, high-quality video clips. Make-A-Video builds on Meta AI’s recent progress in generative technology research and has the potential to open new opportunities for creators and artists. The system learns what the world looks like from paired text-image data and how the world moves from video footage with no associated text. As part of our continued commitment to open science, we’re sharing details in a research paper and plan to release a demo experience. Read More