Google’s text-to-image AI model Imagen is getting its first (very limited) public outing

Google is being extremely cautious with the release of its text-to-image AI systems. Although the company’s Imagen model produces output equal in quality to OpenAI’s DALL-E 2 or Stability AI’s Stable Diffusion, Google hasn’t made the system available to the public.

Today, though, the search giant announced it will be adding Imagen — in a very limited form — to its AI Test Kitchen app as a way to collect early feedback on the technology.

AI Test Kitchen was launched earlier this year as a way for Google to beta test various AI systems. Currently, the app offers a few different ways to interact with Google’s text model LaMDA (yes, the same one that a Google engineer thought was sentient), and the company will soon be adding similarly constrained Imagen requests as part of what it calls a “season two” update to the app. In short, there’ll be two ways to interact with Imagen, which Google demoed to The Verge ahead of the announcement today: “City Dreamer” and “Wobble.” Read More

#image-recognition

Prompt Engineering: Future of AI or Hack?

Is prompt engineering — the art of writing text prompts to get an AI system to generate the output you want — going to be a dominant user interface for AI? With the rise of text generators such as GPT-3 and Jurassic and image generators such as DALL·E, Midjourney, and Stable Diffusion, which take text input and produce output to match, there has been growing interest in how to craft prompts to get the output you want. For example, when generating an image of a panda, how does adding an adjective such as “beautiful” or a phrase like “trending on artstation” influence the output? The response to a particular prompt can be hard to predict and varies from system to system. Read More
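Because the effect of a modifier is hard to predict, a common workflow is a small grid search: generate every combination of subject and modifiers, then render each variant and compare. A minimal sketch in Python (the helper and the modifier lists are illustrative, not any particular tool’s API):

```python
from itertools import product

def build_prompts(subject, modifier_sets):
    """Enumerate prompt variants: the subject plus every combination of
    optional modifiers ("" means the modifier is omitted)."""
    prompts = []
    for combo in product(*modifier_sets):
        parts = [subject] + [m for m in combo if m]
        prompts.append(", ".join(parts))
    return prompts

variants = build_prompts(
    "a panda eating bamboo",
    [["", "beautiful"], ["", "trending on artstation"]],
)
for p in variants:
    print(p)
```

Feeding each variant to the same image generator with a fixed random seed isolates the effect of each modifier on the output.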

#image-recognition, #nlp

People Can’t Stop Feeding Their Selfies into a Super Mean AI

THIS NEURAL NETWORK HAS A REALLY BAD ATTITUDE.

There’s a hot new AI on the block, but instead of generating images, this one analyzes them and spits out crude roasts of anyone they depict.

The AI, known as the CLIP Interrogator and created by a generative artist who goes by the handle Pharmapsychotic, is technically a tool to figure out “what a good prompt might be to create new images like an existing one.” Read More
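Tools like this work by embedding the image with CLIP, then scoring candidate prompt phrases by cosine similarity and keeping the best matches. A toy sketch of that ranking step (the 3-d vectors are made-up stand-ins for real 512-d CLIP embeddings, and this is not the CLIP Interrogator’s actual code):

```python
import numpy as np

def rank_phrases(image_emb, phrase_embs, phrases):
    """Rank candidate prompt phrases by cosine similarity to an image
    embedding -- the core loop behind CLIP-based prompt inference."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    phrase_embs = phrase_embs / np.linalg.norm(phrase_embs, axis=1, keepdims=True)
    scores = phrase_embs @ image_emb          # cosine similarity per phrase
    order = np.argsort(scores)[::-1]          # best match first
    return [(phrases[i], float(scores[i])) for i in order]

# Hypothetical embeddings: the image is closest to "oil painting".
image = np.array([1.0, 0.2, 0.0])
candidates = np.array([
    [0.9, 0.1, 0.1],   # "oil painting"
    [0.0, 1.0, 0.0],   # "pixel art"
    [0.1, 0.0, 1.0],   # "photograph"
])
ranked = rank_phrases(image, candidates,
                      ["oil painting", "pixel art", "photograph"])
print(ranked[0][0])  # prints "oil painting"
```

The real tool does this over large banks of artist names, mediums, and style tags, then stitches the top-scoring phrases into a prompt.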

#image-recognition, #nlp

Adobe’s latest AI prototype gives even the worst dancers some impressive moves

Project Motion Mix converts a still photograph into a dancing animation using machine learning

Adobe will reveal a prototype AI project later today at Adobe Max 2022 that can convert a still image of a person into an animated dancer. Adobe says that all you need to do is load a full-body picture into Project Motion Mix, and the system will turn that individual into an AI-controlled puppet, animating new dance moves.

The system uses a combination of AI-based motion generation and what Adobe is calling “human rendering technologies” to create its animations. The software lets users select from different dance styles, tweak the background, and add multiple dancers into one frame. However, it’s still just a prototype, and Adobe says it isn’t sure if or when the system might be added to its user-facing services. Read More

#image-recognition

PHENAKI: Variable Length Video Generation from Open Domain Textual Descriptions

We present Phenaki, a model capable of realistic video synthesis given a sequence of textual prompts. Generating videos from text is particularly challenging due to the computational cost, the limited quantities of high-quality text-video data, and the variable length of videos. To address these issues, we introduce a new model for learning video representations which compresses the video to a small representation of discrete tokens. This tokenizer uses causal attention in time, which allows it to work with variable-length videos. To generate video tokens from text we use a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video. To address data issues, we demonstrate how joint training on a large corpus of image-text pairs as well as a smaller number of video-text examples can result in generalization beyond what is available in the video datasets. Compared to previous video generation methods, Phenaki can generate arbitrarily long videos conditioned on a sequence of prompts (i.e., time-variable text, or a story) in an open domain. To the best of our knowledge, this is the first time a paper studies generating videos from time-variable prompts. In addition, compared to per-frame baselines, the proposed video encoder-decoder computes fewer tokens per video but results in better spatio-temporal consistency. Read More
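The “causal attention in time” the abstract describes can be pictured as a lower-triangular mask over frames: tokens for frame t may attend only to frames up to t, so the encoding of a video’s first t frames never depends on later frames, which is what makes variable-length videos workable. A minimal NumPy sketch of the mask (not the paper’s code):

```python
import numpy as np

def causal_time_mask(num_frames):
    """Boolean attention mask where entry [t, s] is True iff frame t
    may attend to frame s, i.e. s <= t (causal in time)."""
    return np.tril(np.ones((num_frames, num_frames), dtype=bool))

mask = causal_time_mask(4)
print(mask.astype(int))
# Row t shows which frames frame t can see: only itself and earlier ones.
```

In an actual transformer this mask would be applied to the attention logits (setting disallowed positions to -inf) before the softmax.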

#image-recognition, #nlp

AI-generated imagery is the new clip art as Microsoft adds DALL-E to its Office suite

Microsoft is adding AI-generated art to its suite of Office software with a new app named Microsoft Designer.

The app functions the same way as AI text-to-image models like DALL-E and Stable Diffusion, letting users type prompts to “instantly generate a variety of designs with minimal effort.” Microsoft says Designer can be used to create everything from greeting cards and social media posts to illustrations for PowerPoint presentations and logos for businesses.

Essentially, AI-generated imagery looks set to become the new clip art. Read More

#big7, #image-recognition, #nlp

The Illustrated Stable Diffusion

AI image generation is the most recent AI capability blowing people’s minds (mine included). The ability to create striking visuals from text descriptions has a magical quality to it and points clearly to a shift in how humans create art. The release of Stable Diffusion is a clear milestone in this development because it made a high-performance model available to the masses (performance in terms of image quality, as well as speed and relatively low resource/memory requirements).

After experimenting with AI image generation, you may start to wonder how it works.

This is a gentle introduction to how Stable Diffusion works. Read More

#image-recognition

Generative AI: A Creative New World

Humans are good at analyzing things. Machines are even better. Machines can analyze a set of data and find patterns in it for a multitude of use cases, whether it’s fraud or spam detection, forecasting the ETA of your delivery or predicting which TikTok video to show you next. They are getting smarter at these tasks. This is called “Analytical AI,” or traditional AI. 

But humans are not only good at analyzing things—we are also good at creating. We write poetry, design products, make games and crank out code. Up until recently, machines had no chance of competing with humans at creative work—they were relegated to analysis and rote cognitive labor. But machines are just starting to get good at creating sensical and beautiful things. This new category is called “Generative AI,” meaning the machine is generating something new rather than analyzing something that already exists. 

Generative AI is well on the way to becoming not just faster and cheaper, but better in some cases than what humans create by hand. Every industry that requires humans to create original work—from social media to gaming, advertising to architecture, coding to graphic design, product design to law, marketing to sales—is up for reinvention. Certain functions may be completely replaced by generative AI, while others are more likely to thrive from a tight iterative creative cycle between human and machine—but generative AI should unlock better, faster and cheaper creation across a wide range of end markets. The dream is that generative AI brings the marginal cost of creation and knowledge work down towards zero, generating vast labor productivity and economic value—and commensurate market cap. Read More

#image-recognition, #nlp, #strategy

AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability

Yesterday, Meta’s AI Research Team announced Make-A-Video, a “state-of-the-art AI system that generates videos from text.”

As he did for Stable Diffusion’s training data, Simon Willison created a Datasette browser to explore WebVid-10M, one of the two datasets used to train the video generation model, and quickly learned that all 10.7 million video clips were scraped from Shutterstock, watermarks and all.

In addition to the Shutterstock clips, Meta also used 10 million video clips from this 100M video dataset from Microsoft Research Asia. It’s not mentioned on their GitHub, but if you dig into the paper, you learn that the clips were drawn from over 3 million YouTube videos.

So, in addition to a massive chunk of Shutterstock’s video collection, Meta is also using millions of YouTube videos collected by Microsoft to make its text-to-video AI. Read More

#ethics, #image-recognition, #nlp

Cryogeomorphic Characterization of Shadowed Regions in the Artemis Exploration Zone

The Artemis program will send crew to explore the south polar region of the Moon, preceded by and integrated with robotic missions. One of the main scientific goals of future exploration is the characterization of polar volatiles, which are concentrated in and near regions of permanent shadow. The meter-scale cryogeomorphology of shadowed regions remains unknown, posing a potential risk to missions that plan to traverse or land in them. Here, we deploy a physics-based, deep learning-driven post-processing tool to produce high-signal and high-resolution Lunar Reconnaissance Orbiter Narrow Angle Camera images of 44 shadowed regions larger than ∼40 m across in the Artemis exploration zone around potential landing sites 001 and 004. We use these images to map previously unknown, shadowed meter-scale (cryo)geomorphic features, assign relative shadowed region ages, and recommend promising sites for future exploration. We freely release our data and a detailed catalog of all shadowed regions studied. Read More

#image-recognition