When AI can make art – what does it mean for creativity?

When the concept artist and illustrator RJ Palmer first witnessed the fine-tuned photorealism of compositions produced by the AI image generator Dall-E 2, his feeling was one of unease. The tool, released by the AI research company OpenAI, showed a marked improvement on 2021’s Dall-E, and was quickly followed by rivals such as Stable Diffusion and Midjourney. Type in any surreal prompt, from Kermit the frog in the style of Edvard Munch, to Gollum from The Lord of the Rings feasting on a slice of watermelon, and these tools will return a startlingly accurate depiction moments later.

The internet revelled in the meme-making opportunities, with a Twitter account documenting “weird Dall-E generations” racking up more than a million followers. Cosmopolitan trumpeted the world’s first AI-generated magazine cover, and technology investors fell over themselves to wave in the new era of “generative AI”. The image-generation capabilities have already spread to video, with the release of Google’s Imagen Video and Meta’s Make-A-Video.

But AI’s new artistic prowess wasn’t received so ecstatically by some creatives. “The main concern for me is what this does to the future of not just my industry, but creative human industries in general,” says Palmer. Read More

#image-recognition, #vfx

AI Drew This Gorgeous Comic Series, But You’d Never Know It

The Bestiary Chronicles is both a modern fable on the rise of artificial intelligence and a demonstration of how shockingly fast AI is evolving.

You might expect a comic book series featuring art generated entirely by artificial intelligence technology to be full of surreal images that have you tilting your head trying to grasp what kind of sense-shifting madness you’re looking at.

Not so with the images in The Bestiary Chronicles, a free, three-part comics series from Campfire Entertainment, an award-winning New York-based production house focused on creative storytelling. Read More

#image-recognition, #vfx, #nlp

AI, Artists, and the Future of Images

An Introduction to Vilém Flusser, and thoughts on AI art

In recent months, AI text-to-image artworks have been flooding the internet, releasing a deluge of discourse around the role of artists in a rapidly changing world. I’ve recently been revisiting the 1985 book Into the Universe of Technical Images by Vilém Flusser, a philosopher and media theorist whom I first encountered in the context of film, but who turns out to be shockingly relevant to the current wave of AI image models and the questions they raise about creativity, art, and labor. A close reading of Flusser’s prophetic text can help to answer some of these questions, and to clarify the role of artists in the fast-approaching future.

In broad strokes, Flusser’s account of cultural history can be summed up as a progression:

From traditional images (2D art images made by hand, like cave paintings),

to linear texts (i.e. written language works, like the Bible),

to technical images (images created mechanically by an apparatus, like a photograph). Read More

#image-recognition

Google’s text-to-image AI model Imagen is getting its first (very limited) public outing

Google is being extremely cautious with the release of its text-to-image AI systems. Although the company’s Imagen model produces output equal in quality to OpenAI’s DALL-E 2 or Stability AI’s Stable Diffusion, Google hasn’t made the system available to the public.

Today, though, the search giant announced it will be adding Imagen — in a very limited form — to its AI Test Kitchen app as a way to collect early feedback on the technology.

AI Test Kitchen was launched earlier this year as a way for Google to beta test various AI systems. Currently, the app offers a few different ways to interact with Google’s text model LaMDA (yes, the same one that the engineer thought was sentient), and the company will soon be adding similarly constrained Imagen requests as part of what it calls a “season two” update to the app. In short, there’ll be two ways to interact with Imagen, which Google demoed to The Verge ahead of the announcement today: “City Dreamer” and “Wobble.” Read More

#image-recognition

Prompt Engineering: Future of AI or Hack?

Is prompt engineering — the art of writing text prompts to get an AI system to generate the output you want — going to be a dominant user interface for AI? With the rise of text generators such as GPT-3 and Jurassic and image generators such as DALL·E, Midjourney, and Stable Diffusion, which take text input and produce output to match, there has been growing interest in how to craft prompts to get the output you want. For example, when generating an image of a panda, how does adding an adjective such as “beautiful” or a phrase like “trending on artstation” influence the output? The response to a particular prompt can be hard to predict and varies from system to system. Read More
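One way to study the question above is to generate prompt variants systematically and compare what each one produces. The following is a minimal sketch in plain Python; the function name `prompt_variants` and the comma-joined prompt layout are assumptions made for illustration, not part of any particular image generator's API.

```python
from itertools import product


def prompt_variants(subject, adjectives, suffixes):
    """Build every combination of an optional adjective and an optional suffix.

    Hypothetical helper: the empty string stands for "no modifier", so the
    bare subject is always included as a baseline for comparison.
    """
    variants = []
    for adjective, suffix in product([""] + adjectives, [""] + suffixes):
        # Prepend the adjective only when one was chosen.
        prompt = " ".join(word for word in (adjective, subject) if word)
        # Append the style suffix, comma-separated, only when one was chosen.
        if suffix:
            prompt = f"{prompt}, {suffix}"
        variants.append(prompt)
    return variants


if __name__ == "__main__":
    for prompt in prompt_variants("panda", ["beautiful"], ["trending on artstation"]):
        print(prompt)
```

Feeding each variant to the same system with the same random seed, where the system exposes one, would isolate the effect of a single modifier.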

#image-recognition, #nlp

People Can’t Stop Feeding Their Selfies into a Super Mean AI

THIS NEURAL NETWORK HAS A REALLY BAD ATTITUDE.

There’s a hot new AI on the block, but instead of generating images, this one analyzes them and spits out crude roasts of anyone they depict.

The AI, known as the CLIP Interrogator and created by a generative artist who goes by the handle Pharmapsychotic, is technically a tool to figure out “what a good prompt might be to create new images like an existing one.” Read More

#image-recognition, #nlp

Adobe’s latest AI prototype gives even the worst dancers some impressive moves

Project Motion Mix converts a still photograph into a dancing animation using machine learning

Adobe will reveal a prototype AI project later today at Adobe Max 2022 that can convert a still image of a person into an animated dancer. Adobe says that all you need to do is load a full-body picture into Project Motion Mix, and the system will turn that individual into an AI-controlled puppet, animating new dance moves.

The system uses a combination of AI-based motion generation and what Adobe is calling “human rendering technologies” to create its animations. The software lets users select from different dance styles, tweak the background, and add multiple dancers into one frame. However, it’s still just a prototype, and Adobe says it isn’t sure if or when the system might be added to its user-facing services. Read More

#image-recognition

PHENAKI: Variable Length Video Generation from Open Domain Textual Descriptions

We present Phenaki, a model capable of realistic video synthesis, given a sequence of textual prompts. Generating videos from text is particularly challenging due to the computational cost, limited quantities of high quality text-video data and variable length of videos. To address these issues, we introduce a new model for learning video representation which compresses the video to a small representation of discrete tokens. This tokenizer uses causal attention in time, which allows it to work with variable-length videos. To generate video tokens from text we are using a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video. To address data issues, we demonstrate how joint training on a large corpus of image-text pairs as well as a smaller number of video-text examples can result in generalization beyond what is available in the video datasets. Compared to the previous video generation methods, Phenaki can generate arbitrary long videos conditioned on a sequence of prompts (i.e. time variable text or a story) in open domain. To the best of our knowledge, this is the first time a paper studies generating videos from time variable prompts. In addition, compared to the per-frame baselines, the proposed video encoder-decoder computes fewer tokens per video but results in better spatio-temporal consistency. Read More
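To make "causal attention in time" concrete, here is a small illustrative sketch of a frame-level causal attention mask in plain Python. It is not the paper's implementation: the function name `causal_frame_mask`, the flattened token layout, and the choice to let tokens within the same frame see each other are all simplifying assumptions.

```python
def causal_frame_mask(num_frames, tokens_per_frame):
    """Return a square boolean mask over all video tokens.

    mask[i][j] is True when token i may attend to token j, i.e. when
    token j belongs to the same frame as token i or to an earlier one.
    Assumes tokens are laid out frame by frame (hypothetical layout).
    """
    total = num_frames * tokens_per_frame
    mask = []
    for i in range(total):
        frame_i = i // tokens_per_frame
        # A token never looks at frames that come later in time.
        row = [(j // tokens_per_frame) <= frame_i for j in range(total)]
        mask.append(row)
    return mask


if __name__ == "__main__":
    for row in causal_frame_mask(num_frames=3, tokens_per_frame=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Because no token depends on later frames, the mask for a short clip is exactly the top-left corner of the mask for a longer one, which is one intuition for why a causal design can handle videos of different lengths.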

#image-recognition, #nlp

AI-generated imagery is the new clip art as Microsoft adds DALL-E to its Office suite

Microsoft is adding AI-generated art to its suite of Office software with a new app named Microsoft Designer.

The app functions the same way as AI text-to-image models like DALL-E and Stable Diffusion, letting users type prompts to “instantly generate a variety of designs with minimal effort.” Microsoft says Designer can be used to create everything from greeting cards and social media posts to illustrations for PowerPoint presentations and logos for businesses.

Essentially, AI-generated imagery looks set to become the new clip art. Read More

#big7, #image-recognition, #nlp

The Illustrated Stable Diffusion

AI image generation is the most recent AI capability blowing people’s minds (mine included). The ability to create striking visuals from text descriptions has a magical quality to it and points clearly to a shift in how humans create art. The release of Stable Diffusion is a clear milestone in this development because it made a high-performance model available to the masses (performance in terms of image quality, as well as speed and relatively low resource/memory requirements).

After experimenting with AI image generation, you may start to wonder how it works.

This is a gentle introduction to how Stable Diffusion works. Read More

#image-recognition