Keyframer: Empowering Animation Design using Large Language Models

Large language models (LLMs) have the potential to impact a wide range of creative domains, but the application of LLMs to animation is underexplored and presents novel challenges such as how users might effectively describe motion in natural language. In this paper, we present Keyframer, a design tool for animating static images (SVGs) with natural language. Informed by interviews with professional animation designers and engineers, Keyframer supports exploration and refinement of animations through the combination of prompting and direct editing of generated output. The system also enables users to request design variants, supporting comparison and ideation. Through a user study with 13 participants, we contribute a characterization of user prompting strategies, including a taxonomy of semantic prompt types for describing motion and a ‘decomposed’ prompting style where users continually adapt their goals in response to generated output. We share how direct editing along with prompting enables iteration beyond the one-shot prompting interfaces common in generative tools today. Through this work, we propose how LLMs might empower a range of audiences to engage with animation creation. – Read More
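
For a flavor of the interaction the paper describes, here is a minimal sketch of the underlying pattern: hand an LLM a static SVG plus a natural-language motion request, and get back CSS animation code the user can then edit directly. The client, model name, and prompt wording are illustrative assumptions, not Keyframer’s actual implementation.

```python
# Minimal sketch of the prompt-to-animation pattern: an LLM turns a static
# SVG plus a motion description into editable CSS keyframes.
# Model name and prompts are placeholders, not Keyframer's internals.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def animate_svg(svg_source: str, motion_request: str) -> str:
    """Ask the LLM for a CSS snippet that animates the given SVG."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model
        messages=[
            {"role": "system",
             "content": "You animate SVGs. Reply with CSS only, using "
                        "@keyframes that target existing element ids."},
            {"role": "user",
             "content": f"SVG:\n{svg_source}\n\nAnimate it so that: {motion_request}"},
        ],
    )
    return response.choices[0].message.content  # CSS the user can refine by hand

css = animate_svg('<svg><circle id="sun" r="20"/></svg>',
                  "the sun slowly pulses and drifts upward")
```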

#image-recognition

OpenAI joins Meta in labeling AI generated images

Not to be outdone by a rival, OpenAI today announced it is updating its marquee app ChatGPT and DALL-E 3, the AI image generator model integrated within it, to include new metadata tagging that will allow the company, and in theory any user or organization across the web, to identify the imagery as having been made with AI tools.

The move came just hours after Meta announced a similar measure to label AI images generated through its separate AI image generator Imagine and available on Instagram, Facebook, and Threads (and, also, trained on user-submitted imagery from some of those social platforms).  – Read More
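
Both announcements build on the C2PA provenance standard, which embeds a signed manifest inside the image file itself. As a rough illustration only, the sketch below checks for the JUMBF “c2pa” label in an image’s raw bytes; real verification requires parsing and cryptographically validating the manifest with a proper C2PA library, and the filename here is hypothetical.

```python
# Rough heuristic: C2PA manifests are stored in JUMBF boxes whose label
# contains "c2pa", so that byte signature usually appears in tagged files.
# This is a sketch, not verification -- a real check must validate the
# manifest's signature with a full C2PA implementation.
def has_c2pa_marker(path: str) -> bool:
    with open(path, "rb") as f:
        return b"c2pa" in f.read()

print(has_c2pa_marker("dalle3_output.png"))  # hypothetical filename
```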

#fake, #image-recognition

AI helps scholars read scroll buried when Vesuvius erupted in AD79

Researchers used AI to read letters on a papyrus scroll damaged by the blast of heat, ash and pumice that destroyed Pompeii.

Scholars of antiquity believe they are on the brink of a new era of understanding after researchers armed with artificial intelligence read the hidden text of a charred scroll that was buried when Mount Vesuvius erupted nearly 2,000 years ago.  – Read More

#image-recognition

Google’s latest AI video generator can render cute animals in implausible situations

On Tuesday, Google announced Lumiere, an AI video generator that it calls “a space-time diffusion model for realistic video generation” in the accompanying preprint paper. But let’s not kid ourselves: It does a great job of creating videos of cute animals in ridiculous scenarios, such as using roller skates, driving a car, or playing a piano. Sure, it can do more, but it is perhaps the most advanced text-to-animal AI video generator yet demonstrated.  – Read More

#image-recognition

The best AI image generators to create AI art

It’s hard to believe that it’s only been a year since the beta version of DALL-E, OpenAI’s text-to-image generator, was set loose onto the internet. Since then, there’s been an explosion of AI-generated visual content, with people creating an average of 34 million images per day. That’s upwards of 15 billion images created using text-to-image algorithms last year alone. According to Everypixel Journal, it took photographers 150 years, from the first photograph taken in 1826 until 1975, to reach the 15 billion mark.

With new AI text-to-image generators launching at such a rapid pace, it’s tough to keep track of what’s out there and which ones produce the best results. We’re here to break down the best AI image-making tools for generating high-quality images from simple descriptions or keywords, or for creating accurate image prompts based on uploaded reference images. – Read More

#image-recognition

RealFill: Reference-Driven Generation for Authentic Image Completion

Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions, but the content these models hallucinate is necessarily inauthentic, since the models lack sufficient context about the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin. — Read More
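
RealFill’s weights are not public, but the recipe it describes (personalize an inpainting diffusion model on a few reference shots of the scene, then fill the target’s missing region) can be approximated with off-the-shelf parts. The sketch below shows only the inference half, using the Hugging Face diffusers inpainting pipeline as a stand-in; the model name and file paths are placeholders, and the crucial reference-image fine-tuning step is omitted.

```python
# Sketch of the inference half of a RealFill-style workflow, using a stock
# inpainting pipeline as a stand-in. RealFill itself first personalizes the
# model on a few reference photos of the scene; that step is omitted here.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # stand-in, not RealFill weights
    torch_dtype=torch.float16,
).to("cuda")

target = Image.open("target.png").resize((512, 512))  # image with a hole
mask = Image.open("mask.png").resize((512, 512))      # white = region to fill
filled = pipe(
    prompt="a photo of the scene",  # RealFill binds this to the references
    image=target,
    mask_image=mask,
).images[0]
filled.save("completed.png")
```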

Read the Paper

#image-recognition, #big7

This new data poisoning tool lets artists fight back against generative AI

The tool, called Nightshade, messes up training data in ways that could cause serious damage to image-generating AI models.

A new tool lets artists add invisible changes to the pixels in their art before they upload it online so that if it’s scraped into an AI training set, it can cause the resulting model to break in chaotic and unpredictable ways. 

The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists’ work to train their models without the creator’s permission. Using it to “poison” this training data could damage future iterations of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion, by rendering some of their outputs useless—dogs become cats, cars become cows, and so forth. MIT Technology Review got an exclusive preview of the research, which has been submitted for peer review at the computer security conference USENIX. — Read More
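
Nightshade’s exact method is still under peer review, but the broader family of attacks it belongs to works by optimizing a visually small pixel perturbation so that an image’s learned features drift toward a different concept. The toy sketch below illustrates that general idea against CLIP; it is not the Nightshade algorithm, and the step count and perturbation budget are arbitrary.

```python
# Toy concept-shift perturbation in the spirit of data-poisoning attacks:
# nudge an image's CLIP features toward an unrelated concept while keeping
# the pixel change small. Illustrative only -- NOT the Nightshade algorithm.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def poison(image, target_text="a photo of a cat", steps=100, eps=4 / 255):
    """Return a perturbed pixel tensor whose embedding leans toward target_text."""
    pixels = processor(images=image, return_tensors="pt")["pixel_values"]
    text = processor(text=[target_text], return_tensors="pt", padding=True)
    with torch.no_grad():
        target = model.get_text_features(**text)
        target = target / target.norm(dim=-1, keepdim=True)
    delta = torch.zeros_like(pixels, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=1e-2)
    for _ in range(steps):
        emb = model.get_image_features(pixel_values=pixels + delta)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        loss = 1 - (emb * target).sum()  # cosine distance to the target concept
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():  # crude budget (in normalized space, not raw pixels)
            delta.clamp_(-eps, eps)
    return (pixels + delta).detach()
```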

#training, #image-recognition

DALL·E 3 is now available in ChatGPT Plus and Enterprise

ChatGPT can now create unique images from a simple conversation—and this new feature is available to Plus and Enterprise users today. Describe your vision, and ChatGPT will bring it to life by providing a selection of visuals for you to refine and iterate upon. You can ask for revisions right in the chat. This is powered by our most capable image model, DALL·E 3.

DALL·E 3 is the culmination of several research advancements, both from within and outside of OpenAI. Compared to its predecessor, DALL·E 3 generates images that are not only more visually striking but also crisper in detail. DALL·E 3 can reliably render intricate details, including text, hands, and faces. Additionally, it is particularly good at responding to extensive, detailed prompts, and it can support both landscape and portrait aspect ratios. These capabilities were achieved by training a state-of-the-art image captioner to generate better textual descriptions for the images that we trained our models on. DALL·E 3 was then trained on these improved captions, resulting in a model that pays much more attention to the user-supplied captions. You can read more about this process in our research paper. — Read More

#image-recognition

You can now generate AI images directly in the Google Search bar

Back in the olden days of last December, we had to go to specialized websites to have our natural language prompts transformed into AI-generated art, but no longer! Google announced Thursday that users who have opted in to its Search Generative Experience (SGE) will be able to create AI images directly from the standard Search bar.

SGE is Google’s vision for our web searching future. Rather than picking websites from a returned list, the system will synthesize a (reasonably) coherent response to the user’s natural language prompt using the same data that the list’s links led to. Thursday’s updates are a natural expansion of that experience, simply returning generated images (using the company’s Imagen text-to-image AI) instead of generated text. Users type in a description of what they’re looking for (a capybara cooking breakfast, in Google’s example) and, within moments, the engine will create four alternatives to pick from and refine further. Users will also be able to export their generated images to Drive or download them. — Read More

Opt In & Try It

#big7, #image-recognition

OpenAI releases third version of DALL-E

OpenAI announced the third version of its generative AI visual art platform DALL-E, which now lets users create prompts with ChatGPT and includes more safety options.

DALL-E converts text prompts to images. But even DALL-E 2 got things wrong, often ignoring specific wording. The latest version, OpenAI researchers said, understands context much better.

A new feature of DALL-E 3 is integration with ChatGPT. By using ChatGPT, someone doesn’t have to come up with their own detailed prompt to guide DALL-E 3; they can simply ask ChatGPT to write one, and the chatbot will produce a paragraph (DALL-E works better with longer sentences) for DALL-E 3 to follow. Users can still supply their own prompts if they have specific ideas for DALL-E. — Read More
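
That two-step flow is easy to reproduce against the public API: ask the chat model to expand a terse idea into the kind of detailed paragraph DALL-E 3 favors, then pass the result to the image endpoint. A minimal sketch using the standard OpenAI Python client; the chat model name is a placeholder, and this is not necessarily what ChatGPT does internally.

```python
# Minimal sketch of the "ChatGPT writes the prompt" workflow: expand a short
# idea into a detailed paragraph, then hand that paragraph to DALL-E 3.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

idea = "a lighthouse in a storm"
expansion = client.chat.completions.create(
    model="gpt-4o",  # placeholder chat model
    messages=[{
        "role": "user",
        "content": f"Write one detailed paragraph describing an image of {idea}, "
                   "suitable as a text-to-image prompt.",
    }],
).choices[0].message.content

image = client.images.generate(
    model="dall-e-3",
    prompt=expansion,  # DALL-E 3 responds well to long, detailed prompts
    size="1024x1024",
)
print(image.data[0].url)  # URL of the generated image
```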

#image-recognition