Stability AI has announced StableStudio, a new open-source variant of its DreamStudio AI text-to-image web app.
Stability AI is releasing an open-source version of DreamStudio, a commercial interface for the company’s AI image generator model, Stable Diffusion. In a press statement on Wednesday, Stability AI said the new release — dubbed StableStudio — “marks a fresh chapter” for the platform and will serve as a showcase for the company’s “dedication to advancing open-source development.” — Read More
Stability AI releases an open source text-to-animation tool
You’ve heard of text-to-image, but have you heard of text-to-animation?
From anime to childhood classics, animations have brought stories to life by combining still images. Now, with just a text prompt, you can generate your own animations using AI.
On Thursday, Stability AI, the AI company that created Stable Diffusion, unveiled a text-to-animation tool that allows developers and artists to use Stable Diffusion models to generate animations. — Read More
Google’s open-source AI tool let me play my favorite Dreamcast game with my face
Project Gameface is ready to install as a Windows app that makes gaming more accessible using only your webcam.
While Wednesday’s Google I/O event largely hyped the company’s biggest AI initiatives, the company also announced updates to the machine learning suite that powers Google Lens and Google Meet features like object tracking and recognition, gesture control, and of course, facial detection. The newest update enables app developers to, among other things, create Snapchat-like face filters and hand tracking, with the company showing off a GIF that’s definitely not a Memoji.
This update underpins a special project announced during the I/O developer keynote: an open-source accessibility application called Project Gameface, which lets you play games… with your face. During the keynote, Google played a very Wes Anderson-esque mini-documentary revealing a tragedy that prompted the company to design Gameface. — Read More
Midjourney Has Competition (And It’s Free To Use)!
Midjourney 5.1 Arrives – And It’s Another Leap Forward For AI Art
Midjourney 5.1 has been released, bringing another significant improvement in the quality of results from the generative AI art service.
The company claims that version 5.1 of the engine is “more opinionated”, bringing it closer to the kind of results that you would get with version 4 of Midjourney, but at a higher quality. There’s also a “raw” mode, for those who don’t want images that are as strongly opinionated.
Other claimed improvements include greater accuracy, fewer unwanted borders or text artifacts in images, and improved sharpness. Read More
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos. Similarly, we temporally align diffusion model upsamplers, turning them into temporally consistent video super-resolution models. We focus on two relevant real-world applications: simulation of in-the-wild driving data and creative content creation with text-to-video modeling. In particular, we validate our Video LDM on real driving videos of resolution 512 x 1024, achieving state-of-the-art performance. Furthermore, our approach can easily leverage off-the-shelf pre-trained image LDMs, as we only need to train a temporal alignment model in that case. In doing so, we turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with resolution up to 1280 x 2048. We show that the temporal layers trained in this way generalize to different fine-tuned text-to-image LDMs. Utilizing this property, we show the first results for personalized text-to-video generation, opening exciting directions for future content creation. Read More
Paper
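The core trick the abstract describes — reusing a pre-trained image model on videos by folding the time axis into the batch for spatial layers, then unfolding it for newly added temporal layers — can be sketched in a few lines of NumPy. This is a rough illustration, not the paper's implementation: the layer functions and all dimensions here are placeholders.

```python
import numpy as np

B, T, C, H, W = 2, 8, 4, 16, 16  # batch, frames, channels, height, width
video_latents = np.random.rand(B, T, C, H, W)

# 1) Spatial layers of the pre-trained image LDM see the frames as one
#    big batch of independent images: fold time into the batch axis.
x = video_latents.reshape(B * T, C, H, W)
x = x * 2.0  # placeholder for a frozen spatial (per-frame) layer

# 2) The newly inserted temporal layers see time as a sequence again:
#    unfold the time axis before mixing information across frames.
x = x.reshape(B, T, C, H, W)
# Placeholder for a temporal attention/convolution layer: here, a
# simple average of each frame with its (circularly) previous frame.
x_mixed = 0.5 * (x + np.roll(x, shift=1, axis=1))

print(x_mixed.shape)  # (2, 8, 4, 16, 16)
```

In the paper, only the temporal layers are trained while the spatial image-LDM weights stay fixed, which is why an off-the-shelf image model such as Stable Diffusion can be upgraded to video relatively cheaply.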
Enhancing Vision-language Understanding with Advanced Large Language Models
The recent GPT-4 has demonstrated extraordinary multi-modal abilities, such as directly generating websites from handwritten text and identifying humorous elements within images. These features are rarely observed in previous vision language models. We believe the primary reason for GPT-4’s advanced multi-modal generation capabilities lies in the utilization of a more advanced large language model (LLM). To examine this phenomenon, we present MiniGPT-4, which aligns a frozen visual encoder with a frozen LLM, Vicuna, using just one projection layer. Our findings reveal that MiniGPT-4 possesses many capabilities similar to those exhibited by GPT-4, like detailed image description generation and website creation from handwritten drafts. Furthermore, we also observe other emerging capabilities in MiniGPT-4, including writing stories and poems inspired by given images, providing solutions to problems shown in images, teaching users how to cook based on food photos, etc. In our experiment, we found that performing the pretraining on raw image-text pairs alone could produce unnatural language outputs that lack coherence, including repetition and fragmented sentences. To address this problem, we curate a high-quality, well-aligned dataset in the second stage to finetune our model using a conversational template. This step proved crucial for augmenting the model’s generation reliability and overall usability. Notably, our model is highly computationally efficient, as we only train a projection layer utilizing approximately 5 million aligned image-text pairs. Our code, pre-trained model, and collected dataset are available at https://minigpt-4.github.io/. Read More
Paper
Demo links: Link1, Link2, Link3, Link4, Link5, Link6
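The "just one projection layer" idea from the abstract is simple enough to sketch: a single linear map carries frozen visual-encoder features into the frozen LLM's embedding space, and only that map is trained. The NumPy sketch below is illustrative only; the dimensions and random weights are hypothetical stand-ins, not MiniGPT-4's actual values.

```python
import numpy as np

# Hypothetical dimensions: a ViT-style visual feature size on one side,
# a Vicuna-style LLM embedding size on the other.
D_VISUAL, D_LLM = 768, 4096

rng = np.random.default_rng(0)
W = rng.standard_normal((D_VISUAL, D_LLM)) * 0.02  # the only trainable weights
b = np.zeros(D_LLM)

def project(visual_tokens):
    # Map frozen visual-encoder outputs into the frozen LLM's embedding
    # space; the projected vectors are fed to the LLM as soft prompt
    # tokens alongside ordinary text embeddings.
    return visual_tokens @ W + b

visual_tokens = rng.standard_normal((32, D_VISUAL))  # 32 image tokens
llm_tokens = project(visual_tokens)
print(llm_tokens.shape)  # (32, 4096)
```

Because both the encoder and the LLM stay frozen, training reduces to fitting this one matrix on aligned image-text pairs, which is what makes the approach so computationally cheap.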
Sony World Photography Award 2023: Winner refuses award after revealing AI creation
The winner of a major photography award has refused his prize after revealing his work was in fact an AI creation.
German artist Boris Eldagsen’s entry, entitled Pseudomnesia: The Electrician, won the creative open category at last week’s Sony World Photography Award.
He said he used the picture to test the competition and to create a discussion about the future of photography. Read More
Stability AI debuts next-gen photorealistic image generation model
Generative artificial intelligence company Stability AI Ltd. today released an updated version of its popular open-source photorealistic image generation model.
…The new model is called Stable Diffusion XL, the latest addition to the Stable Diffusion suite. It’s being made available through an application programming interface and caters to enterprise developers. Using SDXL, developers will be able to create more detailed imagery. The company says it represents a key step forward in its image generation models. Read More
OpenAI looks beyond diffusion with ‘consistency’-based image generator
The field of image generation moves quickly. Though the diffusion models used by popular tools like Midjourney and Stable Diffusion may seem like the best we’ve got, the next thing is always coming — and OpenAI might have hit on it with “consistency models,” which can already do simple tasks an order of magnitude faster than the likes of DALL-E.
The paper was put online as a preprint last month, and was not accompanied by the fanfare OpenAI reserves for its major releases. That’s no surprise: This is definitely just a research paper, and it’s very technical. But the results of this early and experimental technique are interesting enough to note. Read More