A new tool lets artists add invisible changes to the pixels in their art before they upload it online so that if it’s scraped into an AI training set, it can cause the resulting model to break in chaotic and unpredictable ways.
The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists’ work to train their models without their creators’ permission. Using it to “poison” this training data could damage future iterations of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion, by rendering some of their outputs useless—dogs become cats, cars become cows, and so forth. MIT Technology Review got an exclusive preview of the research, which has been submitted for peer review at the computer security conference Usenix. — Read More
DALL·E 3 is now available in ChatGPT Plus and Enterprise
ChatGPT can now create unique images from a simple conversation—and this new feature is available to Plus and Enterprise users today. Describe your vision, and ChatGPT will bring it to life by providing a selection of visuals for you to refine and iterate upon. You can ask for revisions right in the chat. This is powered by our most capable image model, DALL·E 3.
DALL·E 3 is the culmination of several research advancements, both from within and outside of OpenAI. Compared to its predecessor, DALL·E 3 generates images that are not only more visually striking but also crisper in detail. DALL·E 3 can reliably render intricate details, including text, hands, and faces. Additionally, it is particularly good at responding to extensive, detailed prompts, and it can support both landscape and portrait aspect ratios. These capabilities were achieved by training a state-of-the-art image captioner to generate better textual descriptions for the images that we trained our models on. DALL·E 3 was then trained on these improved captions, resulting in a model that pays much closer attention to user-supplied captions. You can read more about this process in our research paper. — Read More
You can now generate AI images directly in the Google Search bar
Back in the olden days of last December, we had to go to specialized websites to have our natural language prompts transformed into generated AI art, but no longer! Google announced Thursday that users who have opted in to its Search Generative Experience (SGE) will be able to create AI images directly from the standard Search bar.
SGE is Google’s vision for our web searching future. Rather than picking websites from a returned list, the system will synthesize a (reasonably) coherent response to the user’s natural language prompt using the same data that the list’s links led to. Thursday’s updates are a natural expansion of that experience, simply returning generated images (using the company’s Imagen text-to-image AI) instead of generated text. Users type in a description of what they’re looking for (a capybara cooking breakfast, in Google’s example) and, within moments, the engine will create four alternatives to pick from and refine further. Users will also be able to export their generated images to Drive or download them. — Read More
Opt In & Try It
OpenAI releases third version of DALL-E
OpenAI announced the third version of its generative AI visual art platform, DALL-E, which now lets users create prompts with ChatGPT and includes more safety options.
DALL-E converts text prompts to images. But even DALL-E 2 often got things wrong, ignoring the specific wording of prompts. The latest version, OpenAI researchers said, understands context much better.
A new feature of DALL-E 3 is integration with ChatGPT. By using ChatGPT, someone doesn’t have to come up with their own detailed prompt to guide DALL-E 3; they can just ask ChatGPT to come up with a prompt, and the chatbot will write out a paragraph (DALL-E works better with longer sentences) for DALL-E 3 to follow. Other users can still use their own prompts if they have specific ideas for DALL-E. — Read More
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data. — Read More
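The Global Response Normalization layer the abstract mentions can be sketched in a few lines. The NumPy version below is an illustration of the three steps the paper describes (global feature aggregation, divisive normalization, feature calibration), not the reference implementation; the scalar `gamma` and `beta` stand in for the learnable per-channel parameters of the real layer.

```python
import numpy as np

def global_response_norm(x, gamma=1.0, beta=0.0, eps=1e-6):
    """Sketch of ConvNeXt V2's Global Response Normalization (GRN).

    x has shape (H, W, C). Three steps:
      1) aggregate: per-channel L2 norm over the spatial dims -> (C,)
      2) normalize: each channel's norm relative to the mean norm
      3) calibrate: scale the features, with an identity shortcut
    """
    gx = np.sqrt((x ** 2).sum(axis=(0, 1)))     # (C,) global aggregation
    nx = gx / (gx.mean() + eps)                 # divisive normalization
    return gamma * (x * nx) + beta + x          # calibration + residual

# Example: an 8x8 feature map with 16 channels keeps its shape
feat = np.random.randn(8, 8, 16)
out = global_response_norm(feat)
print(out.shape)  # (8, 8, 16)
```

The residual term means that with `gamma = 0` and `beta = 0` the layer reduces to the identity, which is how the paper initializes it.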
Watch out, Midjourney! Ideogram launches AI image generator with impressive typography
Earlier this week, a new generative AI image startup called Ideogram, founded by former Google Brain researchers, launched with $16.5 million in seed funding led by a16z and Index Ventures.
Another image generator? Don’t we have enough to choose from between Midjourney, OpenAI’s DALL-E 2, and Stability AI’s Stable Diffusion? Well, Ideogram has a major selling point, as it may have finally solved a problem plaguing most other popular AI image generators to date: reliable text generation within the image, such as lettering on signs and in company logos. — Read More
FlexiViT: One Model for All Patch Sizes
Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at this https URL — Read More
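To make the speed/accuracy tradeoff concrete: a ViT's token count is (H/p) × (W/p), so attention cost grows rapidly as patches shrink. The plain-NumPy sketch below (`patchify` is an illustrative helper, not FlexiViT code) shows how the sequence length changes with patch size on one image; FlexiViT's contribution is that a single set of weights handles all of these sizes.

```python
import numpy as np

def patchify(img, p):
    """Slice an (H, W, C) image into non-overlapping p x p patches and
    flatten each into a token, as a ViT's patch embedding does.
    Returns an array of shape (num_tokens, p*p*C)."""
    h, w, c = img.shape
    assert h % p == 0 and w % p == 0, "image must divide evenly into patches"
    patches = img.reshape(h // p, p, w // p, p, c).swapaxes(1, 2)
    return patches.reshape(-1, p * p * c)

img = np.random.rand(240, 240, 3)
for p in (8, 16, 30, 48):  # a range of patch sizes, as FlexiViT trains over
    tokens = patchify(img, p)
    print(f"patch={p:2d} -> sequence length {len(tokens)}")
# sequence lengths: 900, 225, 64, 25
```

Halving the patch size quadruples the sequence length, and self-attention cost grows quadratically in that length, which is why patch size is such an effective compute knob at deployment time.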
Introducing IDEFICS: An Open Reproduction of State-of-the-Art Visual Language Model
We are excited to release IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS), an open-access visual language model. IDEFICS is based on Flamingo, a state-of-the-art visual language model initially developed by DeepMind, which has not been released publicly. Similarly to GPT-4, the model accepts arbitrary sequences of image and text inputs and produces text outputs. IDEFICS is built solely on publicly available data and models (LLaMA v1 and OpenCLIP) and comes in two variants—the base version and the instructed version. Each variant is available at the 9 billion and 80 billion parameter sizes. — Read More
Stable Diffusion XL 1.0-base
SDXL uses an ensemble-of-experts pipeline for latent diffusion: in the first step, the base model is used to generate (noisy) latents, which are then further processed with a refinement model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/) specialized for the final denoising steps. Note that the base model can also be used as a standalone module.
Alternatively, we can use a two-stage pipeline as follows: First, the base model is used to generate latents of the desired output size. In the second step, we use a specialized high-resolution model and apply a technique called SDEdit (https://arxiv.org/abs/2108.01073, also known as “img2img”) to the latents generated in the first step, using the same prompt. This technique is slightly slower than the first one, as it requires more function evaluations. — Read More
Source code is available at https://github.com/Stability-AI/generative-models.
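The handoff between base and refiner can be illustrated with a toy schedule split. Everything below is a NumPy stand-in for illustration (`fake_expert` is not a real denoiser, and `split_schedule` is a hypothetical helper), but the split mirrors how a shared step schedule is divided at a fraction, e.g. 0.8, between the two experts.

```python
import numpy as np

def split_schedule(num_steps, handoff):
    """Assign step indices to the base and refiner experts: the base
    covers the first `handoff` fraction of the denoising schedule,
    the refiner finishes the low-noise tail."""
    cut = int(round(num_steps * handoff))
    return list(range(cut)), list(range(cut, num_steps))

def fake_expert(latents, steps, num_steps):
    """Hypothetical denoiser stand-in: shrinks the latents a little
    per assigned step (a real expert predicts and removes noise)."""
    for _ in steps:
        latents = latents * (1.0 - 1.0 / num_steps)
    return latents

rng = np.random.default_rng(0)
latents = rng.standard_normal((4, 64, 64))              # start from pure noise
base_steps, refiner_steps = split_schedule(40, handoff=0.8)
latents = fake_expert(latents, base_steps, 40)          # base: steps 0..31
latents = fake_expert(latents, refiner_steps, 40)       # refiner: steps 32..39
print(len(base_steps), len(refiner_steps))  # 32 8
```

The key point is that both experts operate on the same latents and the same schedule; only the step ranges differ, which is why the base model also works standalone by simply running the full range itself.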