The beautiful, hilarious surrealism of early text-to-video AIs

A new creative AI system called ModelScope is now pumping out short videos in response to text prompts. The early results are wonderfully bizarre and thoroughly memeworthy – but it’s immediately clear how immensely powerful these tools will become.

Developed by Alibaba’s DAMO Academy and demoed on Hugging Face Spaces, ModelScope is a “multi-stage text-to-video diffusion model”: it takes a plain-English text prompt, attempts to understand what you’re hoping to see, then generates and de-noises a short video for you. You can play with it online through a very simple interface. It’s very early days for this sort of thing, making it the perfect time to marvel both at its incredible capabilities and at its bizarre misunderstandings of the world. Read More
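
If you’d rather script it than queue for the web demo, here’s a minimal sketch of driving the same pipeline through Hugging Face’s diffusers library. It assumes the publicly released damo-vilab/text-to-video-ms-1.7b checkpoint (the weights behind the demo), a CUDA GPU, and a recent diffusers install; the prompt is just an example:

```python
# Minimal sketch: text-to-video with the released ModelScope checkpoint.
# Assumes a CUDA GPU and `pip install diffusers transformers accelerate`.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()  # swap submodules to CPU so it fits on consumer GPUs

# The pipeline denoises a short latent video, then decodes it into frames.
video_frames = pipe(
    "a corgi surfing a wave at sunset", num_inference_steps=25
).frames
path = export_to_video(video_frames)  # writes an .mp4 and returns its path
print(path)
```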

Try It

#vfx

Introducing Segment Anything: Working toward the first foundation model for image segmentation

Segmentation — identifying which image pixels belong to an object — is a core task in computer vision and is used in a broad array of applications, from analyzing scientific imagery to editing photos. But creating an accurate segmentation model for specific tasks typically requires highly specialized work by technical experts with access to AI training infrastructure and large volumes of carefully annotated in-domain data.

Today, we aim to democratize segmentation by introducing the Segment Anything project: a new task, dataset, and model for image segmentation, as we explain in our research paper. We are releasing both our general Segment Anything Model (SAM) and our Segment Anything 1-Billion mask dataset (SA-1B), the largest segmentation dataset ever released, to enable a broad set of applications and foster further research into foundation models for computer vision. The SA-1B dataset is available for research purposes, and the Segment Anything Model is released under a permissive open license (Apache 2.0). Check out the demo to try SAM with your own images. Read More
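
To give a sense of the promptable workflow, here’s a minimal sketch using the open-sourced segment-anything package: load a checkpoint, embed an image once, then prompt with points. The image path and the example click coordinates are assumptions, not from the announcement:

```python
# Minimal sketch: point-prompted segmentation with SAM.
# Assumes `pip install segment-anything` and a downloaded ViT-H checkpoint.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Any HxWx3 RGB uint8 array works; "photo.jpg" is a placeholder path.
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # compute the image embedding once, reuse for many prompts

# Prompt with a single foreground click; SAM returns candidate masks with scores.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),  # example (x, y) click
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # boolean HxW mask of the top candidate
```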

#vision

PEER: A Collaborative Language Model

Textual content is often the output of a collaborative writing process: we start with an initial draft, ask for suggestions, and repeatedly make changes. Agnostic of this process, today’s language models are trained to generate only the final result. As a consequence, they lack several abilities crucial for collaborative writing: they cannot update existing texts, are difficult to control, and are incapable of verbally planning or explaining their actions. To address these shortcomings, we introduce PEER, a collaborative language model trained to imitate the entire writing process itself: PEER can write drafts, add suggestions, propose edits, and provide explanations for its actions. Crucially, we train multiple instances of PEER able to infill various parts of the writing process, enabling the use of self-training techniques to increase the quality, amount, and diversity of training data. This unlocks PEER’s full potential by making it applicable in domains for which no edit histories are available and improving its ability to follow instructions, write useful comments, and explain its actions. We show that PEER achieves strong performance across various domains and editing tasks. Read More
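
The paper doesn’t tie this to a specific public checkpoint, but the collaborative loop it describes looks roughly like the following hypothetical sketch, where the model ID and the plan-plus-text prompt format are placeholders, not PEER’s actual interface:

```python
# Hypothetical sketch of a PEER-style plan -> edit loop with a generic seq2seq model.
# "your-peer-checkpoint" is a placeholder, not a released model ID.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("your-peer-checkpoint")        # placeholder
model = AutoModelForSeq2SeqLM.from_pretrained("your-peer-checkpoint")  # placeholder

def edit_step(draft: str, plan: str) -> str:
    """Apply one natural-language editing plan to the current draft."""
    inputs = tok(f"Plan: {plan}\nText: {draft}", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=256)
    return tok.decode(out[0], skip_special_tokens=True)

# Iteratively refine a draft, one plan at a time, instead of one-shot generation.
draft = "The eifel tower is located in Berlin."
draft = edit_step(draft, "Fix the spelling and the factual error.")
```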

#nlp

Introducing Imagica – a new way to think and create with computers

Read More

#devops, #videos