Welcome to the new surreal. How AI-generated video is changing film

The Frost nails its uncanny, disconcerting vibe in its first few shots. Vast icy mountains, a makeshift camp of military-style tents, a group of people huddled around a fire, barking dogs. It’s familiar stuff, yet weird enough to plant a growing seed of dread. There’s something wrong here.

“Pass me the tail,” someone says. Cut to a close-up of a man by the fire gnawing on a pink piece of jerky. It’s grotesque. The way his lips are moving isn’t quite right. For a beat it looks as if he’s chewing on his own frozen tongue.

Welcome to the unsettling world of AI moviemaking. “We kind of hit a point where we just stopped fighting the desire for photographic accuracy and started leaning into the weirdness that is DALL-E,” says Stephen Parker at Waymark, the Detroit-based video creation company behind The Frost.

The Frost is a 12-minute movie in which every shot is generated by an image-making AI. It’s one of the most impressive—and bizarre—examples yet of this strange new genre. You can watch the film below in an exclusive reveal from MIT Technology Review. — Read More

#vfx

The Falcon has landed in the Hugging Face ecosystem

Falcon is a new family of state-of-the-art language models created by the Technology Innovation Institute in Abu Dhabi, and released under the Apache 2.0 license. Notably, Falcon-40B is the first “truly open” model with capabilities rivaling many current closed-source models. This is fantastic news for practitioners, enthusiasts, and industry, as it opens the door for many exciting use cases.

In this blog, we will be taking a deep dive into the Falcon models: first discussing what makes them unique and then showcasing how easy it is to build on top of them (inference, quantization, finetuning, and more) with tools from the Hugging Face ecosystem. — Read More

#devops, #nlp

Ex-Google Officer Finally Speaks Out On The Dangers Of AI! – Mo Gawdat

Read More

#strategy, #videos

The Illusion of China’s AI Prowess

Regulating AI Will Not Set America Back in the Technology Race

The artificial intelligence revolution has reached Congress. The staggering potential of powerful AI systems, such as OpenAI’s text-based ChatGPT, has alarmed legislators, who worry about how advances in this fast-moving technology might remake economic and social life. Recent months have seen a flurry of hearings and behind-the-scenes negotiations on Capitol Hill as lawmakers and regulators try to determine how best to impose limits on the technology. But some fear that any regulation of the AI industry will incur a geopolitical cost. In a May hearing at the U.S. Senate, Sam Altman, the CEO of OpenAI, warned that “a peril” of AI regulation is that “you slow down American industry in such a way that China or somebody else makes faster progress.” That same month, AI entrepreneur Alexandr Wang insisted that “the United States is in a relatively precarious position, and we have to make sure we move fastest on the technology.” Indeed, the notion that Washington’s propensity for red tape could hurt it in the competition with Beijing has long occupied figures in government and in the private sector. Former Google CEO Eric Schmidt claimed in 2021 that “China is not busy stopping things because of regulation.” According to this thinking, if the United States places guardrails around AI, it could end up surrendering international AI leadership to China. — Read More

#china-vs-us

StyleDrop: Text-To-Image Generation in Any Style

We present StyleDrop, a method that enables the generation of images that faithfully follow a specific style, powered by Muse, a text-to-image generative vision transformer. StyleDrop is extremely versatile and captures nuances and details of a user-provided style, such as color schemes, shading, design patterns, and local and global effects. StyleDrop efficiently learns a new style by fine-tuning very few trainable parameters (less than 1% of total model parameters) and improves quality via iterative training with either human or automated feedback. Better yet, StyleDrop is able to deliver impressive results even when the user supplies only a single image specifying the desired style. An extensive study shows that, for the task of style tuning text-to-image models, StyleDrop on Muse convincingly outperforms other methods, including DreamBooth and Textual Inversion on Imagen or Stable Diffusion. — Read More
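
To give a rough sense of what "less than 1% of total model parameters" can mean, here is a back-of-the-envelope sketch. It assumes the trainable parameters are small bottleneck adapters added to each transformer layer, which the excerpt above does not specify. The model sizes are hypothetical placeholders, not figures from the paper.

```python
def adapter_param_count(hidden_size: int, bottleneck: int) -> int:
    """Parameters in one bottleneck adapter: down- and up-projection plus biases."""
    down = hidden_size * bottleneck + bottleneck
    up = bottleneck * hidden_size + hidden_size
    return down + up


def trainable_fraction(base_params: int, num_layers: int,
                       hidden_size: int, bottleneck: int) -> float:
    """Share of all parameters that are trainable when only adapters are tuned."""
    adapters = num_layers * adapter_param_count(hidden_size, bottleneck)
    return adapters / (base_params + adapters)


# Hypothetical sizes chosen for illustration only (not from the StyleDrop paper).
BASE_PARAMS = 3_000_000_000  # frozen backbone parameters
NUM_LAYERS = 48              # one adapter per transformer layer
HIDDEN_SIZE = 2048
BOTTLENECK = 64

fraction = trainable_fraction(BASE_PARAMS, NUM_LAYERS, HIDDEN_SIZE, BOTTLENECK)
print(f"trainable share: {fraction:.4%}")
```

With these placeholder numbers the adapters add roughly 12.7 million weights to a 3-billion-parameter backbone, so the trainable share stays well under 1%.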

#big7, #image-recognition

Air Force colonel backtracks over his warning about how AI could go rogue and kill its human operators

… An Air Force colonel who oversees AI testing used what he now says is a hypothetical to describe a military AI going rogue and killing its human operator in a simulation in a presentation at a professional conference.

But after reports of the talk emerged Thursday, the colonel said that he misspoke and that the “simulation” he described was a “thought experiment” that never happened. — Read More

#dod, #robotics

Open-Source LLMs

In February, Meta released its large language model: LLaMA. Unlike OpenAI and its ChatGPT, Meta didn’t just give the world a chat window to play with. Instead, it released the code into the open-source community, and shortly thereafter the model itself was leaked. Researchers and programmers immediately started modifying it, improving it, and getting it to do things no one else anticipated. And their results have been immediate, innovative, and an indication of how the future of this technology is going to play out. Training speeds have hugely increased, and the size of the models themselves has shrunk to the point that you can create and run them on a laptop. The world of AI research has dramatically changed.

This development hasn’t made the same splash as other corporate announcements, but its effects will be much greater. It will wrest power from the large tech corporations, resulting in both much more innovation and a much more challenging regulatory landscape. The large corporations that had controlled these models warn that this free-for-all will lead to potentially dangerous developments, and problematic uses of the open technology have already been documented. But those who are working on the open models counter that a more democratic research environment is better than having this powerful technology controlled by a small number of corporations. — Read More

#devops, #nlp

Generating Images with Multimodal Language Models

We propose a method to fuse frozen text-only large language models (LLMs) with pre-trained image encoder and decoder models, by mapping between their embedding spaces. Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue. Ours is the first approach capable of conditioning on arbitrarily interleaved image and text inputs to generate coherent image (and text) outputs. To achieve strong performance on image generation, we propose an efficient mapping network to ground the LLM to an off-the-shelf text-to-image generation model. This mapping network translates hidden representations of text into the embedding space of the visual models, enabling us to leverage the strong text representations of the LLM for visual outputs. Our approach outperforms baseline generation models on tasks with longer and more complex language. In addition to novel image generation, our model is also capable of image retrieval from a prespecified dataset, and decides whether to retrieve or generate at inference time. This is done with a learnt decision module which conditions on the hidden representations of the LLM. Our model exhibits a wider range of capabilities compared to prior multimodal language models. It can process image-and-text inputs, and produce retrieved images, generated images, and generated text — outperforming non-LLM based generation models across several text-to-image tasks that measure context dependence. — Read More
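
The data flow described above (map an LLM hidden state into a visual embedding space, then decide whether to retrieve or generate) can be sketched in a few lines. This is a toy illustration, not the authors' implementation: a single linear layer stands in for the mapping network, a logistic head stands in for the learnt decision module, and every weight, vector, and file name below is made up.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def project(hidden: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Map an LLM hidden state into the visual embedding space (linear stand-in)."""
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weights, bias)]


def should_generate(hidden: Vector, decision_weights: Vector,
                    decision_bias: float) -> bool:
    """Logistic decision head: True means generate, False means retrieve."""
    logit = sum(w * h for w, h in zip(decision_weights, hidden)) + decision_bias
    return 1.0 / (1.0 + math.exp(-logit)) > 0.5


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve(query: Vector, candidates: Dict[str, Vector]) -> str:
    """Return the candidate image whose embedding is closest to the query."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


# Made-up toy values: a 3-dim "LLM" state mapped to a 2-dim "visual" space.
hidden_state = [1.0, 0.0, -1.0]
map_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, -1.0]]
map_bias = [0.0, 0.0]
image_index = {"cat.png": [1.0, 0.9], "car.png": [-1.0, 0.2]}

visual_query = project(hidden_state, map_weights, map_bias)
if should_generate(hidden_state, [0.5, 0.0, 0.5], -0.2):
    print("hand visual_query to a text-to-image decoder")
else:
    print("retrieved:", retrieve(visual_query, image_index))
```

In the real system the LLM stays frozen and only the small mapping and decision components are trained. The sketch mirrors that split by keeping all learnable values in the explicit weight arguments.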

#multi-modal

The Urgent Risks of Runaway AI – and What to Do about Them | Gary Marcus

Read More

#strategy, #videos

Digital Renaissance: NVIDIA Neuralangelo Research Reconstructs 3D Scenes

Neuralangelo, a new AI model by NVIDIA Research for 3D reconstruction using neural networks, turns 2D video clips into detailed 3D structures — generating lifelike virtual replicas of buildings, sculptures and other real-world objects.

Like Michelangelo sculpting stunning, life-like visions from blocks of marble, Neuralangelo generates 3D structures with intricate details and textures. Creative professionals can then import these 3D objects into design applications, editing them further for use in art, video game development, robotics and industrial digital twins. — Read More

#nvidia, #vfx