Introducing Stable Video 3D: Quality Novel View Synthesis and 3D Generation from Single Images

Today we are releasing Stable Video 3D (SV3D), a generative model based on Stable Video Diffusion, advancing the field of 3D technology and delivering greatly improved quality and view-consistency.

This release features two variants: SV3D_u and SV3D_p. SV3D_u generates orbital videos from a single image input, without camera conditioning. SV3D_p extends this capability by accepting both single images and orbital views, allowing 3D video to be created along specified camera paths.
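
To make "specified camera paths" concrete, here is a minimal sketch of the kind of orbital trajectory a camera-conditioned variant like SV3D_p could follow. The frame count, angles, and output format are illustrative assumptions for this sketch, not the model's actual conditioning schema.

```python
# Illustrative only: a hypothetical orbital camera path of the sort SV3D_p
# can be conditioned on. Frame count, angles, and format are assumptions
# for demonstration, not the model's actual conditioning interface.
import math

def make_orbit(num_frames=21, elevation_deg=10.0, wobble_deg=5.0):
    """Return a list of (azimuth, elevation) pairs tracing one full orbit."""
    path = []
    for i in range(num_frames):
        azimuth = 360.0 * i / num_frames  # evenly spaced around the object
        elevation = elevation_deg + wobble_deg * math.sin(2 * math.pi * i / num_frames)
        path.append((azimuth, elevation))
    return path

if __name__ == "__main__":
    for az, el in make_orbit():
        print(f"azimuth={az:6.1f} deg, elevation={el:5.1f} deg")
```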

Stable Video 3D can now be used for commercial purposes with a Stability AI Membership. For non-commercial use, you can download the model weights on Hugging Face and view our research paper here. — Read More

#vfx

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision-language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that, for large-scale multimodal pre-training, using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results. Further, we show that the image encoder, together with image resolution and the image token count, has a substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models of up to 30B parameters, consisting of both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning and multi-image reasoning, enabling few-shot chain-of-thought prompting. — Read More
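
To make the "careful mix" of pre-training data concrete, here is a minimal sketch of a weighted sampler over the three data types the abstract names. The mixture weights, source names, and example strings are placeholders for illustration, not MM1's reported recipe.

```python
# Hypothetical sketch of mixing image-caption, interleaved image-text, and
# text-only data for multimodal pre-training. Weights and examples are
# placeholders, not MM1's actual data recipe.
import random

DATA_SOURCES = {
    "image_caption": {"weight": 0.45, "examples": ["<img> a photo of a dog", "<img> city skyline at night"]},
    "interleaved":   {"weight": 0.45, "examples": ["Intro text <img> more text <img> conclusion"]},
    "text_only":     {"weight": 0.10, "examples": ["A plain paragraph of web text."]},
}

def sample_batch(batch_size=8, seed=0):
    """Draw a batch whose composition follows the mixture weights."""
    rng = random.Random(seed)
    names = list(DATA_SOURCES)
    weights = [DATA_SOURCES[n]["weight"] for n in names]
    batch = []
    for _ in range(batch_size):
        source = rng.choices(names, weights=weights, k=1)[0]
        batch.append((source, rng.choice(DATA_SOURCES[source]["examples"])))
    return batch

if __name__ == "__main__":
    for source, text in sample_batch():
        print(f"[{source}] {text}")
```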

#multi-modal

AI and the Evolution of Social Media

Oh, how the mighty have fallen. A decade ago, social media was celebrated for sparking democratic uprisings in the Arab world and beyond. Now front pages are splashed with stories of social platforms’ role in misinformation, business conspiracy, malfeasance, and risks to mental health. In a 2022 survey, Americans blamed social media for the coarsening of our political discourse, the spread of misinformation, and the increase in partisan polarization.

Today, tech’s darling is artificial intelligence. Like social media, it has the potential to change the world in many ways, some favorable to democracy. But at the same time, it has the potential to do incredible damage to society.

There is a lot we can learn about social media’s unregulated evolution over the past decade that directly applies to AI companies and technologies. These lessons can help us avoid making the same mistakes with AI that we did with social media.

In particular, five fundamental attributes of social media have harmed society. AI also has those attributes. Note that they are not intrinsically evil. They are all double-edged swords, with the potential to do either good or ill. The danger comes from who wields the sword, and in what direction it is swung. This has been true for social media, and it will similarly hold true for AI. In both cases, the solution lies in limits on the technology’s use. — Read More

#strategy

Department of Homeland Security Unveils Artificial Intelligence Roadmap

DHS Will Launch Three Pilot Projects to Test AI Technology to Enhance Immigration Officer Training, Help Communities Build Resilience and Reduce the Burden of Applying for Disaster Relief Grants, and Improve the Efficiency of Law Enforcement Investigations.

… As part of the roadmap, DHS announced three innovative pilot projects that will deploy AI in specific mission areas. Homeland Security Investigations (HSI) will test AI to enhance investigative processes focused on detecting fentanyl and increasing efficiency of investigations related to combatting child sexual exploitation. The Federal Emergency Management Agency (FEMA) will deploy AI to help communities plan for and develop hazard mitigation plans to build resilience and minimize risks. And, United States Citizenship and Immigration Services (USCIS) will use AI to improve immigration officer training. — Read More

The Roadmap

#ic

Nvidia reveals Blackwell B200 GPU, the ‘world’s most powerful chip’ for AI

Nvidia’s must-have H100 AI chip made it a multitrillion-dollar company, one that may be worth more than Alphabet and Amazon, and competitors have been fighting to catch up. But perhaps Nvidia is about to extend its lead — with the new Blackwell B200 GPU and GB200 “superchip.”

Nvidia says the new B200 GPU offers up to 20 petaflops of FP4 horsepower from its 208 billion transistors. Also, it says, a GB200 that combines two of those GPUs with a single Grace CPU can offer 30 times the performance for LLM inference workloads while also potentially being substantially more efficient. It “reduces cost and energy consumption by up to 25x” over an H100, says Nvidia. — Read More
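For rough context on the headline FP4 figure, here is a back-of-the-envelope comparison against an assumed H100 baseline of roughly 4 petaflops of sparse FP8 (taken from public spec sheets, not from the article); the baseline and the takeaway are our assumptions, offered only as a sketch.

```python
# Back-of-the-envelope comparison of the stated B200 FP4 figure against an
# assumed H100 sparse-FP8 baseline. The baseline value and the conclusion
# are illustrative assumptions, not claims made in the article.
b200_fp4_pflops = 20.0   # stated in the article
h100_fp8_pflops = 3.96   # assumed baseline: H100 sparse FP8 per public specs

raw_ratio = b200_fp4_pflops / h100_fp8_pflops
print(f"Raw low-precision throughput ratio: ~{raw_ratio:.1f}x per GPU")
# The quoted 30x LLM-inference gain for GB200 therefore cannot come from raw
# FLOPS alone; it also reflects the lower FP4 precision, two GPUs per
# superchip, interconnect, and software-level improvements.
```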

#nvidia