Andrew Ng’s Intel Innovation Luminary Keynote

Read More

#strategy-videos, #videos

OpenAI’s Greg Brockman: The Future of LLMs, Foundation & Generative Models (DALL·E 2 & GPT-3)

Read More
#videos, #llm

Whispers of A.I.’s Modular Future

ChatGPT is in the spotlight, but it’s Whisper—OpenAI’s open-source speech-transcription program—that shows us where machine learning is going.

One day in late December, I downloaded a program called Whisper.cpp onto my laptop, hoping to use it to transcribe an interview I’d done. I fed it an audio file and, every few seconds, it produced one or two lines of eerily accurate transcript, writing down exactly what had been said with a precision I’d never seen before. As the lines piled up, I could feel my computer getting hotter. This was one of the few times in recent memory that my laptop had actually computed something complicated—mostly I just use it to browse the Web, watch TV, and write. Now it was running cutting-edge A.I.

Despite being one of the more sophisticated programs ever to run on my laptop, Whisper.cpp is also one of the simplest. If you showed its source code to A.I. researchers from the early days of speech recognition, they might laugh in disbelief, or cry—it would be like revealing to a nuclear physicist that the process for achieving cold fusion can be written on a napkin. Whisper.cpp is intelligence distilled. It’s rare for modern software in that it has virtually no dependencies—in other words, it works without the help of other programs. Instead, it is ten thousand lines of stand-alone code, most of which does little more than fairly complicated arithmetic. It was written in five days by Georgi Gerganov, a Bulgarian programmer who, by his own admission, knows next to nothing about speech recognition. Gerganov adapted it from a program called Whisper, released in September by OpenAI, the same organization behind ChatGPT and DALL·E. Whisper transcribes speech in more than ninety languages. In some of them, the software is capable of superhuman performance—that is, it can actually parse what somebody’s saying better than a human can.

What’s so unusual about Whisper is that OpenAI open-sourced it, releasing not just the code but a detailed description of its architecture. They also included the all-important “model weights”: a giant file of numbers specifying the synaptic strength of every connection in the software’s neural network. In so doing, OpenAI made it possible for anyone, including an amateur like Gerganov, to modify the program. Gerganov converted Whisper to C++, a widely supported programming language, to make it easier to download and run on practically any device. This sounds like a logistical detail, but it’s actually the mark of a wider sea change. Until recently, world-beating A.I.s like Whisper were the exclusive province of the big tech firms that developed them.

Read More

#audio, #nlp

U.S. Outbound Investment into Chinese AI Companies

U.S. policymakers are increasingly concerned about the national security implications of U.S. investments in China, and some are considering a new regime for screening outbound investment. The authors identify the main U.S. investors active in the Chinese artificial-intelligence market and the set of AI companies in China that have benefited from U.S. capital. They also recommend next steps for U.S. policymakers to better address concerns over capital flowing into the Chinese AI ecosystem.

Read More

#china-vs-us

Beginner’s Guide to Diffusion Models

An intuitive understanding of how AI-generated art is made by Stable Diffusion, Midjourney, or DALL-E

Recently, there has been increasing interest in OpenAI’s DALL-E, Stable Diffusion (a free alternative to DALL-E), and Midjourney (hosted on a Discord server). While AI-generated art is very cool, what is even more captivating is how it works in the first place. In the last section, I include some resources for anyone who wants to get started in the AI art space.

So how do these technologies work? They use something called a latent diffusion model, and the idea behind it is ingenious.

Read More
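The core trick behind diffusion models can be sketched in a few lines: corrupt a clean signal with progressively more noise, then train a network to reverse each step, so that generation starts from pure noise and denoises its way to an image. Below is a toy, numbers-only illustration of the forward noising process; the schedule values and vector sizes are made up for illustration and are not the settings real models use.

```python
import math
import random

# Toy forward diffusion: gradually mix a clean signal x0 with Gaussian noise.
# A latent diffusion model applies this to compressed image "latents" and
# trains a neural network to predict (and subtract) the added noise.

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def noise_sample(x0, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0): scaled signal plus scaled Gaussian noise."""
    return [
        math.sqrt(alpha_bar) * v + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
        for v in x0
    ]

rng = random.Random(0)
x0 = [1.0] * 8                    # stand-in for an image (or latent) vector
alpha_bars = make_alpha_bars(1000)

early = noise_sample(x0, alpha_bars[10], rng)    # still mostly signal
late = noise_sample(x0, alpha_bars[990], rng)    # almost pure noise
```

By the final step, alpha-bar is near zero, so the sample is essentially pure noise; generation runs this process in reverse. The "latent" in latent diffusion means the loop runs in a compressed latent space produced by an autoencoder rather than on raw pixels, which is a large part of what makes Stable Diffusion cheap enough to run widely.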

#diffusion

Extracting Training Data from Diffusion Models

Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy. Overall, our results show that diffusion models are much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training.

Read More
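The abstract only names the generate-and-filter pipeline; as a rough intuition (not the authors' actual method), memorization can be surfaced by sampling a generator many times and flagging outputs that recur almost identically, since fresh samples rarely collide but a memorized training example keeps reappearing. The sketch below uses small vectors as stand-ins for images; the "generator", distance threshold, and counts are all invented for illustration.

```python
import math
import random

# Toy "generate-and-filter" pipeline:
#   1) generate many samples,
#   2) flag samples that have many near-duplicates among the others,
#      on the theory that a generator which keeps emitting nearly the
#      same output has likely memorized it from its training data.

MEMORIZED = [0.5, -1.0, 2.0, 0.0]  # pretend training example the model memorized

def toy_generator(rng):
    """Mostly emits fresh random vectors, but 20% of the time regurgitates
    a slightly jittered copy of the memorized example."""
    if rng.random() < 0.2:
        return [v + rng.gauss(0.0, 0.01) for v in MEMORIZED]
    return [rng.gauss(0.0, 1.0) for _ in MEMORIZED]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def generate_and_filter(num_samples=200, threshold=0.1, min_copies=5, seed=0):
    rng = random.Random(seed)
    samples = [toy_generator(rng) for _ in range(num_samples)]
    flagged = []
    for i, s in enumerate(samples):
        copies = sum(1 for j, t in enumerate(samples)
                     if i != j and dist(s, t) < threshold)
        if copies >= min_copies:
            flagged.append(s)
    return samples, flagged

samples, flagged = generate_and_filter()
```

Here every flagged sample sits within the jitter radius of the memorized vector, while independent random samples are far from everything else, so the filter cleanly separates regurgitated outputs from novel ones.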

#chatbots, #nlp, #diffusion