Vision Language Models (Better, Faster, Stronger)

Vision Language Models (VLMs) are the talk of the town. In a previous blog post (from April 2024), we talked a lot about VLMs. A major chunk was about LLaVA, the first successful and easily reproducible open-source vision language model, along with tips on how to discover, evaluate, and fine-tune open models.

Since then, so much has changed. Models have become smaller yet more powerful. We’ve seen the rise of new architectures and capabilities (reasoning, agency, long video understanding, etc.). In parallel, entirely new paradigms, such as multimodal Retrieval Augmented Generation (RAG) and multimodal agents, have taken shape.

In this blog post, we’ll take a look back and unpack everything that has happened with vision language models over the past year. You’ll discover key changes, emerging trends, and notable developments. — Read More

#vision

China built hundreds of AI data centers to catch the AI boom. Now many stand unused.

A year or so ago, Xiao Li was seeing a flood of Nvidia chip deals on WeChat. A real estate contractor turned data center project manager, he had pivoted to AI infrastructure in 2023, drawn by the promise of China’s AI craze.

At that time, traders in his circle bragged about securing shipments of high-performance Nvidia GPUs that were subject to US export restrictions. Many were smuggled to Shenzhen through overseas channels. At the height of demand, a single Nvidia H100, a chip essential to training AI models, could sell for up to 200,000 yuan ($28,000) on the black market.

Now, his WeChat feed and industry group chats tell a different story. Traders are more discreet in their dealings, and prices have come back down to earth. Meanwhile, two data center projects Li is familiar with are struggling to secure further funding from investors who anticipate poor returns, forcing project leads to sell off surplus GPUs. “It seems like everyone is selling, but few are buying,” he says. — Read More

#china-ai