First Neuralink patient can control a computer mouse by thinking, claims Elon Musk

The first human to receive a brain chip from Elon Musk’s Neuralink can control a computer mouse just by thinking, according to Musk.

…”Progress is good, and the patient seems to have made a full recovery, with no ill effects that we are aware of,” Musk said. “Patient is able to move a mouse around the screen by just thinking.”

…Last month, Musk shared in a post on X that Neuralink had successfully performed the implant surgery on a human for the first time on Jan. 28. — Read More

#human

Gemma: Introducing new state-of-the-art open models

At Google, we believe in making AI helpful for everyone. We have a long history of contributing innovations to the open community, such as with Transformers, TensorFlow, BERT, T5, JAX, AlphaFold, and AlphaCode. Today, we’re excited to introduce a new generation of open models from Google to assist developers and researchers in building AI responsibly.

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Developed by Google DeepMind and other teams across Google, Gemma is inspired by Gemini, and the name reflects the Latin gemma, meaning “precious stone.” Accompanying our model weights, we’re also releasing tools to support developer innovation, foster collaboration, and guide responsible use of Gemma models. — Read More

#devops

How AI is changing gymnastics judging

There was one individual Olympic spot left. According to the intricate set of rules governing who gets slots for the games, it would come down to who placed highest in the high bar final: Croatia’s Tin Srbić or Brazil’s Arthur Nory Mariano.

They were at the 2023 World Championships in Antwerp, Belgium, last October. Mariano went first. He fell during his routine, giving Srbić some wiggle room. He didn’t need it, though: Srbić completed a clean routine, with Tkachev connections and a double-twisting double layout that he stuck cold; at the end of his routine, he pumped his fists in the air in celebration. He’d qualified for the 2024 Paris Olympics. 

But when his score came in—a 14.500—Srbić thought the judges had made a mistake, one that could cost him a medal at Worlds. He needed to decide if he wanted to make a challenge.  

… These championships were the first time the technology, formally known as the Judging Support System, or JSS, had been used on every apparatus in a gymnastics competition—and its first use in a competition that could make or break an athlete’s Olympic dreams. While the AI judging system did not replace human judges—rather, it was available to help judges review routines in case of an inquiry or a “blocked score”—it still marked a watershed moment for the sport that was years in the making.  — Read More

#augmented-intelligence

Latte: Latent Diffusion Transformer for Video Generation

We propose a novel Latent Diffusion Transformer, namely Latte, for video generation. Latte first extracts spatio-temporal tokens from input videos and then adopts a series of Transformer blocks to model video distribution in the latent space. In order to model the substantial number of tokens extracted from videos, four efficient variants are introduced from the perspective of decomposing the spatial and temporal dimensions of input videos. To improve the quality of generated videos, we determine the best practices of Latte through rigorous experimental analysis, including video clip patch embedding, model variants, timestep-class information injection, temporal positional embedding, and learning strategies. Our comprehensive evaluation demonstrates that Latte achieves state-of-the-art performance across four standard video generation datasets, i.e., FaceForensics, SkyTimelapse, UCF101, and Taichi-HD. In addition, we extend Latte to the text-to-video (T2V) generation task, where Latte achieves results comparable to those of recent T2V models. We strongly believe that Latte provides valuable insights for future research on incorporating Transformers into diffusion models for video generation. — Read More
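The token-extraction step the abstract describes can be sketched roughly as cutting a video into small space-time patches and flattening each patch into a token. This is a minimal illustration only: the patch sizes and the single uniform patching scheme here are assumptions, while the paper itself studies several spatial/temporal decompositions and embedding variants.

```python
# Illustrative sketch: turn a (T, H, W, C) video into flattened
# spatio-temporal tokens. Patch sizes pt/ph/pw are assumed values.
import numpy as np

def video_to_tokens(video: np.ndarray, pt: int = 2, ph: int = 2, pw: int = 2) -> np.ndarray:
    """Split a (T, H, W, C) video into (num_tokens, pt*ph*pw*C) flattened patches."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)    # group the three patch axes together
    return x.reshape(-1, pt * ph * pw * C)  # one row per space-time token

video = np.zeros((4, 8, 8, 3))              # 4 frames of 8x8 RGB
tokens = video_to_tokens(video)
print(tokens.shape)  # (32, 24)
```

Each of the 32 tokens here covers a 2-frame, 2×2-pixel block; the Transformer blocks would then model the distribution over sequences of such tokens in latent space.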

#nlp, #image-recognition

Retrieval-Augmented Generation for Large Language Models: A Survey

Large Language Models (LLMs) demonstrate significant capabilities but face challenges such as hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the models, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs’ intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review paper offers a detailed examination of the progression of RAG paradigms, encompassing the Naive RAG, the Advanced RAG, and the Modular RAG. It meticulously scrutinizes the tripartite foundation of RAG frameworks, which includes the retrieval, the generation, and the augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these critical components, providing a profound understanding of the advancements in RAG systems. Furthermore, this paper introduces the metrics and benchmarks for assessing RAG models, along with the most up-to-date evaluation framework. In conclusion, the paper delineates prospective avenues for research, including the identification of challenges, the expansion of multi-modalities, and the progression of the RAG infrastructure and its ecosystem. — Read More
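The retrieval → augmentation → generation pipeline the survey surveys can be sketched as a toy loop. Everything here is a simplifying assumption: the corpus is invented, retrieval is plain word overlap rather than dense embeddings, and the prompt template is hypothetical; production systems use embedding models and vector stores.

```python
# Toy Naive-RAG sketch: retrieve by word overlap, splice context into a
# prompt, and hand the prompt to a generator (LLM call omitted).
import re

CORPUS = [
    "Neuralink implanted its first brain chip in a human in January 2024.",
    "Gemma is a family of lightweight open models from Google DeepMind.",
    "Elo is a rating system used in chess and for ranking chatbots.",
]

def _tokens(text: str) -> set[str]:
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query and keep the top k."""
    q = _tokens(query)
    scored = sorted(corpus, key=lambda d: len(q & _tokens(d)), reverse=True)
    return scored[:k]

def augment(query: str, docs: list[str]) -> str:
    """Splice the retrieved context into the prompt before generation."""
    context = "\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = augment("What is Gemma?", retrieve("What is Gemma?", CORPUS))
print(prompt)
```

The Advanced and Modular RAG paradigms the paper covers refine each of these three stages (query rewriting before retrieval, reranking after it, iterative retrieval during generation) rather than changing the overall shape.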

Original Paper

#nlp, #performance

Sora, Groq, and Virtual Reality

Matthew Ball wrote a fun essay earlier this month entitled On Spatial Computing, Metaverse, the Terms Left Behind and Ideas Renewed, tracing the various terms that have been used to describe, well, that’s what the essay is about: virtual reality, augmented reality, mixed reality, Metaverse, are words that have been floating around for decades now, both in science fiction and in products, to describe what Apple is calling spatial computing.

Personally, I agree with Ball that “Metaverse” is the best of the lot, particularly given Ball’s succinct description of the concept in his conclusion:

I liked the term Metaverse because it worked like the Internet, but for 3D. It wasn’t about a device or even computing at large, just as the Internet was not about the PC nor the client-server model. The Metaverse is a vast and interconnected network of real-time 3D experiences. For passthrough or optical MR to scale, a “3D Internet” is required – which means overhauls to networking infrastructure and protocols, advances in computing infrastructure, and more. This is, perhaps, the one final challenge with the term – it describes more of an end state than a transition. — Read More

#metaverse, #vfx

Microsoft, OpenAI say U.S. rivals use artificial intelligence in hacking

Russia, China and other U.S. adversaries are using the newest wave of artificial intelligence tools to improve their hacking abilities and find new targets for online espionage, according to a report Wednesday from Microsoft and its close business partner OpenAI. — Read More

#cyber, #russia, #china

If you thought Sora was impressive now watch it with AI generated sound from ElevenLabs

Artificial intelligence speech startup ElevenLabs offered an insight into what it’s planning to release in the future, adding sound effects to AI-generated video for the first time.

Best known for its near human-like text-to-speech and synthetic voice services, ElevenLabs added artificially generated sound effects to videos produced using OpenAI’s Sora.

OpenAI unveiled its impressive Sora text-to-video artificial intelligence model last week, showcasing some of the most realistic, consistent, and longest AI-generated video to date. — Read More

#audio, #vfx

Sora, and the Future of VFX Compositing

… The Future (You will experience this moment soon)

There’s a moment that stays with you—the first time you witness your thoughts materialize into visual marvels on the screen. It’s akin to the first successful alchemists turning lead into gold, except our lead is the raw, unshaped ideas, and our gold, the breathtaking visuals rendered from the ether of our imagination. The advent of AI-driven tools like OpenAI’s Sora has been nothing short of a revelation, a glimpse into a future where creating temporally consistent video content is as effortless as describing a sunrise to a friend. — Read More

#vfx

Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings

We present Chatbot Arena, a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner. In this blog post, we are releasing our initial results and a leaderboard based on the Elo rating system, which is a widely used rating system in chess and other competitive games. We invite the entire community to join this effort by contributing new models and evaluating them by asking questions and voting for your favorite answer. — Read More
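The Elo mechanism behind the leaderboard is simple enough to sketch: each model carries a rating, and after every pairwise battle the winner takes rating points from the loser in proportion to how surprising the result was. The K-factor and starting ratings below are illustrative assumptions, not Chatbot Arena's actual parameters.

```python
# Minimal sketch of the standard Elo update rule used to rank models
# from pairwise battle outcomes.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Modeled probability that A beats B under the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, score_a: float,
           k: float = 32.0) -> tuple[float, float]:
    """Apply one battle: score_a is 1 for an A win, 0 for a loss, 0.5 for a tie."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Two models start equal; model A wins one crowdsourced vote.
a, b = update(1000.0, 1000.0, score_a=1.0)
print(a, b)  # 1016.0 984.0
```

Because the update is zero-sum and surprise-weighted, an upset win over a higher-rated model moves the ratings much more than an expected win, which is what lets crowdsourced votes converge to a stable ranking.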

You can compare models’ relative performance for yourself, or add new models, here.

#performance