What if you could listen to music or a podcast without headphones or earbuds and without disturbing anyone around you? Or have a private conversation in public without other people hearing you?
Our newly published research introduces a way to create audible enclaves – localized pockets of sound that are isolated from their surroundings. In other words, we’ve developed a technology that could create sound exactly where it needs to be.
The ability to send sound that becomes audible only at a specific location could transform entertainment, communication and spatial audio experiences. — Read More
Recent Updates
Everyone in AI is talking about Manus. We put it to the test.
Since the general AI agent Manus was launched last week, it has spread online like wildfire. And not just in China, where it was developed by the Wuhan-based startup Butterfly Effect. It’s made its way into the global conversation, with influential voices in tech, including Twitter cofounder Jack Dorsey and Hugging Face product lead Victor Mustar, praising its performance. Some have even dubbed it “the second DeepSeek,” comparing it to the earlier AI model that took the industry by surprise for its unexpected capabilities as well as its origin.
Manus claims to be the world’s first general AI agent, using multiple AI models (such as Anthropic’s Claude 3.5 Sonnet and fine-tuned versions of Alibaba’s open-source Qwen) and various independently operating agents to act autonomously on a wide range of tasks. (This makes it different from AI chatbots, including DeepSeek, which are based on a single large language model family and are primarily designed for conversational interactions.)
… MIT Technology Review was able to obtain access to Manus, and when I gave it a test-drive, I found that using it feels like collaborating with a highly intelligent and efficient intern: While it occasionally lacks understanding of what it’s being asked to do, makes incorrect assumptions, or cuts corners to expedite tasks, it explains its reasoning clearly, is remarkably adaptable, and can improve substantially when provided with detailed instructions or feedback. Ultimately, it’s promising but not perfect. — Read More
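A general agent like the one described above reportedly coordinates multiple underlying models rather than relying on a single LLM. A minimal sketch of that idea is a dispatcher that routes each subtask to a registered backend model; the registry, model labels, and `run_agent` function below are illustrative assumptions, since Manus's actual architecture is not public.

```python
from typing import Callable

# Hypothetical registry mapping a subtask kind to a backend model handler.
# Real systems would call model APIs here; these lambdas are stand-ins.
MODEL_REGISTRY: dict[str, Callable[[str], str]] = {
    "code": lambda task: f"code-model handles: {task}",
    "research": lambda task: f"research-model handles: {task}",
}

def run_agent(subtasks: list[tuple[str, str]]) -> list[str]:
    """Route each (kind, task) pair to an independently chosen model."""
    results = []
    for kind, task in subtasks:
        # Fall back to the research model for unrecognized subtask kinds.
        handler = MODEL_REGISTRY.get(kind, MODEL_REGISTRY["research"])
        results.append(handler(task))
    return results

out = run_agent([("code", "write a scraper"), ("research", "summarize a paper")])
```

The point of the pattern is that each subtask can use whichever model suits it best, which is what distinguishes an agent system from a single-model chatbot.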
MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning
Existing long-form video generation frameworks lack automated planning, requiring manual input for storylines, scenes, cinematography, and character interactions, which drives up cost and inefficiency. To address these challenges, we present MovieAgent, an automated movie-generation framework built on multi-agent Chain of Thought (CoT) planning. MovieAgent offers two key advantages: 1) We are the first to explore and define the paradigm of automated movie/long-video generation. Given a script and a character bank, MovieAgent can generate multi-scene, multi-shot long-form videos with a coherent narrative, while ensuring character consistency, synchronized subtitles, and stable audio throughout the film. 2) MovieAgent introduces a hierarchical CoT-based reasoning process to automatically structure scenes, camera settings, and cinematography, significantly reducing human effort. By employing multiple LLM agents to simulate the roles of a director, screenwriter, storyboard artist, and location manager, MovieAgent streamlines the production pipeline. Experiments demonstrate that MovieAgent achieves new state-of-the-art results in script faithfulness, character consistency, and narrative coherence. Our hierarchical framework takes a step forward and provides new insights into fully automated movie generation. — Read More
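The hierarchical pipeline the abstract describes, in which role-specialized agents (director, screenwriter, storyboard artist, location manager) successively refine a plan, can be sketched as a chain of LLM calls. The `call_llm` function and `Agent` class below are illustrative placeholders, not the paper's actual API.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an LLM call; a real system would query a model API.
def call_llm(role: str, prompt: str) -> str:
    return f"[{role}] plan for: {prompt[:40]}"

@dataclass
class Agent:
    role: str

    def plan(self, context: str) -> str:
        return call_llm(self.role, context)

def movie_pipeline(script: str, character_bank: list[str]) -> list[str]:
    """Hierarchical CoT: each agent refines the previous agent's output."""
    roles = ["director", "screenwriter", "storyboard artist", "location manager"]
    context = f"script: {script}; characters: {', '.join(character_bank)}"
    outputs = []
    for role in roles:
        context = Agent(role).plan(context)  # each stage feeds the next
        outputs.append(context)
    return outputs

shots = movie_pipeline("A heist at dawn", ["Ava", "Boris"])
```

Chaining the stages this way is what makes the planning hierarchical: the storyboard artist only ever sees the screenwriter's refinement, never the raw script.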
The Top 100 Gen AI Consumer Apps
In just six months, the consumer AI landscape has been redrawn. Some products surged, others stalled, and a few unexpected players rewrote the leaderboard overnight. DeepSeek rocketed from obscurity to a leading ChatGPT challenger. AI video models advanced from experimental to fairly dependable (at least for short clips!). And so-called “vibecoding” is changing who can create with AI, not just who can use it. The competition is tighter, the stakes are higher, and the winners aren’t just launching; they’re sticking.
We turned to the data to answer: Which AI apps are people actively using? What’s actually making money, beyond being popular? And which tools are moving beyond curiosity-driven dabbling to become daily staples?
This is the fourth installment of the Top 100 Gen AI Consumer Apps, our bi-annual ranking of the top 50 AI-first web products (by unique monthly visits, per Similarweb) and top 50 AI-first mobile apps (by monthly active users, per Sensor Tower). Since our last report in August 2024, 17 new companies have entered the rankings of top AI-first web products. — Read More
The Model is the Product
There was a lot of speculation over the past few years about what the next cycle of AI development would be. Agents? Reasoners? Actual multimodality?
I think it’s time to call it: the model is the product.
All current factors in research and market development push in this direction.
— Generalist scaling is stalling.
— Opinionated training is working much better than expected.
— Inference costs are in free fall.
This is also an uncomfortable direction. All investors have been betting on the application layer. In the next stage of AI evolution, the application layer is likely to be the first to be automated and disrupted. — Read More
I quit my FAANG job because it’ll be automated by the end of 2025
Until this February, I had gainful employment at [redacted FAANG co] doing machine learning engineering for fine-tuning LLMs on language translation tasks. It was a great gig, and I enjoyed the work and my coworkers. However, taking a medium-term look at the market dynamics surrounding my employment prompted me to quit a few weeks ago. I’m now convinced that my former job there will be obsolete by the end of the year. — Read More
DOJ: Google must sell Chrome, Android could be next
Google has gotten its first taste of remedies that Donald Trump’s Department of Justice plans to pursue to break up the tech giant’s monopoly in search. In the first filing since Trump allies took over the department, government lawyers backed off a key proposal submitted by the Biden DOJ. The government won’t ask the court to force Google to sell off its AI investments, and the way it intends to handle Android is changing. However, the most serious penalty is intact—Google’s popular Chrome browser is still on the chopping block. — Read More
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization
Pipeline parallelism (PP) is widely used for training large language models (LLMs), yet its scalability is often constrained by high activation memory consumption, since the number of in-flight microbatches grows with the degree of PP. In this paper, we address this challenge by leveraging the under-explored memory offload strategy in PP. Through empirical study, we discover that in the majority of standard configurations, at least half, and potentially all, of the activations can be offloaded with negligible overhead. In cases where full offload is not possible, we introduce a novel selective offload strategy that decreases peak activation memory in a better-than-linear manner. Furthermore, we integrate memory offload with other techniques to jointly consider overall throughput and memory limitations. Our experiments show that per-device activation memory effectively decreases with the total number of stages, making PP a stronger alternative to tensor parallelism (TP) and offering up to a 19% acceleration with even lower memory consumption. The implementation is open-sourced at \href{this https URL}{this url}. — Read More
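The memory pressure the abstract describes can be seen with a back-of-envelope model: under a 1F1B schedule, the first pipeline stage holds roughly one in-flight microbatch per pipeline stage, so device-resident activation memory grows with the PP degree, and offloading a fraction of those activations to host memory caps the peak. The toy cost model below illustrates that arithmetic only; it is not the paper's implementation, and the 1F1B in-flight count is a simplifying assumption.

```python
def peak_activation_memory(pp_degree: int, act_per_microbatch: float,
                           offload_fraction: float = 0.0) -> float:
    """Toy model of first-stage peak activation memory under 1F1B.

    The first stage holds about `pp_degree` in-flight microbatches;
    offloading a fraction of them to host memory lowers the
    device-resident peak.
    """
    in_flight = pp_degree  # worst case at the first stage
    resident = in_flight * (1.0 - offload_fraction)
    # Keep at least the activations of the microbatch currently computing.
    return max(resident, 1.0) * act_per_microbatch

# Example: 8 pipeline stages, 2 GB of activations per microbatch.
baseline = peak_activation_memory(pp_degree=8, act_per_microbatch=2.0)
offloaded = peak_activation_memory(pp_degree=8, act_per_microbatch=2.0,
                                   offload_fraction=0.5)
```

In this toy model, offloading half the in-flight microbatches halves the device-resident peak (16 GB down to 8 GB), which is the linear baseline the paper's selective strategy claims to beat.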
QwQ-32B: Embracing the Power of Reinforcement Learning
Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For instance, DeepSeek R1 has achieved state-of-the-art performance by integrating cold-start data and multi-stage training, enabling deep thinking and complex reasoning.
Our research explores the scalability of Reinforcement Learning (RL) and its impact on enhancing the intelligence of large language models. We are excited to introduce QwQ-32B, a model with 32 billion parameters that achieves performance comparable to DeepSeek-R1, which boasts 671 billion parameters (with 37 billion activated). — Read More
You knew it was coming: Google begins testing AI-only search results
Google has become so integral to online navigation that its name became a verb, meaning “to find things on the Internet.” Soon, Google might just tell you what’s on the Internet instead of showing you. The company has announced an expansion of its AI search features, powered by Gemini 2.0. Everyone will soon see more AI Overviews at the top of the results page, but Google is also testing a more substantial change in the form of AI Mode. This version of Google won’t show you the 10 blue links at all—Gemini completely takes over the results in AI Mode. — Read More