Generative AI models have reached a baseline capability: they can produce at least a passable video from a single image or a short sentence. Companies building products around these models claim that anyone with a few images or recordings can make a snazzy promo video, and video usually performs better than static images or documents.
Peak XV and Tiger Global-backed Avataar released a new tool on Monday called Velocity. It generates product videos directly from a product link. The company is going up against the likes of Amazon and Google, which are also experimenting with AI-powered video tools for ads. — Read More
AI researcher François Chollet founds a new AI lab focused on AGI
François Chollet, an influential AI researcher, is launching a new startup that aims to build frontier AI systems with novel designs.
The startup, Ndea, will consist of an AI research and science lab. It’s looking to “develop and operationalize” AGI. AGI, which stands for “artificial general intelligence,” typically refers to AI that can perform any task a human can. It’s a goalpost for many AI companies, including OpenAI.
… Ndea plans to use a technique called program synthesis, in tandem with other technical approaches, to unlock AGI. — Read More
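The item above names program synthesis without explaining it. As a minimal illustration (not Ndea's actual method, which is unannounced), program synthesis means searching a space of programs for one that matches given input-output examples. The primitives and examples here are invented:

```python
# Toy enumerative program synthesis (illustrative only, not Ndea's approach):
# enumerate pipelines of primitives until one matches the input-output examples.
from itertools import product

PRIMITIVES = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
}

def synthesize(examples, max_depth=3):
    """Search primitive pipelines, shortest first, for one consistent with all examples."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def run(x, names=names):
                for n in names:
                    x = PRIMITIVES[n](x)
                return x
            if all(run(i) == o for i, o in examples):
                return names  # first program consistent with every example
    return None

# The target f(x) = (x + 1) * 2 is recovered from just two examples:
print(synthesize([(1, 4), (3, 8)]))  # -> ('inc', 'double')
```

Real systems replace this brute-force enumeration with learned guidance, which is where the "in tandem with other technical approaches" part presumably comes in.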
The Inherent Limits of Pretrained LLMs
Large Language Models (LLMs), trained on extensive web-scale corpora, have demonstrated remarkable abilities across diverse tasks, especially as they are scaled up. Nevertheless, even state-of-the-art models struggle in certain cases, sometimes failing at problems solvable by young children, indicating that traditional notions of task complexity are insufficient for explaining LLM capabilities. However, exploring LLM capabilities is complicated by the fact that most widely-used models are also “instruction-tuned” to respond appropriately to prompts. With the goal of disentangling the factors influencing LLM performance, we investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples. Through extensive experiments across various model families, scales and task types, which included instruction tuning 90 different LLMs, we demonstrate that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts. By clarifying what instruction-tuning contributes, we extend prior research into in-context learning, which suggests that base models use priors from pretraining data to solve tasks. Specifically, we extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly sets a limiting boundary on the tasks they can solve, with the added influence of the instruction-tuning dataset. — Read More
How is Google using AI for internal code migrations?
In recent years, there has been tremendous interest in using generative AI, and particularly large language models (LLMs), in software engineering; indeed, there are now several commercially available tools, and many large companies have also created proprietary ML-based tools for their own software engineers. While the use of ML for common tasks such as code completion is available in commodity tools, there is growing interest in applying LLMs to more bespoke purposes. One such purpose is code migration.
This article is an experience report on using LLMs for code migrations at Google. It is not a research study, in the sense that we do not carry out comparisons against other approaches or evaluate research questions/hypotheses. Rather, we share our experiences applying LLM-based code migration in an enterprise context across a range of migration cases, in the hope that other industry practitioners will find our insights useful. Many of these learnings apply to any application of ML in software engineering. We see evidence that the use of LLMs can significantly reduce the time needed for migrations and can lower barriers to getting started with and completing migration programs. — Read More
What to expect from Neuralink in 2025
In November, a young man named Noland Arbaugh announced he’d be livestreaming from his home for three days straight. His broadcast was in some ways typical fare: a backyard tour, video games, meet mom.
The difference is that Arbaugh, who is paralyzed, has thin electrode-studded wires installed in his brain, which he used to move a computer mouse on a screen, click menus, and play chess. The implant, called N1, was installed last year by neurosurgeons working with Neuralink, Elon Musk’s brain-interface company.
The possibility of listening to neurons and using their signals to move a computer cursor was first demonstrated more than 20 years ago in a lab setting. Now, Arbaugh’s livestream is an indicator that Neuralink is a whole lot closer to creating a plug-and-play experience that can restore people’s daily ability to roam the web and play games, giving them what the company has called “digital freedom.”
But this is not yet a commercial product. — Read More
AI Founder’s Bitter Lesson. Chapter 1 – History Repeats Itself
- Historically, general approaches always win in AI.
- Founders in AI application space now repeat the mistakes AI researchers made in the past.
- Better AI models will enable general purpose AI applications. At the same time, the added value of the software around the AI model will diminish.
Recent AI progress has enabled new products that solve a broad range of problems. I saw this firsthand watching over 100 pitches during YC alumni Demo Day. These problems share a common thread – they’re simple enough to be solved with constrained AI. Yet the real power of AI lies in its flexibility. While products with fewer constraints generally work better, current AI models aren’t reliable enough to build such products at scale. We’ve been here before with AI, many times. Each time, the winning move has been the same. AI founders need to learn this history, or I fear they’ll discover these lessons the hard way. — Read More
How GPT learns layer by layer
Large Language Models (LLMs) excel at tasks like language processing, strategy games, and reasoning but struggle to build generalizable internal representations essential for adaptive decision-making in agents. For agents to effectively navigate complex environments, they must construct reliable world models. While LLMs perform well on specific benchmarks, they often fail to generalize, leading to brittle representations that limit their real-world effectiveness. Understanding how LLMs build internal world models is key to developing agents capable of consistent, adaptive behavior across tasks. We analyze OthelloGPT, a GPT-based model trained on Othello gameplay, as a controlled testbed for studying representation learning. Despite being trained solely on next-token prediction with random valid moves, OthelloGPT shows meaningful layer-wise progression in understanding board state and gameplay. Early layers capture static attributes like board edges, while deeper layers reflect dynamic tile changes. To interpret these representations, we compare Sparse Autoencoders (SAEs) with linear probes, finding that SAEs offer more robust, disentangled insights into compositional features, whereas linear probes mainly detect features useful for classification. We use SAEs to decode features related to tile color and tile stability, a previously unexamined feature that reflects complex gameplay concepts like board control and long-term planning. We study the progression of linear probe accuracy and tile color using both SAEs and linear probes to compare their effectiveness at capturing what the model is learning. Although we begin with a smaller language model, OthelloGPT, this study establishes a framework for understanding the internal representations learned by GPT models, transformers, and LLMs more broadly. Our code is publicly available: this https URL. — Read More
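A linear probe, the simpler of the two interpretability tools compared above, is just a linear classifier trained on a model's frozen hidden states to read out a feature such as a tile's color. A self-contained sketch with invented 4-dimensional "activations" in place of real OthelloGPT hidden states:

```python
# Minimal linear-probe sketch (toy data, not actual OthelloGPT activations):
# fit logistic regression on frozen "hidden states" to read out a board feature,
# e.g. whether a given tile is black. High probe accuracy means the feature is
# linearly decodable from that layer's representation.
import math, random

random.seed(0)

def make_example():
    """Invented activation: the label is linearly encoded in dims 0 and 2."""
    label = random.choice([0, 1])
    h = [random.gauss(0, 0.3) for _ in range(4)]
    h[0] += 1.0 if label else -1.0
    h[2] += 0.5 if label else -0.5
    return h, label

data = [make_example() for _ in range(400)]
w, b = [0.0] * 4, 0.0

for _ in range(200):                      # plain SGD on the logistic loss
    for h, y in data:
        p = 1 / (1 + math.exp(-(sum(wi * hi for wi, hi in zip(w, h)) + b)))
        g = p - y                         # gradient of log-loss w.r.t. the logit
        w = [wi - 0.1 * g * hi for wi, hi in zip(w, h)]
        b -= 0.1 * g

acc = sum((sum(wi * hi for wi, hi in zip(w, h)) + b > 0) == (y == 1)
          for h, y in data) / len(data)
print(f"probe accuracy: {acc:.2f}")
```

The paper's contrast is that such probes only reveal features useful for the classification you chose, whereas SAEs decompose the activations into features without a supervised target.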
How should we test AI for human-level intelligence? OpenAI’s o3 electrifies quest
The technology firm OpenAI made headlines last month when its latest experimental chatbot model, o3, achieved a high score on a test that marks progress towards artificial general intelligence (AGI). OpenAI’s o3 scored 87.5%, trouncing the previous best score for an artificial intelligence (AI) system of 55.5%.
This is “a genuine breakthrough”, says AI researcher François Chollet, who created the test, called Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI), in 2019 while working at Google, based in Mountain View, California. A high score on the test doesn’t mean that AGI — broadly defined as a computing system that can reason, plan and learn skills as well as humans can — has been achieved, Chollet says, but o3 is “absolutely” capable of reasoning and “has quite substantial generalization power”.
Researchers are bowled over by o3’s performance across a variety of tests, or benchmarks, including the extremely difficult FrontierMath test, announced in November by the virtual research institute Epoch AI. …But many, including Rein, caution that it’s hard to tell whether the ARC-AGI test really measures AI’s capacity to reason and generalize. — Read More
Project DIGITS: NVIDIA’s Leap into Personal AI Supercomputing
When you own the platform, you own the experience. That’s why Apple invests so much in the iPhone. That’s what NVIDIA is aiming for with Project DIGITS, unveiled at CES 2025.
Project DIGITS democratizes access to advanced AI computing by introducing a compact and powerful personal AI supercomputer. It’s designed to make it possible for AI researchers, data scientists, students, and even hobbyists to develop, prototype, and fine-tune AI models directly from their desks. While professionals could fine-tune models locally before, they were often constrained by hardware limitations, high costs, or scalability issues. Project DIGITS eliminates these barriers by delivering computing power in a desktop form factor.
As Jensen Huang, founder and CEO of NVIDIA, said in a press release, “AI will be mainstream in every application for every industry. With Project DIGITS, the Grace Blackwell Superchip comes to millions of developers. Placing an AI supercomputer on the desks of every data scientist, AI researcher and student empowers them to engage and shape the age of AI.”
Project DIGITS is also a precursor for how personal computing could fuel the uptake of AI into consumers’ everyday lives in a way that VR devices cannot seem to do – perhaps not today, but sooner than we know. — Read More
DeepSeek-V3
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at this https URL. — Read More
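The 671B-total / 37B-active split (roughly 5.5% of parameters per token) comes from MoE routing: each token is dispatched to only a few experts. The sketch below shows generic, untrained top-k gating with toy sizes; it is not DeepSeek's auxiliary-loss-free balancing scheme, and all dimensions are invented:

```python
# Why an MoE activates only a fraction of its parameters per token.
# Generic top-k gating with random, untrained router weights (toy numbers;
# NOT DeepSeek-V3's auxiliary-loss-free load-balancing method).
import math, random

random.seed(0)
E, K = 8, 2                               # total experts, experts used per token

def route(token):
    """Score each expert, keep the top-K, softmax-normalize their gates."""
    scores = [sum(random.gauss(0, 1) * x for x in token) for _ in range(E)]
    top = sorted(range(E), key=lambda e: scores[e], reverse=True)[:K]
    z = [math.exp(scores[e]) for e in top]
    return [(e, g / sum(z)) for e, g in zip(top, z)]

chosen = route([0.1, -0.4, 0.7, 0.2])
print(chosen)                             # K (expert, gate) pairs; gates sum to 1
```

Only the K chosen experts run a forward pass for this token, so compute scales with active parameters (37B) while capacity scales with total parameters (671B).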