Monthly Archives: June 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Efficiently modeling sequences with infinite context length has been a long-standing problem. Past works suffer from either the quadratic computation complexity or the limited extrapolation ability on length generalization. In this work, we present Samba, a simple hybrid architecture that layer-wise combines Mamba, a selective State Space Model (SSM), with Sliding Window Attention (SWA). Samba selectively compresses a given sequence into recurrent hidden states while still maintaining the ability to precisely recall memories with the attention mechanism. We scale Samba up to 3.8B parameters with 3.2T training tokens and show that Samba substantially outperforms the state-of-the-art models based on pure attention or SSMs on a wide range of benchmarks. When trained on 4K length sequences, Samba can be efficiently extrapolated to 256K context length with perfect memory recall and show improved token predictions up to 1M context length. As a linear-time sequence model, Samba enjoys a 3.73x higher throughput compared to Transformers with grouped-query attention when processing user prompts of 128K length, and 3.64x speedup when generating 64K tokens with unlimited streaming. A sample implementation of Samba is publicly available in this https URL. — Read More
ChatGPT is bullshit
Recently, there has been considerable interest in large language models: machine learning systems which produce human-like text and dialogue. Applications of these systems have been plagued by persistent inaccuracies in their output; these are often called “AI hallucinations”. We argue that these falsehoods, and the overall activity of large language models, is better understood as bullshit in the sense explored by Frankfurt (On Bullshit, Princeton, 2005): the models are in an important way indifferent to the truth of their outputs. We distinguish two ways in which the models can be said to be bullshitters, and argue that they clearly meet at least one of these definitions. We further argue that describing AI misrepresentations as bullshit is both a more useful and more accurate way of predicting and discussing the behaviour of these systems. — Read More
Banishing LLM Hallucinations Requires Rethinking Generalization
Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional approaches fail to explain why LLMs hallucinate in practice. Specifically, we show that LLMs augmented with a massive Mixture of Memory Experts (MoME) can easily memorize large datasets of random numbers. We corroborate these experimental findings with a theoretical construction showing that simple neural networks trained to predict the next token hallucinate when the training loss is above a threshold as it usually does in practice when training on internet scale data. We interpret our findings by comparing against traditional retrieval methods for mitigating hallucinations. We use our findings to design a first generation model for removing hallucinations — Lamini-1 — that stores facts in a massive mixture of millions of memory experts that are retrieved dynamically. — Read More
SUTRA: Scalable Multilingual Language Model Architecture
In this paper, we introduce SUTRA, multilingual Large Language Model architecture capable of understanding, reasoning, and generating text in over 50 languages. SUTRA’s design uniquely decouples core conceptual understanding from language-specific processing, which facilitates scalable and efficient multilingual alignment and learning. Employing a Mixture of Experts framework both in language and concept processing, SUTRA demonstrates both computational efficiency and responsiveness. Through extensive evaluations, SUTRA is demonstrated to surpass existing models like GPT-3.5, Llama2 by 20-30% on leading Massive Multitask Language Understanding (MMLU) benchmarks for multilingual tasks. SUTRA models are also online LLMs that can use knowledge from the internet to provide hallucination-free, factual and up-to-date responses while retaining their multilingual capabilities. Furthermore, we explore the broader implications of its architecture for the future of multilingual AI, highlighting its potential to democratize access to AI technology globally and to improve the equity and utility of AI in regions with predominantly non-English languages. Our findings suggest that SUTRA not only fills pivotal gaps in multilingual model capabilities but also establishes a new benchmark for operational efficiency and scalability in AI applications. — Read More
SITUATIONAL AWARENESS: The Decade Ahead
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace many college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war. — Read More
Synthesia’s hyperrealistic deepfakes will soon have full bodies
Startup Synthesia’s AI-generated avatars are getting an update to make them even more realistic: They will soon have bodies that can move, and hands that gesticulate.
The new full-body avatars will be able to do things like sing and brandish a microphone while dancing, or move from behind a desk and walk across a room. They will be able to express more complex emotions than previously possible, like excitement, fear, or nervousness, says Victor Riparbelli, the company’s CEO. Synthesia intends to launch the new avatars toward the end of the year. — Read More
She Built an AI Product Manager Bringing in Six Figures—As A Side Hustle
How Claire Vo created ChatPRD while working a demanding job
Claire Vo built ChatPRD—an on-demand chief product officer powered by AI. It’s now used by over 10,000 product managers and is pulling in six figures in revenue.
The best part?
Claire has a demanding day job as the chief product officer at LaunchDarkly. So she built all of ChatPRD herself—over the weekend—with AI. — Read More
Anthropic just dropped Claude 3.5 Sonnet with better vision and a sense of humor
Claude 3.5 Sonnet is the latest artificial intelligence model from Anthropic, one of the leading AI labs in the world. The company promises it is faster than its predecessor, has a better understanding of humor and can even read your handwriting.
Claude 3 Opus was already impressive. A model I dubbed the “most human-like” of any of the AI chatbots. I had a quick play with 3.5 Sonnet and it does seem more natural and with a better understanding of sarcasm. Claude is also listed as the best alternative to ChatGPT in my guide to chatbots. — Read More
AI Discovers That Not Every Fingerprint Is Unique
Columbia engineers have built a new AI that shatters a long-held belief in forensics–that fingerprints from different fingers of the same person are unique. It turns out they are similar, only we’ve been comparing fingerprints the wrong way!
… It’s a well-accepted fact in the forensics community that fingerprints of different fingers of the same person–”intra-person fingerprints”–are unique, and therefore unmatchable.
A team led by Columbia Engineering undergraduate senior Gabe Guo challenged this widely held presumption. Guo, who had no prior knowledge of forensics, found a public U.S. government database of some 60,000 fingerprints and fed them in pairs into an artificial intelligence-based system known as a deep contrastive network. Sometimes the pairs belonged to the same person (but different fingers), and sometimes they belonged to different people. — Read More