Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers’ computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.  – Read More
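The selection mechanism the abstract describes, letting the SSM parameters depend on the input, can be sketched as a plain recurrent scan. This is a toy illustration with hypothetical parameter names, not the paper's hardware-aware parallel algorithm:

```python
import numpy as np

def selective_ssm(x, W_delta, W_B, W_C, A):
    """Toy selective state-space scan (illustrative only).

    x: (L, d) input sequence; A: (d, n) state matrix (negative entries).
    Unlike a classical SSM with fixed parameters, the step size delta
    and the projections B, C are functions of the current token, so the
    model can choose per token how much to remember or forget.
    """
    L, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))                           # hidden state
    ys = []
    for t in range(L):
        xt = x[t]                                  # (d,)
        delta = np.log1p(np.exp(xt @ W_delta))     # softplus step size, (d,)
        B = xt @ W_B                               # input projection, (n,)
        C = xt @ W_C                               # output projection, (n,)
        # Discretize: input-dependent decay and input gate.
        A_bar = np.exp(delta[:, None] * A)         # (d, n), entries in (0, 1)
        h = A_bar * h + delta[:, None] * np.outer(xt, B)
        ys.append(h @ C)                           # (d,)
    return np.stack(ys)                            # (L, d)
```

Because delta, B, and C change with each token, the scan above cannot be rewritten as a fixed convolution, which is why the paper resorts to a custom recurrent-mode kernel.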

#nlp

You don’t need hosted LLMs, do you?

A comparison of self-hosted LLMs and OpenAI: cost, text generation quality, development speed, and privacy.

During the LLM hype, you can find a lot of articles like “Fine-tune your Private LLaMA/Falcon/Another Popular LLM”, “Train Your Own Private ChatGPT”, “How to Create a Local LLM” and others.

At the same time, only a few people explain why you would need it. I mean, are you really sure you need your own self-hosted LLM? Maybe the OpenAI API could be the best choice for you.  – Read More

#strategy, #nlp

Answer AI: A new old kind of R&D lab

Answer.AI is a new kind of AI R&D lab which creates practical end-user products based on foundational research breakthroughs.

Jeremy Howard (founding CEO, previously co-founder of Kaggle and fast.ai) and Eric Ries (founding director, previously creator of Lean Startup and the Long-Term Stock Exchange) today launched Answer.AI, a new kind of AI R&D lab which creates practical end-user products based on foundational research breakthroughs. The creation of Answer.AI is supported by an investment of USD 10m from Decibel VC. Answer.AI will be a fully remote team of deep-tech generalists—the world’s very best, regardless of where they live, what school they went to, or any other meaningless surface feature.  – Read More

#strategy, #devops

A Robot the Size of the World

…The classical definition of a robot is something that senses, thinks, and acts—that’s today’s Internet. We’ve been building a world-sized robot without even realizing it.

In 2023, we upgraded the “thinking” part with large language models (LLMs) like GPT. ChatGPT both surprised and amazed the world with its ability to understand human language and generate credible, on-topic, humanlike responses. But what these are really good at is interacting with systems formerly designed for humans. Their accuracy will get better, and they will be used to replace actual humans.

In 2024, we’re going to start connecting those LLMs and other AI systems to both sensors and actuators. In other words, they will be connected to the larger world, through APIs. They will receive direct inputs from our environment, in all the forms I thought about in 2016. And they will increasingly control our environment, through IoT devices and beyond.  – Read More

#singularity, #robotics, #human

Mixtral-8x7B

The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested.

For full details of this model please read our release blog post.  – Read More
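A sparse Mixture of Experts routes each token through only a small subset of expert networks (2 of 8 in Mixtral), so the active parameter count per token stays far below the total. A minimal sketch of top-k routing, with hypothetical names and not Mixtral's actual code:

```python
import numpy as np

def moe_layer(x, W_gate, experts, k=2):
    """Toy sparse Mixture-of-Experts forward pass for one token.

    A router scores every expert, but only the top-k actually run;
    their outputs are blended by softmax weights over the chosen ones.
    """
    logits = x @ W_gate                      # (num_experts,) router scores
    top = np.argsort(logits)[-k:]            # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over selected experts
    # Only the selected experts are evaluated; the rest are skipped.
    return sum(w * experts[i](x) for w, i in zip(weights, top))
```

With 8 experts and k=2, roughly a quarter of the expert parameters are exercised per token, which is how such a model can match much larger dense models at lower inference cost.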

#devops

Building end-to-end security for Messenger

Today, we’re announcing that we’ve begun to upgrade people’s personal conversations on Messenger to use E2EE by default. Our aim is to ensure that everyone’s personal messages on Messenger can only be accessed by the sender and the intended recipients, and that everyone can be sure the messages they receive are from an authentic sender.

Meta is publishing two technical white papers on end-to-end encryption.  – Read More
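The guarantee described above, that only the sender and intended recipients can read a message, rests on the endpoints agreeing on a key that the relaying server cannot reproduce. A toy Diffie-Hellman exchange illustrates the idea; the parameters are demo-sized and insecure, and this is not Meta's implementation (real deployments use vetted protocols such as the Signal protocol):

```python
import hashlib
import secrets

# Toy Diffie-Hellman key agreement: both endpoints derive the same
# secret, while a server that only sees the public values cannot.
P = 2**127 - 1          # a prime modulus (far too small for real use)
G = 3                   # generator for the demo group

a = secrets.randbelow(P - 2) + 2    # Alice's private exponent
b = secrets.randbelow(P - 2) + 2    # Bob's private exponent
A_pub = pow(G, a, P)                # these two values travel in the clear
B_pub = pow(G, b, P)

# Each side combines its own secret with the other's public value;
# both arrive at G^(a*b) mod P without ever transmitting it.
alice_key = hashlib.sha256(pow(B_pub, a, P).to_bytes(16, "big")).digest()
bob_key = hashlib.sha256(pow(A_pub, b, P).to_bytes(16, "big")).digest()
assert alice_key == bob_key         # shared key, never sent over the wire
```

The derived key then encrypts message content symmetrically, so the server relays only ciphertext.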

#big7, #privacy

AI-generated news anchors show off superhuman abilities

There’s a new global news network launching in 2024 that completely ditches humans for AI-generated newsreaders – and they’re showing off some superhuman capabilities that make it very clear: the days of the human news presenter are numbered.

Channel 1’s photorealistic news anchors come in all shapes and sizes. They can all speak more or less any language, while evoking the stiff, formal body language familiar to anyone who still watches news on TV. They’re even capable of making news-anchor-grade attempts at humor.  – Read More

#news-summarization

ChatGPT users complain the AI is getting lazy and sassy

OpenAI says it is investigating complaints about ChatGPT having become “lazy”.

In recent days, more and more users of the latest version of ChatGPT – built on OpenAI’s GPT-4 model – have complained that the chatbot refuses to do as people ask, or that it does not seem interested in answering their queries.

If the person asks for a piece of code, for instance, it might just give a little information and then instruct users to fill in the rest. Some complained that it did so in a particularly sassy way, telling people that they are perfectly able to do the work themselves, for instance.  – Read More

#chatbots

Model alignment protects against accidental harms, not intentional ones

Preventing harms from AI is important. The AI safety community calls this the alignment problem. The vast majority of development effort to date has been on technical methods that modify models themselves. We’ll call this model alignment, as opposed to sociotechnical ways to mitigate harm.

The main model alignment technique today is Reinforcement Learning from Human Feedback (RLHF), which has proven essential to the commercial success of chatbots. But RLHF has come to be seen as a catch-all solution to the dizzying variety of harms from language models. Consequently, there is much hand-wringing about the fact that adversaries can bypass it. Alignment techniques aren’t keeping up with progress in AI capabilities, the argument goes, so we should take drastic steps, such as “pausing” AI, to avoid catastrophe.

In this essay, we analyze why RLHF has been so useful. In short, its strength is in preventing accidental harms to everyday users. Then, we turn to its weaknesses. We argue that (1) despite its limitations, RLHF continues to be effective in protecting against casual adversaries, and (2) the fact that skilled and well-resourced adversaries can defeat it is irrelevant, because model alignment is not a viable strategy against such adversaries in the first place. To defend against catastrophic risks, we must look elsewhere.  – Read More

#adversarial, #trust

Microsoft releases Phi-2, a small language model AI that outperforms Llama 2, Mistral 7B

The rapid pace of generative AI news and announcements isn’t slowing down, even as we reach the final stretches of 2023 and the traditional winter holiday quiet period.

Just take a look at Microsoft Research, the blue-sky division of the software giant, which today announced the release of its Phi-2 small language model (SLM), a text-to-text AI program that is “small enough to run on a laptop or mobile device,” according to a post on X.  – Read More

#big7