Scaling sequence length has become a critical demand in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, leaving the maximum sequence length restricted. In this work, we introduce LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences. Specifically, we propose dilated attention, which expands the attentive field exponentially as the distance grows. LongNet has significant advantages: 1) it has linear computational complexity and a logarithmic dependency between tokens; 2) it can serve as a distributed trainer for extremely long sequences; 3) its dilated attention is a drop-in replacement for standard attention and can be seamlessly integrated with existing Transformer-based optimizations. Experimental results demonstrate that LongNet yields strong performance on both long-sequence modeling and general language tasks. Our work opens up new possibilities for modeling very long sequences, e.g., treating a whole corpus or even the entire Internet as a sequence. — Read More
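To make the "dilated attention" idea concrete, here is a minimal NumPy sketch, not the paper's implementation: the sequence is split into segments, every `dilation`-th token within each segment is kept, and standard scaled dot-product attention runs only over those sparsified tokens. All names (`dilated_attention`, `segment_len`, `dilation`) and the single-head, single-rate setup are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dilated_attention(q, k, v, segment_len, dilation):
    """Sketch of one dilation rate: attention restricted to every
    `dilation`-th token inside each length-`segment_len` segment.
    Tokens not selected at this rate are left as zeros here."""
    n, d = q.shape
    out = np.zeros_like(v)
    for start in range(0, n, segment_len):
        # Indices of the sparsified tokens in this segment
        idx = np.arange(start, min(start + segment_len, n))[::dilation]
        qs, ks, vs = q[idx], k[idx], v[idx]
        scores = qs @ ks.T / np.sqrt(d)          # (s, s) within-segment scores
        out[idx] = softmax(scores) @ vs          # attend only over kept tokens
    return out

# Toy usage: 16 tokens, model dimension 8, self-attention (q = k = v)
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
out = dilated_attention(x, x, x, segment_len=8, dilation=2)
```

Cost per segment shrinks quadratically with the dilation rate, which is where the linear overall complexity comes from; the full method combines several (segment length, dilation) pairs so that nearby tokens get dense attention while distant tokens are reached through exponentially coarser rates, and this sketch shows only a single rate.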
Daily Archives: July 7, 2023
OpenAI is forming a new team to bring ‘superintelligent’ AI under control
OpenAI is forming a new team led by Ilya Sutskever, its chief scientist and one of the company’s co-founders, to develop ways to steer and control “superintelligent” AI systems.
In a blog post published today, Sutskever and Jan Leike, a lead on the alignment team at OpenAI, predict that AI with intelligence exceeding that of humans could arrive within the decade. This AI — assuming it does, indeed, arrive eventually — won’t necessarily be benevolent, necessitating research into ways to control and restrict it, Sutskever and Leike say. — Read More
OpenAI launches its GPT-4 API into general availability
OpenAI LP today made GPT-4, its newest and most capable language model, generally available through a cloud-based application programming interface.
… Alongside GPT-4, OpenAI is making three other AI models’ APIs generally available: GPT-3.5 Turbo, a predecessor to GPT-4 that offers more limited capabilities for a significantly lower cost, DALL-E for image generation, and Whisper for speech transcription. — Read More