Researchers at Google have developed a new deep-learning model called BigBird that allows Transformer neural networks to process sequences up to 8x longer than previously possible. Networks based on this model achieved new state-of-the-art performance levels on natural-language processing (NLP) and genomics tasks.
The team described the model and a set of experiments in a paper published on arXiv. BigBird is a new self-attention model that replaces the Transformer's full self-attention with a sparse attention pattern, reducing the complexity from quadratic to linear in sequence length and allowing training and inference on longer input sequences. By increasing sequence length up to 8x, the team was able to achieve new state-of-the-art performance on several NLP tasks, including question-answering and document summarization. The team also used BigBird to develop a new application for Transformer models in genomic sequence representations, improving accuracy over previous models by 5 percentage points.
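To make the sparsity idea concrete, the sketch below builds a BigBird-style attention mask in which each token attends to a few global tokens, a local sliding window, and a handful of random positions. This is an illustrative reconstruction, not the authors' implementation; the parameter names (num_global, window, num_random) and default values are assumptions chosen for readability. Counting the allowed pairs shows they grow roughly linearly with sequence length, rather than quadratically as in full attention.

```python
import numpy as np

def bigbird_attention_mask(seq_len, num_global=2, window=3, num_random=3, seed=0):
    """Return a boolean (seq_len x seq_len) mask; True means 'may attend'."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # Global tokens attend to everything and are attended to by every token.
    mask[:num_global, :] = True
    mask[:, :num_global] = True

    for i in range(seq_len):
        # Sliding-window (local) attention around position i.
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True
        # A few random long-range connections per token.
        mask[i, rng.choice(seq_len, size=num_random, replace=False)] = True

    return mask

if __name__ == "__main__":
    for n in (256, 512, 1024):
        m = bigbird_attention_mask(n)
        print(f"seq_len={n:5d}  attended pairs={m.sum():8d}  full attention={n * n}")
```

Because each row of the mask holds only a constant number of True entries, the memory and compute needed for attention scale with the sequence length itself, which is what lets BigBird handle much longer inputs than a standard Transformer.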