# 🌟 New model addition -- LongT5: Efficient Text-To-Text Transformer for Long Sequences

## Model description

LongT5 is an extension of the [T5 model](https://github.com/google-research/text-to-text-transfer-transformer) that handles long sequence inputs more efficiently. We integrated attention ideas from long-input transformers ([ETC](https://arxiv.org/abs/2004.08483)) and adopted pre-training strategies from summarization pre-training ([PEGASUS](https://arxiv.org/abs/1912.08777)) into the scalable T5 architecture. The result is a new attention mechanism we call Transient Global (TGlobal), which mimics ETC's local/global attention mechanism, but without requiring additional side-inputs. We are able to achieve state-of-the-art results on several summarization and question answering tasks, as well as outperform the original T5 models on these tasks.

*Description copied from https://github.com/google-research/longt5/blob/master/README.md.*

The full paper is currently available on arXiv -- [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916).

## Open source status

The model has its own repository available [here](https://github.com/google-research/longt5).

* [x] the model implementation is available: the implementation is available in the [Google FlaxFormer repo](https://github.com/google/flaxformer/tree/main/flaxformer/architectures/longt5).
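As a rough illustration of the TGlobal idea described above (each token attends to its local neighborhood plus "transient" global tokens pooled on the fly from fixed-size blocks, with no extra side-inputs), here is a toy single-head NumPy sketch. The function and parameter names (`tglobal_attention`, `block_size`, `local_radius`) are my own, not from the paper or the FlaxFormer code, and this deliberately omits heads, projections, and relative position biases:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tglobal_attention(x, block_size=4, local_radius=2):
    """Toy single-head sketch of Transient Global attention.

    Each token attends to a local window plus "transient" global
    tokens formed by mean-pooling each fixed-size block of the input.
    The globals are recomputed from the input itself, so no side-inputs
    are required (the key difference from ETC highlighted above).
    """
    n, d = x.shape
    # Transient global tokens: mean embedding of each block.
    n_blocks = int(np.ceil(n / block_size))
    globals_ = np.stack([x[i * block_size:(i + 1) * block_size].mean(axis=0)
                         for i in range(n_blocks)])
    out = np.zeros_like(x)
    for i in range(n):
        lo, hi = max(0, i - local_radius), min(n, i + local_radius + 1)
        keys = np.concatenate([x[lo:hi], globals_])  # local window + globals
        scores = keys @ x[i] / np.sqrt(d)            # scaled dot-product
        out[i] = softmax(scores) @ keys              # values == keys here
    return out
```

The point of the sketch is the cost profile: each token scores only `2 * local_radius + 1` local keys plus `ceil(n / block_size)` global keys, instead of all `n` keys as in full attention.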
* [x] the model weights are available: currently, Google has released five checkpoints listed in the [LongT5 repo](https://github.com/google-research/longt5):
  - **LongT5-Local-Base** (250 million parameters)
  - **LongT5-TGlobal-Base** (250 million parameters)
  - **LongT5-Local-Large** (780 million parameters)
  - **LongT5-TGlobal-Large** (780 million parameters)
  - **LongT5-TGlobal-XL** (3 billion parameters)
* [x] who are the authors: @mandyguo-xyguo, Joshua Ainslie, @duthus, @santiontanon, @nijianmo, @yhsung, @yinfeiy (not sure about some of the GitHub names, so I'll be happy if anyone can complete the list :] )

### Additional context

If no one from the original authors is interested in porting the model into `transformers`, I'll be more than happy to work on it :].