# 🌟 New model addition -- LongT5: Efficient Text-To-Text Transformer for Long Sequences

## Model description

LongT5 is an extension of the [T5 model](https://github.com/google-research/text-to-text-transfer-transformer) that handles long sequence inputs more efficiently. We integrated attention ideas from long-input transformers ([ETC](https://arxiv.org/abs/2004.08483)) and adopted pre-training strategies from summarization pre-training ([PEGASUS](https://arxiv.org/abs/1912.08777)) into the scalable T5 architecture. The result is a new attention mechanism we call Transient Global (TGlobal), which mimics ETC's local/global attention mechanism, but without requiring additional side-inputs. We are able to achieve state-of-the-art results on several summarization and question answering tasks, as well as outperform the original T5 models on these tasks.

*Description copied from https://github.com/google-research/longt5/blob/master/README.md.*

The full paper is currently available on arXiv -- [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916).

## Open source status

The model has its own repository available [here](https://github.com/google-research/longt5).

* [x] the model implementation is available: the implementation is available in the [Google FlaxFormer repo](https://github.com/google/flaxformer/tree/main/flaxformer/architectures/longt5).
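As a rough illustration of the TGlobal idea described above (each token attends to its local neighborhood plus "transient" global tokens pooled on the fly from fixed-size blocks, with no extra side-inputs), here is a toy single-head NumPy sketch. The function and parameter names (`tglobal_attention`, `block_size`, `local_radius`) are my own, not from the paper or the FlaxFormer code, and this deliberately omits heads, projections, and relative position biases:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tglobal_attention(x, block_size=4, local_radius=2):
    """Toy single-head sketch of Transient Global attention.

    Each token attends to a local window plus "transient" global
    tokens formed by mean-pooling each fixed-size block of the input.
    The globals are recomputed from the input itself, so no side-inputs
    are required (the key difference from ETC highlighted above).
    """
    n, d = x.shape
    # Transient global tokens: mean embedding of each block.
    n_blocks = int(np.ceil(n / block_size))
    globals_ = np.stack([x[i * block_size:(i + 1) * block_size].mean(axis=0)
                         for i in range(n_blocks)])
    out = np.zeros_like(x)
    for i in range(n):
        lo, hi = max(0, i - local_radius), min(n, i + local_radius + 1)
        keys = np.concatenate([x[lo:hi], globals_])  # local window + globals
        scores = keys @ x[i] / np.sqrt(d)            # scaled dot-product
        out[i] = softmax(scores) @ keys              # values == keys here
    return out
```

The point of the sketch is the cost profile: each token scores only `2 * local_radius + 1` local keys plus `ceil(n / block_size)` global keys, instead of all `n` keys as in full attention.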
* [x] the model weights are available: currently, Google has released five checkpoints listed in the [LongT5 repo](https://github.com/google-research/longt5):
  - **LongT5-Local-Base** (250 million parameters)
  - **LongT5-TGlobal-Base** (250 million parameters)
  - **LongT5-Local-Large** (780 million parameters)
  - **LongT5-TGlobal-Large** (780 million parameters)
  - **LongT5-TGlobal-XL** (3 billion parameters)
* [x] who are the authors: @mandyguo-xyguo, Joshua Ainslie, @duthus, @santiontanon, @nijianmo, @yhsung, @yinfeiy (not sure about some of the GitHub names, so I'll be happy if anyone can complete the list :] )

### Additional context

If no one from the original authors is interested in porting the model into `transformers`, I'll be more than happy to work on it :].