🌟 New model addition
Model description
DeBERTa (Decoding-enhanced BERT with disentangled attention) is a new model architecture:
> In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency of model pre-training and performance of downstream tasks.
The paper can be found here.
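To make the disentangled attention idea above concrete, here is a minimal single-head NumPy sketch of the three-term score (content-to-content, content-to-position, and position-to-content), not the authors' implementation: the function and projection names, the dense `(n, n, d)` relative-position tensor, and the `sqrt(3d)` scaling are illustrative assumptions based on the paper's description.

```python
import numpy as np

def softmax(x):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def disentangled_attention(H, P, Wq, Wk, Wqr, Wkr):
    """Sketch of DeBERTa-style disentangled attention (illustrative only).

    H: (n, d) content vectors, one per token.
    P: (n, n, d) relative-position embeddings; P[i, j] embeds the
       relative distance from token i to token j.
    W*: (d, d) projection matrices (names are assumptions).
    """
    n, d = H.shape
    Qc, Kc = H @ Wq, H @ Wk            # content queries / keys
    Qr, Kr = P @ Wqr, P @ Wkr          # position queries / keys, (n, n, d)

    c2c = Qc @ Kc.T                                # content-to-content
    c2p = np.einsum('id,ijd->ij', Qc, Kr)          # content-to-position
    p2c = np.einsum('jid,jd->ij', Qr, Kc)          # position-to-content

    # The paper scales by sqrt(3d) since three score terms are summed.
    return softmax((c2c + c2p + p2c) / np.sqrt(3 * d))

rng = np.random.default_rng(0)
n, d = 4, 8
H = rng.standard_normal((n, d))
P = rng.standard_normal((n, n, d))
Wq, Wk, Wqr, Wkr = (rng.standard_normal((d, d)) for _ in range(4))
A = disentangled_attention(H, P, Wq, Wk, Wqr, Wkr)
```

The point of the split is that content and position contribute separate, learnable interaction terms rather than being summed into one input embedding as in BERT.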
Open source status
- the model implementation is available: GitHub
- the model weights are available: GitHub release
- who are the authors: @BigBird01