Environment info
- transformers version: 4.13.0.dev0
- Platform: Ubuntu 18.04
- Python version: 3.6.9
- PyTorch version (GPU?): 1.11.0.dev20211110+cu111
- Tensorflow version (GPU?): 2.6.2
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help
Information
Model I am using (Bert, XLNet ...): microsoft/deberta-v3-large
The problem arises when using:
- [x] the official example scripts: https://huggingface.co/microsoft/deberta-v3-large#fine-tuning-with-hf-transformers
- [ ] my own modified scripts
The task I am working on is:
- [x] an official GLUE/SQuAD task: mnli
- [ ] my own task or dataset
To reproduce
Steps to reproduce the behavior:
- Go to transformers/examples/pytorch/text-classification/
- Run:
python3 run_glue.py --model_name_or_path microsoft/deberta-v3-large --task_name mnli --do_train --do_eval --evaluation_strategy steps --max_seq_length 256 --warmup_steps 50 --learning_rate 6e-5 --num_train_epochs 3 --output_dir outputv3 --overwrite_output_dir --logging_steps 10000 --logging_dir outputv3/
or run the script given in the model card - https://huggingface.co/microsoft/deberta-v3-large#fine-tuning-with-hf-transformers
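
If it helps triage, a much smaller script should hit the same XSoftmax.backward path without going through run_glue.py. This is only a sketch, assuming network access to the Hub; the sentence pair and num_labels=3 (MNLI's label count) are arbitrary choices of mine:

```python
# Minimal sketch that should trigger the same backward pass as run_glue.py.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", num_labels=3  # 3 labels, matching MNLI
)

inputs = tokenizer("A premise sentence.", "A hypothesis sentence.", return_tensors="pt")
loss = model(**inputs, labels=torch.tensor([0])).loss
loss.backward()  # fails with the TypeError shown below on the torch 1.11 nightly
```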
Expected behavior
Training of microsoft/deberta-v3-large on the MNLI dataset.
The error I am getting:
Traceback (most recent call last):
  File "run_glue.py", line 568, in <module>
    main()
  File "run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1316, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1867, in training_step
    loss.backward()
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/_tensor.py", line 352, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/function.py", line 199, in apply
    return user_fn(self, *args)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py", line 114, in backward
    inputGrad = _softmax_backward_data(grad_output, output, self.dim, output)
TypeError: _softmax_backward_data(): argument 'input_dtype' (position 4) must be torch.dtype, not Tensor
I am also getting the same error when trying to train DeBERTa-v2.
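
The error message itself points at the likely cause: the torch 1.11 nightlies appear to have changed torch._softmax_backward_data so that its fourth argument is the input dtype rather than the input tensor, while modeling_deberta_v2.py still passes the output tensor. Below is a minimal sketch of a version-guarded wrapper; the helper name softmax_backward_data and the 1.11 cutoff are my assumptions, not a confirmed patch:

```python
# Sketch of a version-guarded workaround, assuming the torch 1.11 nightlies
# changed _softmax_backward_data to take the input dtype (not the input
# tensor) as its fourth argument, as the TypeError above suggests.
from packaging import version

import torch
from torch import _softmax_backward_data


def softmax_backward_data(grad_output, output, dim):
    # Hypothetical helper; compare release tuples so that dev builds
    # such as 1.11.0.dev20211110 count as >= 1.11.
    if version.parse(torch.__version__).release >= (1, 11):
        # New signature: (grad_output, output, dim, input_dtype)
        return _softmax_backward_data(grad_output, output, dim, output.dtype)
    # Old signature: (grad_output, output, dim, input)
    return _softmax_backward_data(grad_output, output, dim, output)
```

With a helper like this, the failing call on line 114 of modeling_deberta_v2.py would become inputGrad = softmax_backward_data(grad_output, output, self.dim) instead of calling _softmax_backward_data directly, keeping compatibility with older torch releases.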