
Data type error while fine-tuning Deberta v3 Large using code provided #14375

@NIKHILDUGAR

Description


Environment info

  • transformers version: 4.13.0.dev0
  • Platform: Ubuntu 18.04
  • Python version: Python 3.6.9
  • PyTorch version (GPU?): 1.11.0.dev20211110+cu111
  • Tensorflow version (GPU?): 2.6.2
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help

@LysandreJik

Information

Model I am using (Bert, XLNet ...): microsoft/deberta-v3-large

The problem arises when using:

  • the official example scripts

The task I am working on is:

  • an official GLUE/SQuAD task: mnli

To reproduce

Steps to reproduce the behavior:

  1. Go to transformers/examples/pytorch/text-classification/
  2. Run:

     ```
     python3 run_glue.py --model_name_or_path microsoft/deberta-v3-large --task_name mnli --do_train --do_eval --evaluation_strategy steps --max_seq_length 256 --warmup_steps 50 --learning_rate 6e-5 --num_train_epochs 3 --output_dir outputv3 --overwrite_output_dir --logging_steps 10000 --logging_dir outputv3/
     ```

     or run the script given in the model card - https://huggingface.co/microsoft/deberta-v3-large#fine-tuning-with-hf-transformers

Expected behavior

Training of microsoft/deberta-v3-large on the mnli dataset.

The error I am getting:

```
Traceback (most recent call last):
  File "run_glue.py", line 568, in <module>
    main()
  File "run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1316, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1867, in training_step
    loss.backward()
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/_tensor.py", line 352, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/function.py", line 199, in apply
    return user_fn(self, *args)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py", line 114, in backward
    inputGrad = _softmax_backward_data(grad_output, output, self.dim, output)
TypeError: _softmax_backward_data(): argument 'input_dtype' (position 4) must be torch.dtype, not Tensor
```

I am also getting the same error when trying to train DeBERTa-v2.
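From the `TypeError` above, the fourth argument of `torch._softmax_backward_data` is now named `input_dtype` and expects a `torch.dtype`, while the transformers code still passes a tensor; this suggests the signature changed in the PyTorch 1.11 nightlies. Below is a minimal sketch (not the actual fix merged in transformers, and the function and helper names are hypothetical) of how a version-dependent wrapper could dispatch between the old and new signatures:

```python
# Sketch of a version-dispatching wrapper for torch._softmax_backward_data.
# Assumption: the fourth argument changed from a Tensor to its dtype somewhere
# in the 1.11 development line (the failing build is 1.11.0.dev20211110+cu111).
from packaging import version


def softmax_backward_needs_dtype(torch_version: str) -> bool:
    """Return True if this PyTorch version expects a dtype (not a tensor)
    as the fourth argument of _softmax_backward_data."""
    # Compare against the earliest 1.11 pre-release so that dev/nightly
    # builds such as "1.11.0.dev20211110+cu111" also take the new path.
    return version.parse(torch_version) >= version.parse("1.11.0.dev0")


def softmax_backward(grad_output, output, dim, self_tensor, torch_version):
    """Call _softmax_backward_data with whichever fourth argument the
    installed PyTorch expects; `self_tensor` is the tensor that the old
    DeBERTa code passed directly."""
    from torch import _softmax_backward_data  # private API; see assumption above

    if softmax_backward_needs_dtype(torch_version):
        return _softmax_backward_data(grad_output, output, dim, self_tensor.dtype)
    return _softmax_backward_data(grad_output, output, dim, self_tensor)
```

In `modeling_deberta_v2.py`, the failing call `_softmax_backward_data(grad_output, output, self.dim, output)` would then become a call to this wrapper, keeping older PyTorch releases working while accommodating the nightly signature.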
