Environment info
- transformers version: 4.13.0.dev0
- Platform: Ubuntu 18.04
- Python version: 3.6.9
- PyTorch version (GPU?): 1.11.0.dev20211110+cu111
- Tensorflow version (GPU?): 2.6.2
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help
Information
Model I am using (Bert, XLNet ...): microsoft/deberta-v3-large
The problem arises when using:
- [x] the official example scripts: https://huggingface.co/microsoft/deberta-v3-large#fine-tuning-with-hf-transformers
- [ ] my own modified scripts
The task I am working on is:
- [x] an official GLUE/SQuAD task: mnli
- [ ] my own task or dataset
To reproduce
Steps to reproduce the behavior:
- Go to transformers/examples/pytorch/text-classification/
- Run:
python3 run_glue.py --model_name_or_path microsoft/deberta-v3-large --task_name mnli --do_train --do_eval --evaluation_strategy steps --max_seq_length 256 --warmup_steps 50 --learning_rate 6e-5 --num_train_epochs 3 --output_dir outputv3 --overwrite_output_dir --logging_steps 10000 --logging_dir outputv3/
or run the script given in the model card - https://huggingface.co/microsoft/deberta-v3-large#fine-tuning-with-hf-transformers
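
If it helps triage, a much smaller script should hit the same XSoftmax.backward path without going through run_glue.py. This is only a sketch, assuming network access to the Hub; the sentence pair and num_labels=3 (MNLI's label count) are arbitrary choices of mine:

```python
# Minimal sketch that should trigger the same backward pass as run_glue.py.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", num_labels=3  # 3 labels, matching MNLI
)

inputs = tokenizer("A premise sentence.", "A hypothesis sentence.", return_tensors="pt")
loss = model(**inputs, labels=torch.tensor([0])).loss
loss.backward()  # fails with the TypeError shown below on the torch 1.11 nightly
```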
Expected behavior
Training of microsoft/deberta-v3-large on the MNLI dataset.
The error I am getting:
Traceback (most recent call last):
  File "run_glue.py", line 568, in <module>
    main()
  File "run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1316, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1867, in training_step
    loss.backward()
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/_tensor.py", line 352, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/function.py", line 199, in apply
    return user_fn(self, *args)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py", line 114, in backward
    inputGrad = _softmax_backward_data(grad_output, output, self.dim, output)
TypeError: _softmax_backward_data(): argument 'input_dtype' (position 4) must be torch.dtype, not Tensor
I am also getting the same error when trying to train DeBERTa-v2.
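
The error message itself points at the likely cause: the torch 1.11 nightlies appear to have changed torch._softmax_backward_data so that its fourth argument is the input dtype rather than the input tensor, while modeling_deberta_v2.py still passes the output tensor. Below is a minimal sketch of a version-guarded wrapper; the helper name softmax_backward_data and the 1.11 cutoff are my assumptions, not a confirmed patch:

```python
# Sketch of a version-guarded workaround, assuming the torch 1.11 nightlies
# changed _softmax_backward_data to take the input dtype (not the input
# tensor) as its fourth argument, as the TypeError above suggests.
from packaging import version

import torch
from torch import _softmax_backward_data


def softmax_backward_data(grad_output, output, dim):
    # Hypothetical helper; compare release tuples so that dev builds
    # such as 1.11.0.dev20211110 count as >= 1.11.
    if version.parse(torch.__version__).release >= (1, 11):
        # New signature: (grad_output, output, dim, input_dtype)
        return _softmax_backward_data(grad_output, output, dim, output.dtype)
    # Old signature: (grad_output, output, dim, input)
    return _softmax_backward_data(grad_output, output, dim, output)
```

With a helper like this, the failing call on line 114 of modeling_deberta_v2.py would become inputGrad = softmax_backward_data(grad_output, output, self.dim) instead of calling _softmax_backward_data directly, keeping compatibility with older torch releases.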