Description
System Info
- transformers==4.19.2
- PyTorch (GPU?): 1.11.0+cu102 (True)
- GPUs: single V100
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# I have tested opt-1.3b, opt-2.7b, opt-6.7b, and opt-13b; the error happens for all of them.
# opt-125m and opt-350m seem to work fine.
# I haven't tested opt-30b.
model_name = "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
tokenizer.padding_side = "left"
# It works when torch_dtype=torch.float32
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model = model.eval().to("cuda")
batch = tokenizer(
["Who are you?", "Joe Biden is the president of"],
padding=True, return_tensors="pt"
)
# It produces NaN in the early layers for the first sequence.
# I checked the pattern: NaN first appears at the padded token positions.
model.generate(
input_ids=batch["input_ids"].to("cuda"),
attention_mask=batch["attention_mask"].to("cuda"),
do_sample=True, max_new_tokens=32
)
Expected behavior
The fp16 generation should be close to the fp32 generation.
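As a side note, here is a minimal NumPy sketch (not taken from transformers, just an illustration) of one plausible way fp16 attention masking can produce NaN at padded positions: fp16's minimum finite value is only -65504, so adding a "large negative" mask value to scores that are already very negative overflows to -inf, and a softmax over an all -inf row becomes 0/0 = NaN.

```python
import numpy as np

# fp16's smallest finite value is -65504; masks built from finfo(dtype).min
# leave no headroom before overflow.
fmin = np.finfo(np.float16).min                # -65504.0

with np.errstate(over="ignore", invalid="ignore"):
    scores = np.full(4, fmin, dtype=np.float16)   # row already at the fp16 minimum
    masked = scores + np.float16(fmin)            # overflows: every entry becomes -inf

    # Softmax over an all -inf row: exp() underflows to 0, so sum is 0 and
    # the division yields 0/0 = NaN for every position.
    e = np.exp(masked.astype(np.float32))
    probs = e / e.sum()

print(np.isinf(masked).all())   # True: the masked row overflowed to -inf
print(np.isnan(probs).all())    # True: the softmax of that row is NaN
```

In fp32 the same mask value (about -3.4e38) leaves far more headroom, which would be consistent with the repro working under torch_dtype=torch.float32.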