Description
System Info
- transformers==4.19.2
- PyTorch (GPU?): 1.11.0+cu102 (True)
- GPUs: single V100
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# I have tested opt-1.3b, opt-2.7b, opt-6.7b, and opt-13b; the error happens for all of them.
# opt-125m and opt-350m seem to work fine.
# I haven't tested opt-30b.
model_name = "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
tokenizer.padding_side = "left"
# It works when torch_dtype=torch.float32
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model = model.eval().to("cuda")
batch = tokenizer(
["Who are you?", "Joe Biden is the president of"],
padding=True, return_tensors="pt"
)
# It produces NaN in the early layers for the first sequence.
# I checked the pattern: NaN first appears at the padded token positions.
model.generate(
input_ids=batch["input_ids"].to("cuda"),
attention_mask=batch["attention_mask"].to("cuda"),
do_sample=True, max_new_tokens=32
)
Expected behavior
The fp16 generation should be close to the fp32 generation.
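As a side note, here is a minimal NumPy sketch (not taken from transformers, just an illustration) of one plausible way fp16 attention masking can produce NaN at padded positions: fp16's minimum finite value is only -65504, so adding a "large negative" mask value to scores that are already very negative overflows to -inf, and a softmax over an all -inf row becomes 0/0 = NaN.

```python
import numpy as np

# fp16's smallest finite value is -65504; masks built from finfo(dtype).min
# leave no headroom before overflow.
fmin = np.finfo(np.float16).min                # -65504.0

with np.errstate(over="ignore", invalid="ignore"):
    scores = np.full(4, fmin, dtype=np.float16)   # row already at the fp16 minimum
    masked = scores + np.float16(fmin)            # overflows: every entry becomes -inf

    # Softmax over an all -inf row: exp() underflows to 0, so sum is 0 and
    # the division yields 0/0 = NaN for every position.
    e = np.exp(masked.astype(np.float32))
    probs = e / e.sum()

print(np.isinf(masked).all())   # True: the masked row overflowed to -inf
print(np.isnan(probs).all())    # True: the softmax of that row is NaN
```

In fp32 the same mask value (about -3.4e38) leaves far more headroom, which would be consistent with the repro working under torch_dtype=torch.float32.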