OPT produce NaN during batched generation #17433

@shijie-wu

Description
System Info

  • transformers==4.19.2
  • PyTorch (GPU?): 1.11.0+cu102 (True)
  • GPUs: single V100

Who can help?

@LysandreJik, @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# I have tested this, and the error occurs with opt-1.3b, opt-2.7b, opt-6.7b, and opt-13b.
# opt-125m and opt-350m seem to work fine.
# I haven't tested opt-30b.
model_name = "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
tokenizer.padding_side = "left"
# It works when torch_dtype=torch.float32
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model = model.eval().to("cuda")

batch = tokenizer(
    ["Who are you?", "Joe Biden is the president of"],
    padding=True, return_tensors="pt"
)

# It produces NaN in the early layers for the first (shorter, left-padded) sequence.
# I checked the pattern: NaN first appears at the padded token positions.
model.generate(
    input_ids=batch["input_ids"].to("cuda"),
    attention_mask=batch["attention_mask"].to("cuda"),
    do_sample=True, max_new_tokens=32
) 

Expected behavior

Generation under fp16 should produce outputs close to fp32, with no NaNs.
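To pinpoint which submodule first emits NaN, a forward-hook utility can be run over the model. The sketch below is a hypothetical helper (`find_first_nan` is not part of transformers), demonstrated on a toy module that injects NaN; the same helper can be applied to the OPT model from the reproduction above:

```python
import torch
import torch.nn as nn

def find_first_nan(model, *inputs):
    """Return the name of the first submodule whose output contains NaN
    during a forward pass, or None. Hypothetical diagnostic helper."""
    first, hooks = [], []

    def make_hook(name):
        def hook(module, inp, out):
            t = out[0] if isinstance(out, tuple) else out
            if torch.is_tensor(t) and torch.isnan(t).any() and not first:
                first.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module itself
            hooks.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(*inputs)
    for h in hooks:
        h.remove()
    return first[0] if first else None

# Toy demo: the middle layer deliberately produces NaN.
class BadLayer(nn.Module):
    def forward(self, x):
        return x * float("nan")

demo = nn.Sequential(nn.Linear(4, 4), BadLayer(), nn.Linear(4, 4))
print(find_first_nan(demo, torch.randn(2, 4)))  # -> "1" (the BadLayer)
```

Running this over the fp16 OPT model with the padded batch should identify the earliest attention layer whose output goes NaN at the padded positions.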
