
WavLM returns empty hidden states when loaded directly to GPU #31970

@rumourscape

Description


System Info

  • transformers version: 4.42.4
  • Platform: Linux-6.5.0-41-generic-x86_64-with-glibc2.35
  • Python version: 3.9.19
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.3
  • Accelerate version: 0.31.0
  • PyTorch version (GPU?): 2.3.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA RTX A6000

Who can help?

@sanchit-gandhi @Gant

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The hidden-state outputs are all NaN when the model is loaded directly onto the GPU with `device_map`. They are correct when the model runs on the CPU, or when it is first loaded on the CPU and then moved to the GPU.

The issue can be reproduced with the following code, taken from WavLM's Hugging Face documentation.

```python
from transformers import WavLMModel, AutoFeatureExtractor
import torch
from datasets import load_dataset

dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation", trust_remote_code=True)
dataset = dataset.sort("id")
sampling_rate = dataset.features["audio"].sampling_rate

processor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-large")
# Loading directly onto the GPU via device_map is what produces the NaN outputs
model = WavLMModel.from_pretrained("microsoft/wavlm-large", device_map="cuda:4")
model.eval()

# audio file is decoded on the fly
inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    # inputs are moved to the same device as the model
    outputs = model(**inputs.to("cuda:4"), output_hidden_states=True)

last_hidden_states = outputs.last_hidden_state
print(last_hidden_states)  # prints a tensor containing only NaNs
```

The code above prints a tensor containing only NaNs. This does not happen if the model is first loaded on the CPU and then moved to the GPU with `model.to("cuda:4")`.
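
A minimal sketch of the CPU-then-GPU workaround described above, together with a small NaN check for the outputs. The helper names `nan_fraction` and `load_wavlm_cpu_then_gpu` are hypothetical, and the default device `"cuda:0"` is an assumption (the report uses `"cuda:4"`); this is an illustration of the workaround, not an official fix.

```python
import torch


def nan_fraction(t: torch.Tensor) -> float:
    """Hypothetical helper: fraction of elements in `t` that are NaN (0.0 means clean)."""
    if t.numel() == 0:
        return 0.0
    return torch.isnan(t).float().mean().item()


def load_wavlm_cpu_then_gpu(model_id: str = "microsoft/wavlm-large", device: str = "cuda:0"):
    """Hypothetical helper for the workaround: load on the CPU (no device_map), then move with .to()."""
    # Imported lazily so nan_fraction can be used without transformers installed.
    from transformers import WavLMModel

    model = WavLMModel.from_pretrained(model_id)  # weights are loaded on the CPU
    return model.to(device).eval()  # nn.Module.to() and .eval() both return the module
```

After running the reproduction with a model obtained from `load_wavlm_cpu_then_gpu(...)`, `nan_fraction(outputs.last_hidden_state)` should be `0.0`; according to the report, it is `1.0` when the model is loaded with `device_map` instead.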

Expected behavior

The hidden states should not be NaN when the model is loaded directly onto the GPU.
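
To check the expected behavior, one could run the same inputs through both loading paths and compare the results. This is a sketch under stated assumptions: `hidden_states_agree` and `compare_loading_paths` are hypothetical helpers, `inputs` is assumed to be the feature-extractor output from the reproduction, and the tolerance `atol=1e-4` is an arbitrary choice rather than a documented threshold.

```python
import torch


def hidden_states_agree(reference: torch.Tensor, candidate: torch.Tensor, atol: float = 1e-4) -> bool:
    """Hypothetical helper: True if `candidate` has no NaNs and matches `reference` within `atol`.

    `reference` would be last_hidden_state from the CPU-loaded-then-moved model and
    `candidate` the one from the model loaded with device_map (same device for both).
    """
    if torch.isnan(candidate).any():
        return False
    return torch.allclose(reference, candidate, atol=atol)  # atol is an assumed tolerance


def compare_loading_paths(model_id: str, inputs, device: str = "cuda:0") -> bool:
    """Hypothetical helper: run the same inputs through both loading paths from the report."""
    from transformers import WavLMModel  # lazy import; only needed for the real check

    direct = WavLMModel.from_pretrained(model_id, device_map=device).eval()
    moved = WavLMModel.from_pretrained(model_id).to(device).eval()
    inputs = inputs.to(device)
    with torch.no_grad():
        out_direct = direct(**inputs).last_hidden_state
        out_moved = moved(**inputs).last_hidden_state
    return hidden_states_agree(out_moved, out_direct)
```

If the bug is present, `compare_loading_paths("microsoft/wavlm-large", inputs)` would return `False` because the directly loaded model's output contains NaNs.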

Metadata


Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
