System Info
- transformers version: 4.42.4
- Platform: Linux-6.5.0-41-generic-x86_64-with-glibc2.35
- Python version: 3.9.19
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.3
- Accelerate version: 0.31.0
- PyTorch version (GPU?): 2.3.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: Yes
- GPU type: NVIDIA RTX A6000
Who can help?
@sanchit-gandhi @Gant
Information
Tasks
Reproduction
The hidden states are NaN when the model is loaded directly onto the GPU with device_map. The outputs are valid when the model runs on the CPU, or when it is first loaded on the CPU and then moved to the GPU.
The issue can be reproduced with the following code, taken from the WavLM page of the Hugging Face documentation.
from transformers import WavLMModel, AutoFeatureExtractor
import torch
from datasets import load_dataset
dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation", trust_remote_code=True)
dataset = dataset.sort("id")
sampling_rate = dataset.features["audio"].sampling_rate
processor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-large")
model = WavLMModel.from_pretrained("microsoft/wavlm-large", device_map="cuda:4")
model.eval()
# audio file is decoded on the fly
inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs.to("cuda:4"), output_hidden_states=True)
    last_hidden_states = outputs.last_hidden_state
print(last_hidden_states)
The code above prints a tensor that contains only NaNs. This does not happen if we load the model on the CPU first and then move it to the GPU with model.to("cuda:4").
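For comparison, here is a minimal sketch of the CPU-first loading path described above, together with a small check for NaNs. The helper names `load_wavlm_via_cpu` and `nan_fraction` are hypothetical and not part of transformers; the sketch only uses `WavLMModel.from_pretrained`, `Module.to` and `torch.isnan`.

```python
import torch


def load_wavlm_via_cpu(device: str, name: str = "microsoft/wavlm-large"):
    """Load the checkpoint on the CPU first, then move it to `device`.

    Hypothetical helper illustrating the workaround from this report.
    """
    # Imported lazily so nan_fraction can be used without transformers installed.
    from transformers import WavLMModel

    # No device_map is passed, so the weights are materialized on the CPU.
    model = WavLMModel.from_pretrained(name)
    return model.to(device).eval()


def nan_fraction(tensor: torch.Tensor) -> float:
    """Return the fraction of elements in `tensor` that are NaN."""
    return torch.isnan(tensor).float().mean().item()
```

With the reproduction script, `model = load_wavlm_via_cpu("cuda:4")` would replace the `from_pretrained(..., device_map="cuda:4")` call, and `nan_fraction(outputs.last_hidden_state)` can be compared between the two loading paths.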
Expected behavior
The hidden states should not be NaN when the model is loaded directly onto the GPU.
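To narrow down where the NaNs come from, one option is to check whether any weights or buffers already contain NaNs right after loading, before a forward pass. The sketch below is a diagnostic aid under that assumption, not a statement about the root cause; `nan_tensor_names` is a hypothetical helper built only on `named_parameters`, `named_buffers` and `torch.isnan`.

```python
import torch
from torch import nn


def nan_tensor_names(model: nn.Module):
    """Return the names of floating-point parameters and buffers that contain NaNs."""
    names = []
    tensors = list(model.named_parameters()) + list(model.named_buffers())
    for name, tensor in tensors:
        # Integer buffers cannot hold NaN, so only floating-point tensors are checked.
        if tensor.is_floating_point() and torch.isnan(tensor).any():
            names.append(name)
    return names
```

Calling `nan_tensor_names(model)` on the model loaded with `device_map="cuda:4"` and on the model loaded via the CPU would show whether the two paths differ already at the weight level.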