Closed
Description
Environment info
- transformers version: 4.9.2
- Platform: Linux-5.13.0-39-generic-x86_64-with-glibc2.17
- Python version: 3.8.11
- PyTorch version (GPU?): 1.9.1 (True)
- Tensorflow version (GPU?): 2.6.0 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: Parallel
@sgugger @patrickvonplaten @anton-l @Narsil
Information
Model I am using: wav2vec2
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQUaD task: (give the name)
- my own task or dataset: (give details below)
Hey, I'm new to Transformers, so pardon me if this issue has an obvious fix I'm missing. I was trying to work through the Quick Tour (https://huggingface.co/docs/transformers/quicktour), and I ran into an error when running the code snippets there.
To reproduce
Steps to reproduce the behavior:
from transformers import pipeline
import datasets

speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")
files = dataset["file"]
speech_recognizer(files[:4])
Here's the Stack Trace:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
/tmp/ipykernel_16600/2678924457.py in <module>
----> 1 speech_recognizer(files[:4])
~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py in __call__(self, inputs, **kwargs)
131 inputs = ffmpeg_read(inputs, self.feature_extractor.sampling_rate)
132
--> 133 assert isinstance(inputs, np.ndarray), "We expect a numpy ndarray as input"
134 assert len(inputs.shape) == 1, "We expect a single channel audio input for AutomaticSpeechRecognitionPipeline"
135
AssertionError: We expect a numpy ndarray as input
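From the traceback, `__call__` in this version seems to handle one audio input at a time (a `str` path, raw `bytes`, or a 1-D `np.ndarray`), so a Python list of paths never reaches the file-reading branches and trips the `isinstance` assertion at line 133. A possible workaround (an untested sketch, reusing `speech_recognizer` and `files` from the snippet above) would be to loop over the files:

```python
import numpy as np

# A plain Python list is not an np.ndarray, so the assertion fires
# before any audio file is opened.
files = ["file1.flac", "file2.flac"]
print(isinstance(files, np.ndarray))  # False

# Hypothetical workaround: feed the pipeline one path at a time and let
# it read each file itself.
# transcriptions = [speech_recognizer(f) for f in files[:4]]
```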
I tried mitigating this error by converting the list of filenames to a NumPy array, but then I get another error that I don't know how to deal with:
from transformers import pipeline
import datasets
import numpy as np

speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")
files = dataset["file"]
speech_recognizer(np.array(files[:4]))
Stack Trace:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_16600/437131926.py in <module>
1 import numpy as np
2
----> 3 speech_recognizer(np.array(files[:4]))
~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py in __call__(self, inputs, **kwargs)
134 assert len(inputs.shape) == 1, "We expect a single channel audio input for AutomaticSpeechRecognitionPipeline"
135
--> 136 processed = self.feature_extractor(
137 inputs, sampling_rate=self.feature_extractor.sampling_rate, return_tensors="pt"
138 )
~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/transformers/models/wav2vec2/feature_extraction_wav2vec2.py in __call__(self, raw_speech, padding, max_length, pad_to_multiple_of, return_attention_mask, return_tensors, sampling_rate, **kwargs)
179 # zero-mean and unit-variance normalization
180 if self.do_normalize:
--> 181 raw_speech = self.zero_mean_unit_var_norm(raw_speech)
182
183 # convert into correct format for padding
~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/transformers/models/wav2vec2/feature_extraction_wav2vec2.py in zero_mean_unit_var_norm(input_values)
84 Every array in the list is normalized to have zero mean and unit variance
85 """
---> 86 return [(x - np.mean(x)) / np.sqrt(np.var(x) + 1e-5) for x in input_values]
87
88 def __call__(
~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/transformers/models/wav2vec2/feature_extraction_wav2vec2.py in <listcomp>(.0)
84 Every array in the list is normalized to have zero mean and unit variance
85 """
---> 86 return [(x - np.mean(x)) / np.sqrt(np.var(x) + 1e-5) for x in input_values]
87
88 def __call__(
<__array_function__ internals> in mean(*args, **kwargs)
~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/numpy/core/fromnumeric.py in mean(a, axis, dtype, out, keepdims, where)
3417 return mean(axis=axis, dtype=dtype, out=out, **kwargs)
3418
-> 3419 return _methods._mean(a, axis=axis, dtype=dtype,
3420 out=out, **kwargs)
3421
~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims, where)
176 is_float16_result = True
177
--> 178 ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)
179 if isinstance(ret, mu.ndarray):
180 ret = um.true_divide(
TypeError: cannot perform reduce with flexible type
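If I understand the second traceback correctly, `np.array(files[:4])` produces an array of unicode strings (a "flexible" dtype in NumPy terms), and reductions like `np.mean` cannot operate on those, which is why the normalization step blows up. A minimal self-contained demo of that failure (hypothetical filenames, no transformers needed):

```python
import numpy as np

# Casting filename strings to an array yields a unicode ("flexible")
# dtype, not numeric audio samples.
arr = np.array(["file1.flac", "file2.flac"])
print(arr.dtype.kind)  # 'U' -> unicode

# NumPy reductions reject flexible dtypes, reproducing the error above.
try:
    np.mean(arr)
except TypeError as err:
    print("TypeError:", err)
```

So the cast only changes the container type; the pipeline still receives paths where it expects decoded audio samples.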
I was wondering if someone could provide some insight on how to fix this?