
Error when running "Quick Tour" code snippets #16563

@srujanjoshi

Description


Environment info

  • transformers version: 4.9.2
  • Platform: Linux-5.13.0-39-generic-x86_64-with-glibc2.17
  • Python version: 3.8.11
  • PyTorch version (GPU?): 1.9.1 (True)
  • Tensorflow version (GPU?): 2.6.0 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Parallel

Who can help: @sgugger @patrickvonplaten @anton-l @Narsil

Information

Model I am using: wav2vec2

The problem arises when using:

  • the official example scripts: the code snippets from the Quick Tour (details below)

The task I am working on is:

  • the Quick Tour's automatic speech recognition example, using the superb dataset's "asr" test split (details below)

Hey, I'm new to Transformers, so pardon me if this issue has an obvious fix I'm missing. I was working through the Quick Tour (https://huggingface.co/docs/transformers/quicktour) and ran into an error when running the code snippets there.

To reproduce

Steps to reproduce the behavior:


from transformers import pipeline
import datasets

speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")
files = dataset["file"]
speech_recognizer(files[:4])

Here's the Stack Trace:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_16600/2678924457.py in <module>
----> 1 speech_recognizer(files[:4])

~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py in __call__(self, inputs, **kwargs)
    131             inputs = ffmpeg_read(inputs, self.feature_extractor.sampling_rate)
    132 
--> 133         assert isinstance(inputs, np.ndarray), "We expect a numpy ndarray as input"
    134         assert len(inputs.shape) == 1, "We expect a single channel audio input for AutomaticSpeechRecognitionPipeline"
    135 

AssertionError: We expect a numpy ndarray as input
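For what it's worth, my reading of this first failure: dataset["file"] is a plain Python list of path strings, so the pipeline's isinstance check on np.ndarray fails before any audio is touched. A tiny illustration, with made-up paths standing in for the real ones:

```python
import numpy as np

# What the pipeline actually receives: dataset["file"] is a plain Python
# list of path strings (hypothetical stand-in values below), not decoded audio.
files = ["/data/a.flac", "/data/b.flac"]
assert not isinstance(files[:4], np.ndarray)  # hence the AssertionError
assert all(isinstance(f, str) for f in files)
```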

I tried mitigating this error by converting the list of filenames to a numpy array, but I seem to get another error that I don't know how to deal with:


from transformers import pipeline
import datasets
import numpy as np

speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")
files = dataset["file"]
speech_recognizer(np.array(files[:4]))

Stack Trace:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_16600/437131926.py in <module>
      1 import numpy as np
      2 
----> 3 speech_recognizer(np.array(files[:4]))

~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py in __call__(self, inputs, **kwargs)
    134         assert len(inputs.shape) == 1, "We expect a single channel audio input for AutomaticSpeechRecognitionPipeline"
    135 
--> 136         processed = self.feature_extractor(
    137             inputs, sampling_rate=self.feature_extractor.sampling_rate, return_tensors="pt"
    138         )

~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/transformers/models/wav2vec2/feature_extraction_wav2vec2.py in __call__(self, raw_speech, padding, max_length, pad_to_multiple_of, return_attention_mask, return_tensors, sampling_rate, **kwargs)
    179         # zero-mean and unit-variance normalization
    180         if self.do_normalize:
--> 181             raw_speech = self.zero_mean_unit_var_norm(raw_speech)
    182 
    183         # convert into correct format for padding

~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/transformers/models/wav2vec2/feature_extraction_wav2vec2.py in zero_mean_unit_var_norm(input_values)
     84         Every array in the list is normalized to have zero mean and unit variance
     85         """
---> 86         return [(x - np.mean(x)) / np.sqrt(np.var(x) + 1e-5) for x in input_values]
     87 
     88     def __call__(

~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/transformers/models/wav2vec2/feature_extraction_wav2vec2.py in <listcomp>(.0)
     84         Every array in the list is normalized to have zero mean and unit variance
     85         """
---> 86         return [(x - np.mean(x)) / np.sqrt(np.var(x) + 1e-5) for x in input_values]
     87 
     88     def __call__(

<__array_function__ internals> in mean(*args, **kwargs)

~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/numpy/core/fromnumeric.py in mean(a, axis, dtype, out, keepdims, where)
   3417             return mean(axis=axis, dtype=dtype, out=out, **kwargs)
   3418 
-> 3419     return _methods._mean(a, axis=axis, dtype=dtype,
   3420                           out=out, **kwargs)
   3421 

~/miniconda3/envs/mytextattack/lib/python3.8/site-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims, where)
    176             is_float16_result = True
    177 
--> 178     ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)
    179     if isinstance(ret, mu.ndarray):
    180         ret = um.true_divide(

TypeError: cannot perform reduce with flexible type
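My guess at what's happening here (happy to be corrected): np.array over a list of paths yields a fixed-width Unicode array, and NumPy's reductions reject "flexible" string dtypes, hence the TypeError. The array still holds file names, not audio samples:

```python
import numpy as np

# np.array over path strings gives a fixed-width Unicode ("flexible") dtype,
# so reductions like np.mean are rejected outright.
arr = np.array(["a.flac", "b.flac"])  # stand-in for np.array(files[:4])
assert arr.dtype.kind == "U"  # Unicode string dtype, e.g. <U6

try:
    np.mean(arr)
except TypeError as err:
    print(err)  # cannot perform reduce with flexible type
```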

I was wondering if someone could provide some insight on how to fix this?
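In case it helps with triage, here's a minimal, untested sketch of the workaround I'd try next: decode each file into a single-channel float waveform and call the pipeline once per array, since that is exactly the shape the asserts demand. The sine wave below is a synthetic stand-in for a decoded file; using the soundfile package for decoding is my own assumption, not something the Quick Tour prescribes.

```python
import numpy as np

# Synthetic stand-in for one decoded audio file: 1 second of a 440 Hz tone.
sampling_rate = 16000  # wav2vec2-base-960h expects 16 kHz mono audio
t = np.linspace(0, 1, sampling_rate, endpoint=False)
waveform = (0.1 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

# This is the shape the pipeline's asserts require: a 1-D float ndarray.
assert isinstance(waveform, np.ndarray) and waveform.ndim == 1

# With a real file, the decoding step might look like (soundfile assumed):
#   import soundfile as sf
#   waveform, sr = sf.read(files[0], dtype="float32")
# and then one call per file, rather than a batched list:
#   transcriptions = [speech_recognizer(sf.read(f, dtype="float32")[0]) for f in files[:4]]
```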
