wav2vec processor batching logic is too restrictive #22175

@LWprogramming

Description

System Info

transformers version at the time of writing: 4.26.1

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

# !pip install transformers torch # in jupyter notebook
from transformers import Wav2Vec2Processor
import torch
import numpy as np

batch = 4

# create Wav2Vec2Processor
processor = Wav2Vec2Processor.from_pretrained("facebook/hubert-large-ls960-ft")
# generate random input tensor
input_tensor = torch.tensor(np.random.randn(batch, 10, 10))
# pass the batched input tensor through the processor
output = processor(input_tensor, return_tensors="pt")
print(output["input_values"].shape)  # 1 x 4 x 10 x 10 -- the whole tensor was treated as one example

Expected behavior

It seems reasonable that an input could be of shape batch x d_1 x d_2 ..., and I'd expect the output to have the same shape. However, the code here only checks for type list or tuple when deciding whether the input is batched, so it misinterprets a batched tensor as a single example and prepends an extra dimension.
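The kind of check being described can be sketched roughly as follows. This is a simplified stand-in, not the actual transformers code: the real feature-extractor logic has additional branches (e.g. for multi-dimensional numpy arrays), but the core list/tuple test is what makes a framework tensor fall through as "unbatched".

```python
import numpy as np

def is_batched(raw_speech):
    # Simplified sketch of the list/tuple check discussed in this issue
    # (assumption: the real Wav2Vec2 logic differs in details).
    return isinstance(raw_speech, (list, tuple)) and isinstance(
        raw_speech[0], (np.ndarray, tuple, list)
    )

# A list of 1-D arrays is recognized as a batch...
print(is_batched([np.zeros(10), np.zeros(10)]))  # True

# ...but a single array-like object with a leading batch dimension
# is not, since it fails the list/tuple isinstance test.
print(is_batched(np.zeros((4, 10))))  # False
```

Under this sketch, anything that is not literally a list or tuple is treated as a single example, which matches the 1 x 4 x 10 x 10 output shape seen in the reproduction above.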

Side note: I'm unsure what to infer from the type-checking logic because it doesn't match the type hints; i.e., a tuple isn't supposed to be possible here anyway, according to the __call__ type hint. I checked some other occurrences of is_batched in the src/transformers/models directory, and they look similar but equally unexpected.
