ASR pipelines won't load local Wav2Vec models with language models attached #15589

@versae

Description

Environment info

  • transformers version: 4.17.0.dev0
  • Platform: Linux-5.13.0-27-generic-x86_64-with-glibc2.34
  • Python version: 3.9.7
  • PyTorch version (GPU?): 1.10.2+cu113 (True)
  • Tensorflow version (GPU?): 2.7.0 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: N/A
  • Using distributed or parallel set-up in script?: N/A

Who can help

@patrickvonplaten, @Narsil.

Information

Model I am using: Wav2Vec2 with KenLM

The problem arises when using:

  • the official example scripts: any script that uses the ASR pipeline to load a Wav2Vec2 model with an attached language model from a local directory, for example eval.py
  • my own modified scripts

The task I am working on is:

  • an official GLUE/SQuAD task: robust-speech-event
  • my own task or dataset

To reproduce

Steps to reproduce the behavior:

  1. Download eval.py script
  2. Clone a model repo that contains a language model
  3. Run the script with the model in a local directory
  4. The pipeline tries to download the model from the hub even though it should be loaded locally
$ git clone https://huggingface.co/NbAiLab/wav2vec2-xls-r-1b-npsc-bokmaal-low-27k
$ cd wav2vec2-xls-r-1b-npsc-bokmaal-low-27k
$ python eval.py --model_id ./ --dataset NbAiLab/NPSC --config 16K_mp3_bokmaal --split test --log_outputs
Reusing dataset npsc (/home/user/.cache/huggingface/datasets/NbAiLab___npsc/16K_mp3_bokmaal/1.0.0/fab8b0517ebc9c0c6f0d019094e8816d5537f55d965f2dd90750349017b0bc69)
Traceback (most recent call last):
  File "/home/user/wav2vec2-xls-r-1b-npsc-bokmaal-low-27k/eval.py", line 151, in <module>
    main(args)
  File "/home/user/wav2vec2-xls-r-1b-npsc-bokmaal-low-27k/eval.py", line 98, in main
    asr = pipeline("automatic-speech-recognition", model=args.model_id, device=args.device)
  File "/home/user/audio/lib/python3.9/site-packages/transformers/pipelines/__init__.py", line 628, in pipeline
    decoder = BeamSearchDecoderCTC.load_from_hf_hub(model_name, allow_regex=allow_regex)
  File "/home/user/audio/lib/python3.9/site-packages/pyctcdecode/decoder.py", line 771, in load_from_hf_hub
    cached_directory = snapshot_download(model_id, cache_dir=cache_dir, **kwargs)
  File "/home/user/audio/lib/python3.9/site-packages/huggingface_hub/snapshot_download.py", line 144, in snapshot_download
    model_info = _api.model_info(repo_id=repo_id, revision=revision, token=token)
  File "/home/user/audio/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 912, in model_info
    r.raise_for_status()
  File "/home/user/audio/lib/python3.9/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models//revision/main 
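Note the double slash in the failing URL (`api/models//revision/main`): the local path is being passed straight to `load_from_hf_hub`, which resolves it to an empty repo id. A minimal sketch of the dispatch the pipeline could perform instead (the helper name is hypothetical; pyctcdecode also provides `load_from_dir` for on-disk decoders):

```python
import os

def decoder_source(model_name_or_path: str) -> str:
    """Hypothetical helper sketching the expected dispatch: a local
    directory should never trigger a hub download."""
    if os.path.isdir(model_name_or_path):
        return "local"  # e.g. BeamSearchDecoderCTC.load_from_dir
    return "hub"        # e.g. BeamSearchDecoderCTC.load_from_hf_hub

# "./" is a directory, so the decoder should come from disk
print(decoder_source("./"))  # → local
```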

Expected behavior

The pipeline should not try to download anything when `model_id` is a path to a local directory.
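Until this is fixed, one possible workaround (a sketch, assuming a standard `Wav2Vec2ProcessorWithLM` checkpoint layout in the local directory) is to load the components explicitly and hand them to `pipeline()` directly, so no hub lookup is made for the decoder:

```python
def build_local_asr_pipeline(model_dir: str):
    # Workaround sketch (assumption, untested against this exact repo):
    # loading each component explicitly from the local directory avoids
    # the hub lookup that pipeline(model=<path>) currently performs.
    # Imports are deferred so the sketch can be defined without transformers.
    from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM, pipeline

    model = AutoModelForCTC.from_pretrained(model_dir)
    processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_dir)
    return pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        decoder=processor.decoder,
    )
```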
