AutoProcessor.from_pretrained not passing all kwargs to cached_file #44704

@peacefulotter

Description

Hi,

I believe AutoProcessor.from_pretrained does not forward its keyword arguments correctly to cached_file.

cached_file is defined with **kwargs, but AutoProcessor.from_pretrained filters the kwargs it receives against inspect.signature(cached_file).parameters. That mapping only contains the names that appear literally in the signature (path_or_repo_id, filename and kwargs), so options such as force_download are filtered out and never reach cached_file.
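To illustrate the filtering described above, here is a minimal sketch that does not import transformers. The cached_file below is a hypothetical stand-in that only copies the signature quoted in this issue, and filter_kwargs_for is an illustrative helper that mimics the filtering pattern, not the library's actual code.

```python
from __future__ import annotations

import inspect
import os


def cached_file(path_or_repo_id: str | os.PathLike, filename: str, **kwargs) -> str | None:
    """Hypothetical stand-in with the same signature shape as in the issue."""
    return None


def filter_kwargs_for(func, kwargs: dict) -> dict:
    """Illustrative helper: keep only kwargs whose names appear in func's signature."""
    allowed = inspect.signature(func).parameters
    return {key: value for key, value in kwargs.items() if key in allowed}


user_kwargs = {"force_download": True, "revision": "main"}

# The signature exposes the catch-all parameter under the literal name "kwargs".
param_names = list(inspect.signature(cached_file).parameters)
print(param_names)  # ['path_or_repo_id', 'filename', 'kwargs']

# Neither "force_download" nor "revision" is a key of the signature mapping,
# so both are dropped before the call would be made.
forwarded = filter_kwargs_for(cached_file, user_kwargs)
print(forwarded)  # {}
```

The filter compares against parameter names, so a **kwargs catch-all contributes only the single name "kwargs" and nothing that the caller actually passes.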

One possible fix would be to update the cached_file function signature to explicitly list its supported parameters instead of relying on **kwargs, i.e. from this:

def cached_file(
    path_or_repo_id: str | os.PathLike,
    filename: str,
    **kwargs,
) -> str | None:

to this:

def cached_file(
    path_or_repo_id: str | os.PathLike,
    filename: str,
    cache_dir: str | os.PathLike | None = None,
    force_download: bool = False,
    proxies: dict[str, str] | None = None,
    token: str | bool | None = None,
    revision: str = "main",
    local_files_only: bool = False,
    subfolder: str = "",
    repo_type: str | None = None,
) -> str | None:
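As a rough check of the proposal above, the sketch below applies the same signature-based filter to a hypothetical stand-in that uses the explicit parameter list. It does not import transformers, the function body is a placeholder, and trust_remote_code is included only as an example of a kwarg that is not in that list.

```python
from __future__ import annotations

import inspect
import os


def cached_file(
    path_or_repo_id: str | os.PathLike,
    filename: str,
    cache_dir: str | os.PathLike | None = None,
    force_download: bool = False,
    proxies: dict[str, str] | None = None,
    token: str | bool | None = None,
    revision: str = "main",
    local_files_only: bool = False,
    subfolder: str = "",
    repo_type: str | None = None,
) -> str | None:
    """Hypothetical stand-in using the explicit signature proposed in the issue."""
    return None


user_kwargs = {"force_download": True, "revision": "main", "trust_remote_code": True}

# With explicit parameters, the signature mapping contains every supported name.
allowed = inspect.signature(cached_file).parameters
forwarded = {key: value for key, value in user_kwargs.items() if key in allowed}
print(forwarded)  # {'force_download': True, 'revision': 'main'}
```

With the explicit list, the supported options survive the filter, while kwargs that cached_file does not declare are still dropped.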

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Call AutoProcessor.from_pretrained with force_download=True. The force_download argument is not passed on to cached_file.

Expected behavior

When AutoProcessor.from_pretrained is called with force_download=True, force_download (and the other supported kwargs) should be passed on to cached_file.
