add-new-model-like fails if model is inside the TOKENIZER_MAPPING_NAMES #44661

@michalrzak

System Info

  • transformers version: 5.3.0.dev0
  • Platform: Linux-6.14.0-1013-nvidia-aarch64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 1.6.0
  • Safetensors version: 0.7.0
  • Accelerate version: 1.14.0.dev0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.9.0+cu130 (CUDA)
  • Using distributed or parallel set-up in script?: NA
  • Using GPU in script?: NA
  • GPU type: NVIDIA GB10

Who can help?

@ydshieh

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Run transformers add-new-model-like
  2. Select a model which is inside TOKENIZER_MAPPING_NAMES such as qwen2_5_vl
  3. Provide the new model name and fully cased name

Error:

root@51e8d26e057b:/# transformers add-new-model-like
What model would you like to duplicate? Please provide it as lowercase, e.g. `llama`): qwen2_5_vl
What is the new model name? Please provide it as snake lowercase, e.g. `new_model`? test_model
What is the fully cased name you would like to appear in the doc (e.g. `NeW ModEl`)?  [TestModel] 
Traceback (most recent call last):
  File "/usr/local/bin/transformers", line 33, in <module>
    sys.exit(load_entry_point('transformers', 'console_scripts', 'transformers')())
  File "/transformers/src/transformers/cli/transformers.py", line 39, in main
    app()
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 1152, in __call__
    raise e
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 1135, in __call__
    return get_command(self)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1485, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 795, in main
    return _main(
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 188, in _main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1873, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1269, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 824, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 1514, in wrapper
    return callback(**use_params)
  File "/transformers/src/transformers/cli/add_new_model_like.py", line 104, in add_new_model_like
    ) = get_user_input()
  File "/transformers/src/transformers/cli/add_new_model_like.py", line 719, in get_user_input
    if old_model_infos.tokenizer_class is not None:
AttributeError: 'ModelInfos' object has no attribute 'tokenizer_class'. Did you mean: 'fast_tokenizer_class'?

The issue is caused by the following code: when self.lowercase_name is in TOKENIZER_MAPPING_NAMES, the first branch never sets self.tokenizer_class. That attribute is later read by add-new-model-like (in get_user_input, see the traceback above), which raises the AttributeError.

if self.lowercase_name in TOKENIZER_MAPPING_NAMES:
    self.fast_tokenizer_class = TOKENIZER_MAPPING_NAMES[self.lowercase_name]
    self.fast_tokenizer_class = (
        None if self.fast_tokenizer_class == "PreTrainedTokenizerFast" else self.fast_tokenizer_class
    )
else:
    self.tokenizer_class, self.fast_tokenizer_class = None, None

This was changed in #40936, which removed setting self.tokenizer_class.
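The failure mode can be reproduced in isolation with a small stand-in class. This is only a sketch: `ModelInfosRepro` and the mapping value `"ExampleTokenizer"` are hypothetical and are not the real transformers objects; only the branching logic is copied from the snippet above.

```python
# Hypothetical stand-in for the real mapping in transformers; the value is made up.
TOKENIZER_MAPPING_NAMES = {"qwen2_5_vl": "ExampleTokenizer"}


class ModelInfosRepro:
    """Simplified stand-in for ModelInfos that models only the tokenizer branch."""

    def __init__(self, lowercase_name):
        self.lowercase_name = lowercase_name
        if self.lowercase_name in TOKENIZER_MAPPING_NAMES:
            # This branch sets only fast_tokenizer_class.
            self.fast_tokenizer_class = TOKENIZER_MAPPING_NAMES[self.lowercase_name]
            self.fast_tokenizer_class = (
                None if self.fast_tokenizer_class == "PreTrainedTokenizerFast" else self.fast_tokenizer_class
            )
        else:
            # Only this branch defines tokenizer_class.
            self.tokenizer_class, self.fast_tokenizer_class = None, None


def has_tokenizer_class(name):
    """Return True if the stand-in object ends up with a tokenizer_class attribute."""
    return hasattr(ModelInfosRepro(name), "tokenizer_class")
```

With this stand-in, `has_tokenizer_class("qwen2_5_vl")` is false while a name that is absent from the mapping yields true, which matches the asymmetry that triggers the error.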

Happy to make a PR fixing this, but I'm not sure what the expected behaviour is:

  1. Set self.tokenizer_class as it was before #40936 (rm slow tokenizers)
  2. Set self.tokenizer_class to None in both branches
  3. Adapt add-new-model-like so it no longer reads self.tokenizer_class
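As a rough illustration of options 2 and 3, the sketch below uses a plain namespace object instead of the real ModelInfos. The helper names `init_tokenizer_fields` and `read_tokenizer_class` are hypothetical and do not exist in transformers; which option is appropriate is for the maintainers to decide.

```python
from types import SimpleNamespace


def init_tokenizer_fields(infos, mapping):
    """Option 2 sketch: define tokenizer_class regardless of which branch is taken."""
    infos.tokenizer_class = None
    name = mapping.get(infos.lowercase_name)
    # Same normalisation as the original snippet: the generic fast class maps to None.
    infos.fast_tokenizer_class = None if name in (None, "PreTrainedTokenizerFast") else name
    return infos


def read_tokenizer_class(infos):
    """Option 3 sketch: the caller tolerates objects that lack tokenizer_class."""
    return getattr(infos, "tokenizer_class", None)


# Usage with a made-up mapping entry.
example = init_tokenizer_fields(
    SimpleNamespace(lowercase_name="qwen2_5_vl"),
    {"qwen2_5_vl": "ExampleTokenizer"},
)
```

Option 2 keeps the fix local to the class, while option 3 leaves the class untouched and makes the CLI defensive; both avoid the AttributeError in this simplified setting.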

Expected behavior

The script runs successfully.
