System Info
- transformers version: 4.34.1
- Platform: Linux-5.4.0-149-generic-x86_64-with-glibc2.31
- Python version: 3.10.11
- Huggingface_hub version: 0.17.3
- Safetensors version: 0.4.0
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.0.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
It can be reproduced in a Python console.
>>> import transformers
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-base")
Downloading (…)okenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.54k/2.54k [00:00<00:00, 25.3MB/s]
Downloading spiece.model: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 20.2MB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.42M/2.42M [00:00<00:00, 2.47MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 15.9MB/s]
>>> tokenizer.add_special_tokens({"additional_special_tokens": ["<1>", "<2>"]})
2
>>> tokenizer.save_pretrained("/tmp/tokenizer")
('/tmp/tokenizer/tokenizer_config.json', '/tmp/tokenizer/special_tokens_map.json', '/tmp/tokenizer/tokenizer.json')
>>> new_tokenizer = transformers.AutoTokenizer.from_pretrained("/tmp/tokenizer")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 751, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2017, in from_pretrained
return cls._from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2249, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5_fast.py", line 127, in __init__
raise ValueError(
ValueError: Both extra_ids (100) and additional_special_tokens (['<1>', '<2>']) are provided to T5Tokenizer. In this case the additional_special_tokens must include the extra_ids tokens
>>> transformers.__version__
'4.34.1'
Expected behavior
The same code works on version 4.33.x:
>>> import transformers
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-base")
>>> tokenizer.add_special_tokens({"additional_special_tokens": ["<1>", "<2>"]})
2
>>> tokenizer.save_pretrained("/tmp/tokenizer")
('/tmp/tokenizer/tokenizer_config.json', '/tmp/tokenizer/special_tokens_map.json', '/tmp/tokenizer/tokenizer.json')
>>> new_tokenizer = transformers.AutoTokenizer.from_pretrained("/tmp/tokenizer")
>>> transformers.__version__
'4.33.3'
I found a related issue (#26536).
This may be a T5-specific issue, because the same steps work with other models such as bert-base-cased or gpt2; a minimal sketch of that check is below.
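For example, the equivalent round-trip with bert-base-cased reloads cleanly (the /tmp/bert_tokenizer path is arbitrary):

```python
import transformers

# Same add/save/reload round-trip as above, but with a non-T5 tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained("bert-base-cased")
tokenizer.add_special_tokens({"additional_special_tokens": ["<1>", "<2>"]})
tokenizer.save_pretrained("/tmp/bert_tokenizer")

# Reloading raises no ValueError, and the new tokens survive the round-trip.
reloaded = transformers.AutoTokenizer.from_pretrained("/tmp/bert_tokenizer")
print(reloaded.additional_special_tokens)  # expected: ['<1>', '<2>']
```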
Thank you in advance.
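In the meantime, a possible workaround suggested by the wording of the error itself ("the additional_special_tokens must include the extra_ids tokens") is to keep the 100 <extra_id_*> tokens when adding new special tokens, so the count still matches extra_ids on reload. A sketch, assuming the replace_additional_special_tokens flag of add_special_tokens behaves as documented:

```python
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-base")

# Keep the existing <extra_id_0> ... <extra_id_99> tokens in
# additional_special_tokens instead of replacing them, so the saved
# tokenizer still satisfies the extra_ids check on reload.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<1>", "<2>"]},
    replace_additional_special_tokens=False,
)

tokenizer.save_pretrained("/tmp/tokenizer_workaround")
new_tokenizer = transformers.AutoTokenizer.from_pretrained("/tmp/tokenizer_workaround")
```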