AttributeError: 'CodeGen25Tokenizer' object has no attribute 'encoder' #82

@velocityCavalry

Hi! I am using transformers 4.34 and tiktoken 0.4.0. I am trying to download the tokenizer for CodeGen 2.5, but when I run the command from the tutorial, I get the following error:

>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen25-7b-mono", trust_remote_code=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 738, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2045, in from_pretrained
    return cls._from_pretrained(
  File "miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File ".cache/huggingface/modules/transformers_modules/Salesforce/codegen25-7b-mono/29854f8cbe3e588ff7c8d1d15e605b5f12bca8a7/tokenization_codegen25.py", line 136, in __init__
    super().__init__(
  File "miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 366, in __init__
    self._add_tokens(self.all_special_tokens_extended, special_tokens=True)
  File "/home/velocity/miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 462, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File ".cache/huggingface/modules/transformers_modules/Salesforce/codegen25-7b-mono/29854f8cbe3e588ff7c8d1d15e605b5f12bca8a7/tokenization_codegen25.py", line 153, in get_vocab
    vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
  File ".cache/huggingface/modules/transformers_modules/Salesforce/codegen25-7b-mono/29854f8cbe3e588ff7c8d1d15e605b5f12bca8a7/tokenization_codegen25.py", line 149, in vocab_size
    return self.encoder.n_vocab
AttributeError: 'CodeGen25Tokenizer' object has no attribute 'encoder'

I tried deleting the cache, but that did not help. Running tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen25-7b-mono") without trust_remote_code=True instead gives ValueError: Tokenizer class CodeGen25Tokenizer does not exist or is not currently imported.
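
Reading the traceback, this looks like an initialization-order problem: the base class constructor calls get_vocab(), which reads vocab_size, which reads self.encoder, and that attribute apparently has not been assigned yet at that point. Below is a minimal sketch of that pattern in plain Python. All class names here (BaseTokenizer, FakeEncoder, BrokenTokenizer, ReorderedTokenizer) are hypothetical stand-ins for illustration, not the actual transformers or CodeGen 2.5 code, and the reordering shown is only an assumption about what a fix might look like.

```python
class BaseTokenizer:
    """Hypothetical stand-in for a base class whose __init__ queries the vocab."""

    def __init__(self):
        # Mirrors the traceback: the base constructor calls get_vocab().
        self._initial_vocab = self.get_vocab().copy()

    def get_vocab(self):
        raise NotImplementedError


class FakeEncoder:
    """Hypothetical stand-in for the underlying encoder object."""
    n_vocab = 3


class BrokenTokenizer(BaseTokenizer):
    def __init__(self):
        super().__init__()            # get_vocab() runs here ...
        self.encoder = FakeEncoder()  # ... before self.encoder exists

    @property
    def vocab_size(self):
        return self.encoder.n_vocab

    def get_vocab(self):
        return {f"tok{i}": i for i in range(self.vocab_size)}


class ReorderedTokenizer(BrokenTokenizer):
    def __init__(self):
        # Assign the attribute first, then run the base constructor.
        self.encoder = FakeEncoder()
        BaseTokenizer.__init__(self)


try:
    BrokenTokenizer()
except AttributeError as err:
    print("broken:", err)

print("reordered:", ReorderedTokenizer().get_vocab())
```

If this reading is right, the error would come from the interaction between the remote tokenizer code and the installed transformers version rather than from a stale cache, which would be consistent with deleting the cache making no difference.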

Has anyone else encountered this issue, and if so, how can I solve it? Thank you so much!
