Skip to content

Llama AttributeError: 'bool' object has no attribute 'all_special_tokens' #1809

@mattguida

Description

@mattguida

Hi all,

while fine-tuning LLama3.1-8B-Instruct using Unsloth, I have encountered the following error:

Traceback (most recent call last):
  File "/data/gpfs/projects/punim0478/guida/mfc_fine_tuning/code/multi_label.py", line 226, in <module>
    fine_tuner.main()
  File "/data/gpfs/projects/punim0478/guida/mfc_fine_tuning/code/multi_label.py", line 106, in main
    self.model, self.tokenizer = FastLanguageModel.from_pretrained(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/gpfs/projects/punim0478/guida/unsloth_env/lib/python3.11/site-packages/unsloth/models/loader.py", line 292, in from_pretrained
    model, tokenizer = dispatch_model.from_pretrained(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/gpfs/projects/punim0478/guida/unsloth_env/lib/python3.11/site-packages/unsloth/models/llama.py", line 1816, in from_pretrained
    tokenizer = load_correct_tokenizer(
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/gpfs/projects/punim0478/guida/unsloth_env/lib/python3.11/site-packages/unsloth/tokenizer_utils.py", line 557, in load_correct_tokenizer
    tokenizer = _load_correct_tokenizer(
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/gpfs/projects/punim0478/guida/unsloth_env/lib/python3.11/site-packages/unsloth/tokenizer_utils.py", line 536, in _load_correct_tokenizer
    if assert_same_tokenization(slow_tokenizer, fast_tokenizer):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/gpfs/projects/punim0478/guida/unsloth_env/lib/python3.11/site-packages/unsloth/tokenizer_utils.py", line 266, in assert_same_tokenization
    all_special_tokens = list(set(special_tokens + slow_tokenizer.all_special_tokens))
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'bool' object has no attribute 'all_special_tokens'

Any lead here? Here's how I define my classes:

class MFCFineTuner:
    def __init__(self, model_name, output_dir, save_path, json_output_file, file_name, subset_size):
        self.model_name = model_name
        self.output_dir = output_dir
        self.save_path = save_path
        self.json_output_file = json_output_file
        self.file_name = file_name
        self.subset_size = subset_size

        self.max_seq_length = 1000
        self.dtype = None
        self.load_in_4bit = True
        self.system_instruction = PROMPT_MULTI
        self.alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

                                ### Instruction:
                                {}

                                ### Input:
                                Text to analyze: {}

                                ### Response:
                                {}"""

        self.model = None
        self.tokenizer = None
        self.EOS_TOKEN = None

def main(self):
        self.model, self.tokenizer = FastLanguageModel.from_pretrained(
            model_name=self.model_name,
            max_seq_length=self.max_seq_length,
            dtype=self.dtype,
            load_in_4bit=self.load_in_4bit,
            cache_dir="/data/gpfs/projects/punim0478/guida/models",
        #    device_map="auto",
        #    trust_remote_code=True
        )

        self.EOS_TOKEN = self.tokenizer.eos_token

Thanks in advance! I have tried to use both the base model (Llama3.1-8B) and the Instruct model, in both cases trying both 8bit or full.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions