[Bug] Llama_3.2_1B_Conversational: Seeing multiple trailing <|reserved_special_token_xxx|> at inference time

**Describe the bug**
I'm following the steps in the https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb notebook to fine-tune the Llama 3.2 1B model to generate custom Python functions from my codebase. During inference (both in Colab itself and after converting to an Ollama GGUF), the generated output always ends with a few reserved tokens. For example: `<|reserved_special_token_193|><|reserved_special_token_87|>`.

Earlier, when I followed the example Colab notebook as-is, generation never stopped. After I added an explicit step to append `EOS_TOKEN`, generation now stops, but the output still ends with random reserved tokens. This is the only step that differs from the example Colab notebook:

```python
from unsloth.chat_templates import get_chat_template, standardize_sharegpt

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

print(f"Using EOS token: {tokenizer.eos_token} with id: {tokenizer.eos_token_id}")
print(f"Using PAD token: {tokenizer.pad_token} with id: {tokenizer.pad_token_id}")

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = []
    for convo in convos:
        text = tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        # Manually ensure EOS token is present
        if not text.endswith(tokenizer.eos_token):
            text += tokenizer.eos_token
        texts.append(text)
    return {"text": texts}

dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = True,)
```

I have also confirmed that `EOS_TOKEN` and `PAD_TOKEN` are not the same.
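
A check of this kind can be sketched as below. This is a minimal illustration, not the exact code that was run: the helper name `describe_special_tokens` is hypothetical, and the stand-in object with placeholder token values exists only so the snippet runs without downloading a model. With a real tokenizer, it would be passed in place of `stub`.

```python
from types import SimpleNamespace


def describe_special_tokens(tokenizer) -> dict:
    """Collect the EOS and PAD tokens and report whether their ids differ."""
    return {
        "eos": (tokenizer.eos_token, tokenizer.eos_token_id),
        "pad": (tokenizer.pad_token, tokenizer.pad_token_id),
        "distinct": tokenizer.eos_token_id != tokenizer.pad_token_id,
    }


# Stand-in with placeholder values (assumption, not the real Llama 3.2 ids).
stub = SimpleNamespace(
    eos_token="<eos>", eos_token_id=1,
    pad_token="<pad>", pad_token_id=0,
)
print(describe_special_tokens(stub))
```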

Here's what I get when I print the dataset after applying the `formatting_prompts_func` to the conversations (which appears to be in line with the Colab example):
````
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 July 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

--- Prompt example instruction ---

<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```python
--- expected sample python code ---
```
<|eot_id|><|end_of_text|>
````

Any ideas what might be going on? As a last resort, I had to write a clean-up function that removes these tokens from the generated response.
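
A clean-up step of this kind can be sketched as follows. This is a minimal illustration rather than the exact function used here: the name `strip_reserved_tokens` and the regular expression are assumptions, and the sample string is made up. It only hides the symptom in the decoded text and does not explain why the tokens are generated.

```python
import re

# Matches placeholders of the form <|reserved_special_token_193|>.
RESERVED_TOKEN_RE = re.compile(r"<\|reserved_special_token_\d+\|>")


def strip_reserved_tokens(text: str) -> str:
    """Remove reserved special tokens, then trailing whitespace, from decoded output."""
    return RESERVED_TOKEN_RE.sub("", text).rstrip()


# Made-up sample output with two trailing reserved tokens.
raw = (
    "def add(a, b):\n    return a + b\n"
    "<|reserved_special_token_193|><|reserved_special_token_87|>"
)
print(strip_reserved_tokens(raw))
```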

1. **Environment Setup:**
   - Colab (copy of https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) with custom training dataset.

2. **Dataset Details:**
Custom private dataset that follows this pattern:
`[{'content': '---sample prompt to generate a custom python function given some inputs---', 'role': 'user'}, {'content': '```python---sample code snippet for training---```\n', 'role': 'assistant'}]`

3. **Model Details:**
`unsloth/Llama-3.2-1B`

4. **Training Configuration:**
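```python
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(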
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        #num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 500,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

# Start training
trainer_stats = trainer.train()
```

5. **Expected Behavior:**
I would expect inference to generate just the desired Python output, without any trailing reserved tokens.

6. **Actual Behavior:**
Unwanted trailing reserved tokens, leading to junk output after the actual code snippet.

7. **Additional notes:**
None.

[Bug] Llama_3.2_1B_Conversational: Seeing multiple trailing <|reserved_special_token_xxx|> at inference time #2360
