Fail to load checkpoints trained with extended tokenizer

As discussed in issue https://github.com/unslothai/unsloth/issues/154#issue-2119969174, I am also working with an extended tokenizer to accommodate words from a new language. I merged the Llama 3.2 tokenizer with my own tokenizer, which increased the vocabulary size to 146,452 (up from 128,256 for the original Llama 3.2 tokenizer).

I am running continual pretraining and saving checkpoints at a set number of steps. I want to finetune each checkpoint further on an instruction dataset to track its performance. However, I cannot load the checkpoints because the vocabulary size of the base model does not match that of the adapter.

I read about the suggested solution, which is to merge and save the checkpoints. However, since Unsloth saves the checkpoints automatically, I don't have the chance to do that without first loading the models. So what should I do? Any suggestions are appreciated!
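For reference, here is a minimal sketch of the load order that the size mismatch seems to call for: resize the base model's embeddings to the extended vocabulary first, then attach the adapter. It uses plain `transformers` and `peft` rather than Unsloth's own loader, and it assumes the checkpoints follow the Trainer's `checkpoint-<step>` naming and that the adapter was saved with the resized `embed_tokens` and `lm_head` weights. The function names `list_checkpoints` and `load_checkpoint` and all path arguments are hypothetical, and I have not confirmed that this works with checkpoints written by Unsloth.

```python
import re
from pathlib import Path

BASE_VOCAB = 128_256      # original Llama 3.2 tokenizer size (from this post)
EXTENDED_VOCAB = 146_452  # merged tokenizer size (from this post)


def list_checkpoints(output_dir):
    """Return checkpoint directories under output_dir, sorted by step.

    Assumes the Hugging Face Trainer naming scheme `checkpoint-<step>`.
    """
    found = []
    for path in Path(output_dir).iterdir():
        match = re.fullmatch(r"checkpoint-(\d+)", path.name)
        if path.is_dir() and match:
            found.append((int(match.group(1)), path))
    found.sort(key=lambda item: item[0])
    return [path for _, path in found]


def load_checkpoint(base_model_name, tokenizer_dir, checkpoint_dir):
    """Resize the base model to the extended vocabulary, then attach the adapter.

    Hypothetical helper: assumes the adapter checkpoint contains the
    trained embed_tokens / lm_head weights for the extended vocabulary.
    """
    # Imported lazily so list_checkpoints works without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
    model = AutoModelForCausalLM.from_pretrained(base_model_name)

    # Grow the embedding matrices before loading adapter weights,
    # so their shapes match what the checkpoint expects.
    model.resize_token_embeddings(len(tokenizer))

    # is_trainable=True keeps the adapter weights trainable for further finetuning.
    model = PeftModel.from_pretrained(model, checkpoint_dir, is_trainable=True)
    return model, tokenizer
```

With this, each directory returned by `list_checkpoints("outputs")` could be passed to `load_checkpoint` in turn before starting the instruction finetuning run, where `"outputs"` stands in for whatever output directory the training run used.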


Fail to load checkpoints trained with extended tokenizer #1215
