fix/load-checkpoint-add-new-tokens by Erland366 · Pull Request #1225 · unslothai/unsloth

Erland366 · 2024-10-31T12:42:33Z

Given this issue, where the changed vocab size can't be used right away because of the size mismatch between the adapter and the base model, we need to resize the base model before merging the LoRA into it.

Note that this also requires a change to unsloth-zoo, for which I have opened a separate PR:

unslothai/unsloth-zoo#9
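To illustrate why the order matters, here is a minimal, dependency-free sketch of the resize-then-load step. It is not the code in this PR: `resize_embedding` and `load_adapter_rows` are hypothetical helpers, the embedding is a plain list of rows, and mean initialisation is only an assumed choice for the new rows. In a real setup the resize would typically go through something like `model.resize_token_embeddings(len(tokenizer))` from transformers.

```python
from statistics import fmean


def resize_embedding(rows, new_vocab_size):
    """Return a copy of `rows` grown (or truncated) to `new_vocab_size` rows.

    New rows are set to the column-wise mean of the existing rows
    (an assumed initialisation, chosen only for illustration).
    """
    if new_vocab_size <= len(rows):
        return [list(r) for r in rows[:new_vocab_size]]
    mean_row = [fmean(col) for col in zip(*rows)]
    extra = [list(mean_row) for _ in range(new_vocab_size - len(rows))]
    return [list(r) for r in rows] + extra


def load_adapter_rows(base_rows, adapter_rows):
    """Hypothetical stand-in for loading trained embedding rows.

    Like a real state-dict load, it refuses mismatched shapes.
    """
    if len(base_rows) != len(adapter_rows):
        raise ValueError(
            f"size mismatch: base has {len(base_rows)} rows, "
            f"checkpoint has {len(adapter_rows)}"
        )
    return [list(r) for r in adapter_rows]


base = [[0.0, 2.0], [2.0, 4.0]]                      # vocab of 2, hidden size 2
checkpoint = [[0.1, 2.1], [2.1, 4.1], [1.0, 3.0]]    # trained with 1 added token

# Loading straight into the unresized base fails.
try:
    load_adapter_rows(base, checkpoint)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Resizing the base first makes the shapes line up.
resized = resize_embedding(base, len(checkpoint))
merged = load_adapter_rows(resized, checkpoint)
```

The point of the sketch is only the ordering: the base is grown to the checkpoint's vocabulary size first, and the trained rows are loaded afterwards.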

* Llama 3.1 * Update _utils.py * Llama 3.1 * Update _utils.py * Update llama.py * Update llama.py * hack for rotary * patch RoPE * refix rope * Update _utils.py * Update llama.py * Llama 3.1 check * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py

* Unsloth Zoo * Update trainer.py * Update trainer.py * Update cross_entropy_loss.py * n_items * Update llama.py * kwargs * Remove extraneous f prefixes (unslothai#1133) Co-authored-by: Emil Sadek <esadek@users.noreply.github.com> * Update __init__.py * kwargs * Update trainer.py * Update trainer.py * Update trainer.py * Fix GA * Update _utils.py * Update llama.py * Update tokenizer_utils.py * Warn on old versions * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py --------- Co-authored-by: Emil Sadek <esadek@hotmail.com> Co-authored-by: Emil Sadek <esadek@users.noreply.github.com>

Currently, Unsloth doesn't pass additional parameters to Trainer.compute_loss such as return_outputs. This leads to errors when calling trainer.evaluate(). This change fixes the bug by properly passing parameters to Trainer.compute_loss.
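The failure mode described above can be sketched with toy classes. None of these are the real Trainer or Unsloth classes; `BaseTrainer`, `DroppingTrainer`, `ForwardingTrainer` and `toy_model` are hypothetical names used only to show what happens when an override drops `return_outputs` and what forwarding it looks like.

```python
class BaseTrainer:
    """Toy stand-in for a trainer whose evaluate() asks for model outputs."""

    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(inputs)
        loss = outputs["loss"]
        return (loss, outputs) if return_outputs else loss

    def evaluate(self, model, inputs):
        # Evaluation needs both the loss and the outputs.
        loss, outputs = self.compute_loss(model, inputs, return_outputs=True)
        return {"eval_loss": loss, "n_logits": len(outputs["logits"])}


class DroppingTrainer(BaseTrainer):
    def compute_loss(self, model, inputs, *args, **kwargs):
        # Bug pattern: extra arguments are silently discarded.
        return super().compute_loss(model, inputs)


class ForwardingTrainer(BaseTrainer):
    def compute_loss(self, model, inputs, *args, **kwargs):
        # Fix pattern: pass everything through to the parent.
        return super().compute_loss(model, inputs, *args, **kwargs)


def toy_model(inputs):
    return {"loss": sum(inputs) / len(inputs), "logits": list(inputs)}


try:
    DroppingTrainer().evaluate(toy_model, [1.0, 3.0])
    dropped_ok = True
except TypeError:
    # evaluate() tried to unpack a bare float into (loss, outputs).
    dropped_ok = False

metrics = ForwardingTrainer().evaluate(toy_model, [1.0, 3.0])
```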

* Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (unslothai#1165) * chore: update chat_templates.py (unslothai#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* fix: correct tokenizer handling in patch_sft_trainer_tokenizer * Revert "fix: correct tokenizer handling in patch_sft_trainer_tokenizer" This reverts commit f18ac21. * fix: correct condition for test_text assignment in patch_sft_trainer_tokenizer

* Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (unslothai#1165) * chore: update chat_templates.py (unslothai#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (unslothai#1165) * chore: update chat_templates.py (unslothai#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py * Update _utils.py * fix/transformers-unpack (unslothai#1180) * Fix DPO, ORPO (unslothai#1177) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (unslothai#1165) * chore: update chat_templates.py (unslothai#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>

* Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (unslothai#1165) * chore: update chat_templates.py (unslothai#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py * Update _utils.py * fix/transformers-unpack (unslothai#1180) * Fix DPO, ORPO (unslothai#1177) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (unslothai#1165) * chore: update chat_templates.py (unslothai#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * donot upcast lm_head and embeddings to float32 (unslothai#1186) * Cleanup upcast logs (unslothai#1188) * Fix/phi-longrope (unslothai#1193) * Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding * Typo * Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache * Update llama.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * 
Update transformers --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>

* Bring back float32 if float16 instead of bfloat16 * Refactor mixed precision handling for lm_head and embed_tokens to ensure correct dtype usage * Fix dtype retrieval for embed_tokens and lm_head in mixed precision training * Fix dtype retrieval for embed_tokens and lm_head to use weight dtype in mixed precision training * Fix dtype handling for embed_tokens and lm_head to ensure correct float32 usage in mixed precision training * Fix dtype assignment for lm_head modules to ensure correct weight dtype usage in mixed precision training

Erland366 · 2024-10-31T12:46:41Z

I'd like to discuss the embedding, though, since I did not implement a way to specify the method used to extend it. For example, if the user chooses interpolation when training the embedding, then when we load the checkpoint and resize the base model again, we need to make sure the resize method is the same one used in training.

Maybe we can store the method as an additional param in model.config? Then we can pass it along when we load the checkpoint and resize.
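A rough sketch of that idea, using a plain dict in place of model.config. The key name `embedding_resize_method`, both helper functions, and the `"mean"` fallback are assumptions made for illustration; nothing here is an existing Unsloth or transformers option.

```python
import json

# Hypothetical config key; not an existing Unsloth or transformers field.
RESIZE_KEY = "embedding_resize_method"


def record_resize_method(config, method):
    """Return a copy of the config dict with the resize method recorded."""
    updated = dict(config)
    updated[RESIZE_KEY] = method
    return updated


def resize_method_for_loading(config, default="mean"):
    """Read the method back at load time, falling back to an assumed default."""
    return config.get(RESIZE_KEY, default)


# At training time: remember how the embedding was extended.
train_config = record_resize_method({"vocab_size": 32003}, "interpolation")

# Simulate saving and reloading the config alongside the checkpoint.
restored = json.loads(json.dumps(train_config))

# At load time: resize the base model with the same method.
method = resize_method_for_loading(restored)
```

Checkpoints saved before such a key existed would fall through to the default, so the default would need to match whatever the old behaviour was.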

Erland366 · 2024-10-31T12:51:35Z

Also, while I'm here: it seems the value of tokenizer.vocab_size is unchanged after we call add_new_tokens. Does tokenizer.vocab_size only count non-special tokens, so that the attribute doesn't increase because all of the new tokens are added as special tokens?
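For context, Hugging Face tokenizers document `vocab_size` as the size of the base vocabulary without added tokens, while `len(tokenizer)` counts the added tokens too. The toy class below is not the transformers implementation; it is only a small stand-in that mirrors that split, so the behaviour in question can be seen in isolation.

```python
class ToyTokenizer:
    """Toy stand-in (not the transformers class): base vs. added vocabulary."""

    def __init__(self, base_vocab):
        self._base = {tok: i for i, tok in enumerate(base_vocab)}
        self._added = {}

    @property
    def vocab_size(self):
        # Base vocabulary only; added tokens are not counted here.
        return len(self._base)

    def __len__(self):
        # Full vocabulary: base plus added tokens.
        return len(self._base) + len(self._added)

    def add_tokens(self, tokens):
        """Add unseen tokens and return how many were actually added."""
        added = 0
        for tok in tokens:
            if tok not in self._base and tok not in self._added:
                self._added[tok] = len(self)  # next free id
                added += 1
        return added


tok = ToyTokenizer(["<s>", "</s>", "hello", "world"])
n_added = tok.add_tokens(["<tool>", "</tool>", "hello"])  # "hello" already exists
```

Under this split, the number to resize the model to is `len(tok)`, not `tok.vocab_size`.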

Erland366 · 2024-10-31T13:56:49Z

https://colab.research.google.com/drive/1xBxY_L48Lzu5SJjukPExgoWVthoyTGCA?usp=sharing

Reproducible example of this fix.

danielhanchen added 30 commits July 23, 2024 10:48

patch RoPE

4a46220

refix rope

2d9f189

Update _utils.py

80d62c3

Update llama.py

7d7a5f7

Llama 3.1 check

2f9bd5b

Update llama.py

740979b

Update llama.py

47d230b

Update llama.py

f849b8b

Update llama.py

6157cef

Update llama.py

5da00a9

Update llama.py

2ff7d83

Update llama.py

7c441f3

Update llama.py

5d92456

Update llama.py

4a3fddd

Update llama.py

ca3a1b7

Update llama.py

b93a757

Update README.md

22968a2

Update README.md

824511e

Update loader.py

7774539

Update _utils.py

caa4028

Update llama.py

4dd4ad2

Update llama.py

cc11b78

Create Run.png

d1f3b6c

Update README.md

a96d16e

Merge branch 'main' into nightly

ddd4e86

Mistral

bd180c1

Patch PEFT

6e30a7a

Fix PEFT

08d3ef4

Update llama.py

66e0453

danielhanchen and others added 21 commits October 17, 2024 20:43

Update README.md

eb533db

Update README.md

b3e85e9

Update _utils.py

3085f4c

fix: compute_loss bug (unslothai#1151)

12bdd86

Currently, Unsloth doesn't pass additional parameters to Trainer.compute_loss such as return_outputs. This leads to errors when calling trainer.evaluate(). This change fixes the bug by properly passing parameters to Trainer.compute_loss.

Fix get_token

d7850d8

Update _utils.py

9327d90

Update save.py

1f52468

Update _utils.py

1e7e0e2

Update _utils.py

9ca13b8

Torch 2.5

8d46c0d

Update pyproject.toml

49ae619

Update _utils.py

007efc2

Merge branch 'main' of https://github.com/unslothai/unsloth

a2f8db3

Add functionality to update model vocabulary with new tokenizer tokens

bcaa5b0

Erland366 mentioned this pull request Oct 31, 2024

feat-resize-tokenizer-add-new-tokens unslothai/unsloth-zoo#9

Open

Erland366 changed the title ~~Add functionality to update model vocabulary with new tokenizer tokens~~ fix/load-checkpoint-add-new-tokens Oct 31, 2024

danielhanchen closed this Mar 12, 2026

danielhanchen force-pushed the main branch from 997f1a7 to 0099fff Compare March 12, 2026 05:34

danielhanchen mentioned this pull request Mar 12, 2026

fix/load-checkpoint-add-new-tokens #4219

Open

danielhanchen mentioned this pull request Apr 22, 2026

Fix tokenizer save gemma #5115

Merged

fix/load-checkpoint-add-new-tokens #1225
Erland366 wants to merge 557 commits into unslothai:main from Erland366:fix/load-checkpoint-add-new-tokens

9 participants
