
fix/load-checkpoint-add-new-tokens #1225

Closed
Erland366 wants to merge 557 commits into unslothai:main from Erland366:fix/load-checkpoint-add-new-tokens

Conversation

@Erland366

Collaborator

#1215

In the linked issue, a checkpoint trained with added tokens can't be loaded directly because the adapter's vocab size differs from the base model's. We need to resize the base model before merging the LoRA into it.
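To make the ordering concrete, here is a toy sketch in plain Python, not the code in this PR. Embeddings are modelled as lists of rows, `resize_embedding` and `load_adapter_rows` are hypothetical helpers, and filling new rows with the mean of the existing rows is an assumption made only for illustration.

```python
def resize_embedding(rows, new_vocab_size):
    """Return a copy of `rows` grown to `new_vocab_size` rows.

    New rows are filled with the column-wise mean of the existing rows
    (an assumed initialisation, chosen only for this illustration).
    """
    old_vocab_size = len(rows)
    if new_vocab_size < old_vocab_size:
        raise ValueError("this sketch only grows the embedding")
    dim = len(rows[0])
    mean_row = [sum(r[j] for r in rows) / old_vocab_size for j in range(dim)]
    extra = [list(mean_row) for _ in range(new_vocab_size - old_vocab_size)]
    return [list(r) for r in rows] + extra


def load_adapter_rows(base_rows, adapter_rows):
    """Take the adapter's trained rows; the row counts must match exactly."""
    if len(base_rows) != len(adapter_rows):
        raise ValueError(
            f"size mismatch: base has {len(base_rows)} rows, "
            f"adapter has {len(adapter_rows)}"
        )
    return [list(r) for r in adapter_rows]


base = [[0.0, 2.0], [2.0, 4.0]]                  # base model: vocab of 2
adapter = [[0.1, 2.1], [2.1, 4.1], [1.0, 3.0]]   # trained with 1 added token

# Loading straight away fails because the sizes differ.
try:
    load_adapter_rows(base, adapter)
    mismatch = False
except ValueError:
    mismatch = True

# Resizing the base first makes the load succeed.
resized = resize_embedding(base, len(adapter))
merged = load_adapter_rows(resized, adapter)
```

The point is only the order of operations: grow the base to the adapter's vocab size first, then load and merge.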

Note that this also needs changes to unsloth-zoo, which I've opened a PR for as well:

unslothai/unsloth-zoo#9

danielhanchen and others added 21 commits October 17, 2024 20:43
* Unsloth Zoo

* Update trainer.py

* Update trainer.py

* Update cross_entropy_loss.py

* n_items

* Update llama.py

* kwargs

* Remove extraneous f prefixes (unslothai#1133)

Co-authored-by: Emil Sadek <esadek@users.noreply.github.com>

* Update __init__.py

* kwargs

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Fix GA

* Update _utils.py

* Update llama.py

* Update tokenizer_utils.py

* Warn on old versions

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

---------

Co-authored-by: Emil Sadek <esadek@hotmail.com>
Co-authored-by: Emil Sadek <esadek@users.noreply.github.com>
Currently, Unsloth doesn't pass additional parameters to Trainer.compute_loss such as return_outputs. This leads to errors when calling trainer.evaluate(). This change fixes the bug by properly passing parameters to Trainer.compute_loss.
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (unslothai#1165)

* chore: update chat_templates.py (unslothai#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
* fix: correct tokenizer handling in patch_sft_trainer_tokenizer

* Revert "fix: correct tokenizer handling in patch_sft_trainer_tokenizer"

This reverts commit f18ac21.

* fix: correct condition for test_text assignment in patch_sft_trainer_tokenizer
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (unslothai#1165)

* chore: update chat_templates.py (unslothai#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (unslothai#1165)

* chore: update chat_templates.py (unslothai#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

* Update _utils.py

* fix/transformers-unpack (unslothai#1180)

* Fix DPO, ORPO (unslothai#1177)

* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (unslothai#1165)

* chore: update chat_templates.py (unslothai#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (unslothai#1165)

* chore: update chat_templates.py (unslothai#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

* Update _utils.py

* fix/transformers-unpack (unslothai#1180)

* Fix DPO, ORPO (unslothai#1177)

* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (unslothai#1165)

* chore: update chat_templates.py (unslothai#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* donot upcast lm_head and embeddings to float32 (unslothai#1186)

* Cleanup upcast logs (unslothai#1188)

* Fix/phi-longrope (unslothai#1193)

* Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding

* Typo

* Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache

* Update llama.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update transformers

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
* Bring back float32 if float16 instead of bfloat16

* Refactor mixed precision handling for lm_head and embed_tokens to ensure correct dtype usage

* Fix dtype retrieval for embed_tokens and lm_head in mixed precision training

* Fix dtype retrieval for embed_tokens and lm_head to use weight dtype in mixed precision training

* Fix dtype handling for embed_tokens and lm_head to ensure correct float32 usage in mixed precision training

* Fix dtype assignment for lm_head modules to ensure correct weight dtype usage in mixed precision training
@Erland366

Collaborator Author

I'd like to discuss the embedding, though, since I did not implement a way to specify the method used to extend it. For example, if the user chooses interpolation when training the embedding, then when we load the checkpoint and resize the base model again, we need to make sure the resize method is the same as the one used in training.

Maybe we can store the method as an additional param in model.config? Then we can pass it along when we load the checkpoint and resize.
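A rough sketch of that idea, using a plain dict and JSON as a stand-in for model.config. The key name `embedding_resize_method`, the helper names, and the `"mean"` option are all hypothetical; only `"interpolation"` comes from the discussion above.

```python
import json

# Hypothetical set of supported methods; "mean" is an assumed example.
ALLOWED_METHODS = {"mean", "interpolation"}


def save_config(config, resize_method):
    """Return the config as JSON with the resize method recorded in it."""
    if resize_method not in ALLOWED_METHODS:
        raise ValueError(f"unknown resize method: {resize_method!r}")
    updated = dict(config)  # do not mutate the caller's dict
    updated["embedding_resize_method"] = resize_method  # hypothetical key
    return json.dumps(updated)


def load_resize_method(config_json, default="mean"):
    """Read the method back; fall back to `default` for older checkpoints."""
    config = json.loads(config_json)
    return config.get("embedding_resize_method", default)


saved = save_config({"vocab_size": 32003}, "interpolation")
method_at_load = load_resize_method(saved)
legacy_method = load_resize_method(json.dumps({"vocab_size": 32000}))
```

Checkpoints saved before such a key existed would need the fallback default, which is a design choice still to be agreed on.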

@Erland366 changed the title from "Add functionality to update model vocabulary with new tokenizer tokens" to "fix/load-checkpoint-add-new-tokens" on Oct 31, 2024
@Erland366

Collaborator Author

Also, while I'm here: it seems the value of tokenizer.vocab_size is unchanged after we call add_new_tokens. Does tokenizer.vocab_size only count non-special tokens, and since we add all of the new tokens as special tokens, is that why the attribute's value doesn't increase?
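For context, the Hugging Face tokenizer documentation describes `vocab_size` as the size of the base vocabulary without added tokens, and `len(tokenizer)` as the full size including them, so the distinction appears to be base vs. added rather than special vs. non-special. The toy class below only mimics that behaviour for illustration; `ToyTokenizer` is hypothetical and is not the transformers implementation.

```python
class ToyTokenizer:
    """Minimal stand-in that separates base vocab from added tokens."""

    def __init__(self, base_vocab):
        self._base = {tok: i for i, tok in enumerate(base_vocab)}
        self._added = {}

    @property
    def vocab_size(self):
        # Base vocabulary only; added tokens are not counted here.
        return len(self._base)

    def __len__(self):
        # Full vocabulary: base plus added tokens.
        return len(self._base) + len(self._added)

    def add_tokens(self, tokens):
        """Add unseen tokens and return how many were actually added."""
        added = 0
        for tok in tokens:
            if tok not in self._base and tok not in self._added:
                self._added[tok] = len(self)  # next free id
                added += 1
        return added


tok = ToyTokenizer(["a", "b", "c"])
num_added = tok.add_tokens(["<x>", "a", "<y>"])  # "a" already exists
```

If this matches the real behaviour, the size to resize the model to would be `len(tokenizer)` rather than `tokenizer.vocab_size`.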


9 participants