fix/load-checkpoint-add-new-tokens #1225
Conversation
* Llama 3.1 * Update _utils.py * Llama 3.1 * Update _utils.py * Update llama.py * Update llama.py * hack for rotary * patch RoPE * refix rope * Update _utils.py * Update llama.py * Llama 3.1 check * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py
* Unsloth Zoo * Update trainer.py * Update trainer.py * Update cross_entropy_loss.py * n_items * Update llama.py * kwargs * Remove extraneous f prefixes (unslothai#1133) Co-authored-by: Emil Sadek <esadek@users.noreply.github.com> * Update __init__.py * kwargs * Update trainer.py * Update trainer.py * Update trainer.py * Fix GA * Update _utils.py * Update llama.py * Update tokenizer_utils.py * Warn on old versions * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py --------- Co-authored-by: Emil Sadek <esadek@hotmail.com> Co-authored-by: Emil Sadek <esadek@users.noreply.github.com>
Currently, Unsloth doesn't pass additional parameters to Trainer.compute_loss such as return_outputs. This leads to errors when calling trainer.evaluate(). This change fixes the bug by properly passing parameters to Trainer.compute_loss.
* Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (unslothai#1165) * chore: update chat_templates.py (unslothai#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
* fix: correct tokenizer handling in patch_sft_trainer_tokenizer * Revert "fix: correct tokenizer handling in patch_sft_trainer_tokenizer" This reverts commit f18ac21. * fix: correct condition for test_text assignment in patch_sft_trainer_tokenizer
* Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (unslothai#1165) * chore: update chat_templates.py (unslothai#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
* Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (unslothai#1165) * chore: update chat_templates.py (unslothai#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py * Update _utils.py * fix/transformers-unpack (unslothai#1180) * Fix DPO, ORPO (unslothai#1177) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (unslothai#1165) * chore: update chat_templates.py (unslothai#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
* Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (unslothai#1165) * chore: update chat_templates.py (unslothai#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py * Update _utils.py * fix/transformers-unpack (unslothai#1180) * Fix DPO, ORPO (unslothai#1177) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (unslothai#1165) * chore: update chat_templates.py (unslothai#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * donot upcast lm_head and embeddings to float32 (unslothai#1186) * Cleanup upcast logs (unslothai#1188) * Fix/phi-longrope (unslothai#1193) * Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding * Typo * Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache * Update llama.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update transformers --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
* Bring back float32 if float16 instead of bfloat16 * Refactor mixed precision handling for lm_head and embed_tokens to ensure correct dtype usage * Fix dtype retrieval for embed_tokens and lm_head in mixed precision training * Fix dtype retrieval for embed_tokens and lm_head to use weight dtype in mixed precision training * Fix dtype handling for embed_tokens and lm_head to ensure correct float32 usage in mixed precision training * Fix dtype assignment for lm_head modules to ensure correct weight dtype usage in mixed precision training
|
I need a discussion about the embedding tho since I did not implement specification to specify the method to extend the embedding. So for example, when training the embedding, the user specify to use Maybe we can store additional params in the |
|
Also while here, seems like the value of |
|
https://colab.research.google.com/drive/1xBxY_L48Lzu5SJjukPExgoWVthoyTGCA?usp=sharing reproducible of this fix |
#1215
Given this issue where we can't immediately use the changed vocab size because the difference size between the adapter and base model, we need to resize the base model before merging the LoRA into base model.
Note this need changes to the
unsloth-zoosince we need a modification of it. which I also create a PR of itunslothai/unsloth-zoo#9