I am looking to LoRA-finetune models like Gemma, which have tied embeddings.
However, I would also like the shared embedding table (the single matrix used for both the input and output embeddings of the network) to be trainable.
How do I achieve this?
Note: Passing both ["embed_tokens", "lm_head"] to modules_to_save unties them, because PEFT creates a separate copy of each tensor. Passing only ["embed_tokens"] makes only the input embeddings trainable (via a separate PEFT copy), while the output embeddings stay as they are (the original tensor).
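To make the untying concrete, here is a minimal plain-PyTorch sketch. It assumes a toy model (the hypothetical TinyTiedLM, not Gemma) whose lm_head.weight is the same Parameter object as embed_tokens.weight. It then mimics "a separate trainable copy per module" with copy.deepcopy. This is an illustration of the copy semantics described above, not PEFT's actual implementation.

```python
import copy

import torch
import torch.nn as nn


class TinyTiedLM(nn.Module):
    """Hypothetical toy model with tied input/output embeddings (stand-in for Gemma)."""

    def __init__(self, vocab_size: int = 10, hidden: int = 4):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden)
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)
        # Tie the weights: both modules now reference the very same Parameter.
        self.lm_head.weight = self.embed_tokens.weight

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.lm_head(self.embed_tokens(input_ids))


model = TinyTiedLM()
tied_before = model.lm_head.weight is model.embed_tokens.weight

# Assumption: a per-module trainable copy behaves like an independent deep copy.
embed_copy = copy.deepcopy(model.embed_tokens)
head_copy = copy.deepcopy(model.lm_head)
tied_after = head_copy.weight is embed_copy.weight

# Updating one copy (as training would) no longer affects the other.
with torch.no_grad():
    embed_copy.weight.add_(1.0)
diverged = not torch.equal(embed_copy.weight, head_copy.weight)

# Copying only embed_tokens leaves lm_head pointing at the original tensor.
only_input_copied = embed_copy.weight is not model.lm_head.weight

print(tied_before, tied_after, diverged, only_input_copied)  # True False True True
```

Both copies start with identical values, but they are separate Parameter objects, so the "tie" holds only until the first optimizer step.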