Currently, the way PEFT deals with tied embedding and LM head weights is not always clear. In #2803, a new argument, `ensure_weight_tying`, was introduced to make it easier for users to automatically tie the PEFT weights while keeping backwards compatibility. However, this makes it even more important to clarify what happens when.

The table below shows the intended behavior in different circumstances. Notably, weight tying can affect `modules_to_save`, `target_modules`, and `trainable_token_indices`. The table lists the expected results for all combinations of these factors.

| weights tied | `ensure_weight_tying` | config | result |
|---|---|---|---|
| False | False | `modules_to_save=[embed_tokens]` / `modules_to_save=[lm_head]` | `ModulesToSaveWrapper` on embedding/lm head |
| False | True | `modules_to_save=[embed_tokens]` / `modules_to_save=[lm_head]` | warn & `ModulesToSaveWrapper` on embedding/lm head |
| True | False | `modules_to_save=[embed_tokens]` / `modules_to_save=[lm_head]` | `ModulesToSaveWrapper` on embedding/lm head (BC) |
| True | True | `modules_to_save=[embed_tokens]` / `modules_to_save=[lm_head]` | `ModulesToSaveWrapper`s share weights |
| False | False | `modules_to_save=[embed_tokens, lm_head]` | treat as separate |
| False | True | `modules_to_save=[embed_tokens, lm_head]` | warn & treat as separate |
| True | False | `modules_to_save=[embed_tokens, lm_head]` | treat as separate (BC) |
| True | True | `modules_to_save=[embed_tokens, lm_head]` | `ModulesToSaveWrapper`s share weights |
| False | False | `target_modules=[embed_tokens]` / `target_modules=[lm_head]` | LoRA on embedding/lm head |
| False | True | `target_modules=[embed_tokens]` / `target_modules=[lm_head]` | \*warn & LoRA on embedding/lm head |
| True | False | `target_modules=[embed_tokens]` / `target_modules=[lm_head]` | LoRA on embedding/lm head (BC) |
| True | True | `target_modules=[embed_tokens]` / `target_modules=[lm_head]` | \*LoRA share weights |
| False | False | `target_modules=[embed_tokens, lm_head]` | treat as separate |
| False | True | `target_modules=[embed_tokens, lm_head]` | \*warn & treat as separate |
| True | False | `target_modules=[embed_tokens, lm_head]` | treat as separate (BC) |
| True | True | `target_modules=[embed_tokens, lm_head]` | \*LoRA share weights |
| False | False | `trainable_token_indices=[1, 2, 3]` | trainable tokens on embeddings only |
| False | True | `trainable_token_indices=[1, 2, 3]` | warn & trainable tokens on embeddings only |
| True | False | `trainable_token_indices=[1, 2, 3]` | tied trainable tokens |
| True | True | `trainable_token_indices=[1, 2, 3]` | tied trainable tokens |
| False | False | `trainable_token_indices={"lm_head": [1,2], "embed_tokens": [1,2]}` | treat as separate |
| False | True | `trainable_token_indices={"lm_head": [1,2], "embed_tokens": [1,2]}` | warn & treat as separate |
| True | False | `trainable_token_indices={"lm_head": [1,2], "embed_tokens": [1,2]}` | tied trainable tokens |
| True | True | `trainable_token_indices={"lm_head": [1,2], "embed_tokens": [1,2]}` | tied trainable tokens |
| False | False | `trainable_token_indices={"lm_head": [1,2], "embed_tokens": [3,4]}` | treat as separate |
| False | True | `trainable_token_indices={"lm_head": [1,2], "embed_tokens": [3,4]}` | warn & treat as separate |
| True | False | `trainable_token_indices={"lm_head": [1,2], "embed_tokens": [3,4]}` | \*treat as separate |
| True | True | `trainable_token_indices={"lm_head": [1,2], "embed_tokens": [3,4]}` | \*error |

Explanation:

- (BC) marks behavior that is kept for backwards compatibility.
- For `trainable_token_indices`, we distinguish between cases where embedding and LM head define the same indices, which would allow weight sharing, and where they define distinct indices, which precludes weight sharing.

Ping @romitjain
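
To make the `trainable_token_indices` rows easier to check against an implementation, here is a minimal sketch that encodes them as a plain lookup. It does not call PEFT at all. The names `classify`, `expected_result`, and `EXPECTED` are hypothetical helpers made up for this illustration, and comparing the indices as order-insensitive sets is an assumption, not something the table specifies.

```python
from typing import Dict, List, Tuple, Union

TokenIndices = Union[List[int], Dict[str, List[int]]]

# Keys are (weights_tied, ensure_weight_tying). The result strings are copied
# from the trainable_token_indices rows of the table, including the "*" marker.
EXPECTED: Dict[str, Dict[Tuple[bool, bool], str]] = {
    "list": {
        (False, False): "trainable tokens on embeddings only",
        (False, True): "warn & trainable tokens on embeddings only",
        (True, False): "tied trainable tokens",
        (True, True): "tied trainable tokens",
    },
    "same": {
        (False, False): "treat as separate",
        (False, True): "warn & treat as separate",
        (True, False): "tied trainable tokens",
        (True, True): "tied trainable tokens",
    },
    "distinct": {
        (False, False): "treat as separate",
        (False, True): "warn & treat as separate",
        (True, False): "*treat as separate",
        (True, True): "*error",
    },
}


def classify(indices: TokenIndices) -> str:
    """Hypothetical helper: label the value as 'list', 'same', or 'distinct'."""
    if isinstance(indices, list):
        return "list"
    # Assumption: indices are compared as sets, so ordering does not matter.
    index_sets = {frozenset(values) for values in indices.values()}
    return "same" if len(index_sets) == 1 else "distinct"


def expected_result(
    indices: TokenIndices, weights_tied: bool, ensure_weight_tying: bool
) -> str:
    """Look up the intended behavior for one row of the table."""
    return EXPECTED[classify(indices)][(weights_tied, ensure_weight_tying)]


if __name__ == "__main__":
    same = {"lm_head": [1, 2], "embed_tokens": [1, 2]}
    distinct = {"lm_head": [1, 2], "embed_tokens": [3, 4]}
    print(expected_result(same, True, True))
    print(expected_result(distinct, True, True))
```

A lookup like this could serve as the parametrization source for tests, one case per table row. Note that a dict naming only one module is not covered by the table, and this sketch would label it "same".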