Fix: `_resize_token_embeddings` will set lm head size to 0 when DeepSpeed ZeRO-3 is enabled (#26024)
Conversation
cc @amyeroberts
```python
# Update new_num_tokens with the actual size of new_embeddings
if pad_to_multiple_of is not None:
    if is_deepspeed_zero3_enabled():
        import deepspeed

        with deepspeed.zero.GatheredParameters(new_embeddings.weight, modifier_rank=None):
            new_num_tokens = new_embeddings.weight.shape[0]
    else:
        new_num_tokens = new_embeddings.weight.shape[0]
```
@kai01ai, I think this code block can be deleted, and new_num_tokens can be passed as-is from this function's scope to _get_resized_lm_head() on line 1453.
Because the new_embeddings variable is created based on new_num_tokens in _get_resized_embeddings(), it doesn't need to be reassigned from new_embeddings.weight.shape[0].
Or use new_num_tokens = new_embeddings.num_embeddings instead.
Is there any case where new_num_tokens gets changed?
By the way, is it appropriate for me to review here?
I just wanted to add a conversation comment, but referencing code forced me to create a review.
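The num_embeddings suggestion works because nn.Embedding.num_embeddings is a plain Python attribute recorded at construction time, not a tensor shape, so it does not depend on how the weight is materialized locally. A minimal sketch (plain PyTorch, no DeepSpeed involved):

```python
import torch.nn as nn

# num_embeddings is stored as a Python int when the module is built,
# so it is independent of the weight tensor's local shape.
emb = nn.Embedding(50304, 768)
assert emb.num_embeddings == 50304
assert emb.weight.shape[0] == 50304  # equal here, but shape[0] becomes 0 under ZeRO-3
```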
The new_num_tokens can be changed in _get_resized_embeddings because of pad_to_multiple_of, which is why we can't do this. However, if we can use new_embeddings.num_embeddings both with and without deepspeed, then sure! Would you like to open a PR for this?
Totally fine for you to review like this, it's the preferred method for code discussions!
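For context, the rounding that makes new_num_tokens differ from the value passed in can be sketched as a hypothetical helper (not the actual Transformers code, just the round-up-to-a-multiple behaviour pad_to_multiple_of implies):

```python
def padded_vocab_size(new_num_tokens, pad_to_multiple_of):
    # Round the requested vocabulary size up to the nearest multiple;
    # with None, the size is left unchanged.
    if pad_to_multiple_of is None:
        return new_num_tokens
    return ((new_num_tokens + pad_to_multiple_of - 1) // pad_to_multiple_of) * pad_to_multiple_of

print(padded_vocab_size(50257, 64))  # → 50304
```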
What does this PR do?
Fixes #25977
After resizing the input embeddings, the value of new_embeddings.weight.shape[0] is used as the new size for resizing the lm_head. However, when DeepSpeed ZeRO-3 is enabled, the weight tensor is partitioned across ranks and its locally visible shape[0] is 0, so the lm_head would be resized to 0. This PR addresses the issue by updating new_num_tokens explicitly (gathering the partitioned weight first under ZeRO-3).
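To illustrate the failure mode without a DeepSpeed setup: under ZeRO-3 each rank holds only a shard of every parameter, so the module object survives but its locally materialized weight can be empty. The simulation below uses an empty tensor as a stand-in for what deepspeed's partitioning produces:

```python
import torch
import torch.nn as nn

# Stand-in for a ZeRO-3-partitioned embedding: the module still exists,
# but its locally materialized weight is empty on this rank.
new_embeddings = nn.Embedding(50257, 768)
new_embeddings.weight = nn.Parameter(torch.empty(0, 768))

new_num_tokens = new_embeddings.weight.shape[0]
assert new_num_tokens == 0  # the bug: lm_head would be resized to 0 rows
```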
Who can review?
@ArthurZucker, @pacman100