Resizing token embeddings causes the output embeddings to be reinitialized in post_init when tie_word_embeddings is False #35141

@avishaiElmakies

System Info

  • transformers version: 4.46.3
  • Platform: Linux-6.6.20-aufs-1-x86_64-with-glibc2.36
  • Python version: 3.11.2
  • Huggingface_hub version: 0.26.1
  • Safetensors version: 0.4.5
  • Accelerate version: 1.0.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA A10

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

This code reproduces the problem:

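from transformers import AutoModelForCausalLM

# Load Pythia, whose config leaves tie_word_embeddings at False
pythia = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")
pythia.resize_token_embeddings(502)  # creates a new output embedding layer
pythia.post_init()  # re-initializes the resized output embedding weights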

The default value of tie_word_embeddings for Pythia is False.
I believe the problem arises because, when tie_word_embeddings is False, resize_token_embeddings creates a new nn.Linear object that does not have the _is_hf_initialized flag (so getattr on it falls back to False), and post_init then calls _init_weights on the new module.

new_lm_head = nn.Linear(
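The suspected mechanism can be sketched without downloading a checkpoint. The following is a simplified stand-in written in plain PyTorch, not the actual transformers code: ToyLM, resize_output, head_survives_post_init, and the keep_flag parameter are hypothetical names introduced for illustration. Only the _is_hf_initialized flag and the getattr default of False are taken from the description above.

```python
import torch
from torch import nn


class ToyLM(nn.Module):
    """Hypothetical stand-in for a pretrained model with untied embeddings."""

    def __init__(self, vocab_size=10, hidden_size=4):
        super().__init__()
        self.embed_in = nn.Embedding(vocab_size, hidden_size)
        self.embed_out = nn.Linear(hidden_size, vocab_size, bias=False)

    def _init_weights(self, module):
        # Random re-initialization of linear and embedding weights.
        if isinstance(module, (nn.Linear, nn.Embedding)):
            nn.init.normal_(module.weight, mean=0.0, std=0.02)

    def _initialize_weights(self, module):
        # Skip modules already flagged; a missing flag reads as False.
        if getattr(module, "_is_hf_initialized", False):
            return
        self._init_weights(module)
        module._is_hf_initialized = True

    def post_init(self):
        # Visit every submodule (and the model itself).
        self.apply(self._initialize_weights)


def resize_output(model, new_vocab_size, keep_flag):
    """Swap in a larger output layer, copying the old rows into it."""
    old = model.embed_out
    new = nn.Linear(old.in_features, new_vocab_size, bias=False)
    with torch.no_grad():
        new.weight[: old.out_features] = old.weight
    if keep_flag:
        # Assumption: carrying the flag over is one possible mitigation.
        new._is_hf_initialized = True
    model.embed_out = new


def head_survives_post_init(keep_flag):
    """Return True if post_init leaves the resized output layer untouched."""
    torch.manual_seed(0)
    model = ToyLM()
    for module in model.modules():
        module._is_hf_initialized = True  # simulate weights loaded from disk
    resize_output(model, 12, keep_flag)
    before = model.embed_out.weight.detach().clone()
    model.post_init()
    return torch.equal(before, model.embed_out.weight)


print(head_survives_post_init(keep_flag=False))
print(head_survives_post_init(keep_flag=True))
```

In this toy, the first call prints False (the fresh layer has no flag, so it is re-initialized) and the second prints True. Whether setting the flag on the resized layer is the right fix inside transformers itself is not something this sketch establishes.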

Expected behavior

post_init should not change the weights of output_embeddings after a resize.
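The expected behavior can be phrased as a small check: snapshot a layer's weights, run an action, and compare. This is a minimal sketch in plain PyTorch; weights_unchanged_by is a hypothetical helper name, and the nn.Linear below only stands in for a real output embedding.

```python
import torch
from torch import nn


def weights_unchanged_by(module, action):
    """Return True if calling action() leaves module.weight identical."""
    before = module.weight.detach().clone()
    action()
    return torch.equal(before, module.weight)


layer = nn.Linear(4, 6, bias=False)  # stand-in for an output embedding

# A no-op must leave the weights alone.
noop_ok = weights_unchanged_by(layer, lambda: None)

# A random re-initialization is the kind of change the check should catch.
reinit_ok = weights_unchanged_by(
    layer, lambda: nn.init.normal_(layer.weight, std=0.02)
)

print(noop_ok, reinit_ok)
```

Applied to the reproduction, one would presumably pass pythia.get_output_embeddings() as the module and pythia.post_init as the action after the resize. Per this report, that check would be expected to fail on 4.46.3.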

Labels: Good Second Issue, bug, fixme