System Info
- transformers version: 4.46.3
- Platform: Linux-6.6.20-aufs-1-x86_64-with-glibc2.36
- Python version: 3.11.2
- Huggingface_hub version: 0.26.1
- Safetensors version: 0.4.5
- Accelerate version: 1.0.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: Yes
- GPU type: NVIDIA A10
Who can help?
@ArthurZucker
Information
Tasks
Reproduction
This code reproduces the problem:
from transformers import AutoModelForCausalLM

pythia = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")
pythia.resize_token_embeddings(502)
pythia.post_init()  # the output embedding weights change here
The default value of tie_word_embeddings for Pythia is False.
I believe the problem arises because, when tie_word_embeddings is False, resize_token_embeddings creates a new nn.Linear object that lacks the _is_hf_initialized flag (so getattr returns False for it), and post_init then calls _init_weights on the new module.
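To make the suspected mechanism concrete, here is a minimal pure-Python sketch. It is a toy model, not the transformers implementation: ToyLinear, ToyModel, resize and keep_flag are hypothetical names invented for illustration, and only _is_hf_initialized, post_init and _init_weights are taken from the description above. Setting the flag on the new module is an assumed fix, not a confirmed one.

```python
import random


class ToyLinear:
    """Stand-in for nn.Linear: just holds a flat list of weights."""

    def __init__(self, size):
        self.weight = [random.random() for _ in range(size)]


class ToyModel:
    """Hypothetical model that mimics a flag-based initialization guard."""

    def __init__(self, vocab_size):
        self.output_embeddings = ToyLinear(vocab_size)
        self.post_init()

    def _init_weights(self, module):
        # Re-draw every weight, discarding whatever was stored before.
        module.weight = [random.random() for _ in module.weight]

    def post_init(self):
        module = self.output_embeddings
        # A module without the flag looks uninitialized, so it is re-initialized.
        if not getattr(module, "_is_hf_initialized", False):
            self._init_weights(module)
            module._is_hf_initialized = True

    def resize(self, new_size, keep_flag=False):
        old = self.output_embeddings
        new = ToyLinear(new_size)  # fresh object: the flag is not carried over
        n = min(new_size, len(old.weight))
        new.weight[:n] = old.weight[:n]  # copy the existing rows
        if keep_flag:
            new._is_hf_initialized = True  # assumed fix: mark the copy as initialized
        self.output_embeddings = new


def weights_survive_post_init(keep_flag):
    random.seed(0)
    model = ToyModel(vocab_size=8)
    model.resize(10, keep_flag=keep_flag)
    before = list(model.output_embeddings.weight)
    model.post_init()
    return before == model.output_embeddings.weight


print(weights_survive_post_init(keep_flag=False))  # flag missing: weights are redrawn
print(weights_survive_post_init(keep_flag=True))   # flag set: weights are kept
```

In this toy, the only difference between the two runs is whether the replacement module carries the flag, which is the behavior the hypothesis above attributes to resize_token_embeddings followed by post_init.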
Expected behavior
post_init should not change the weights of output_embeddings after a resize.
Relevant code: transformers/src/transformers/modeling_utils.py, line 2406 in c8c8dff