Closed
Description
Environment info
- transformers version: 4.12.5
- Platform: Linux
- Python version: 3.6
- PyTorch version (GPU?): 1.10.0+cu102
- Tensorflow version (GPU?): 2.6
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
Information
The model I am using is MegatronBERT from https://huggingface.co/nvidia/megatron-bert-uncased-345m. The checkpoint loads correctly with MegatronBertModel:
from transformers import BertTokenizer, MegatronBertModel
model = MegatronBertModel.from_pretrained("megatron_model_here")
However, loading the same checkpoint with MegatronBertForMaskedLM throws a RuntimeError because of a size mismatch.
To reproduce
from transformers import MegatronBertForMaskedLM
model = MegatronBertForMaskedLM.from_pretrained("megatron_model_here")
Error:
RuntimeError: Error(s) in loading state_dict for MegatronBertForMaskedLM:
size mismatch for cls.predictions.bias: copying a param with shape torch.Size([30592]) from checkpoint, the shape in current model is torch.Size([29056])
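One possible reading of the two sizes in this error, offered as an assumption rather than a confirmed diagnosis: Megatron-style checkpoints typically pad the vocabulary up to a multiple of 128, and the two numbers match the padded sizes of the standard BERT uncased (30522 tokens) and cased (28996 tokens) vocabularies. The sketch below only does that arithmetic; `pad_vocab_size` is a hypothetical helper written for illustration, not a transformers function.

```python
def pad_vocab_size(vocab_size: int, multiple: int = 128) -> int:
    """Round vocab_size up to the nearest multiple of `multiple`.

    Hypothetical helper for illustration; the multiple of 128 is an
    assumption about how the checkpoint vocabulary was padded.
    """
    return ((vocab_size + multiple - 1) // multiple) * multiple


# Vocabulary sizes of the standard BERT WordPiece vocabularies.
UNCASED_VOCAB = 30522  # bert-base-uncased
CASED_VOCAB = 28996    # bert-base-cased

checkpoint_size = pad_vocab_size(UNCASED_VOCAB)  # size reported for the checkpoint
model_size = pad_vocab_size(CASED_VOCAB)         # size reported for the current model

print(checkpoint_size, model_size)
```

If this reading is right, the model is being built with a `vocab_size` of 29056 (the padded cased size) while the uncased checkpoint stores 30592 rows, so the configuration used for loading may need `vocab_size` set to 30592. This has not been verified against the checkpoint itself.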
Expected behavior
MaskedLM model loads properly