MegatronBertForMaskedLM #16638

@kaushalshetty

Description

@kaushalshetty

Environment info

  • transformers version: 4.12.5
  • Platform: Linux
  • Python version: 3.6
  • PyTorch version (GPU?): 1.10.0+cu102
  • Tensorflow version (GPU?): 2.6
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

@LysandreJik @stas00

Information

Model I am using: MegatronBERT from https://huggingface.co/nvidia/megatron-bert-uncased-345m. The model loads correctly in the following way:

from transformers import BertTokenizer, MegatronBertModel
model = MegatronBertModel.from_pretrained("megatron_model_here")

but MegatronBertForMaskedLM throws a RuntimeError for a size mismatch.

To reproduce

from transformers import MegatronBertForMaskedLM
model = MegatronBertForMaskedLM.from_pretrained("megatron_model_here")

Error:

RuntimeError: Error(s) in loading state_dict for MegatronBertForMaskedLM:
	size mismatch for cls.predictions.bias: copying a param with shape torch.Size([30592]) from checkpoint, the shape in current model is torch.Size([29056])
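The two shapes in the error look like Megatron-style padded vocabulary sizes: Megatron-LM rounds the tokenizer vocabulary up to a multiple of 128 so the embedding can be sharded evenly. Under that assumption, 30592 in the checkpoint is the BERT-uncased vocabulary (30522) padded to a multiple of 128, while 29056 in the instantiated model is the BERT-cased vocabulary (28996) padded the same way, i.e. the config's `vocab_size` does not match the checkpoint's MLM head. A minimal sketch of the arithmetic (the helper name is illustrative, not part of transformers):

```python
def padded_vocab_size(vocab_size: int, multiple: int = 128) -> int:
    """Round vocab_size up to the nearest multiple (Megatron-LM-style padding)."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

# BERT-uncased vocab pads to the checkpoint's reported shape:
print(padded_vocab_size(30522))  # 30592
# BERT-cased vocab pads to the shape the current model expects:
print(padded_vocab_size(28996))  # 29056
```

If this reading is right, checking the `vocab_size` in the model's config.json against the checkpoint's embedding shape would confirm the mismatch.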

Expected behavior

The MaskedLM model loads without a size-mismatch error, just as MegatronBertModel does.
