Skip to content

Fix encoder->decoder typo bug in convert_t5x_checkpoint_to_pytorch.py#26587

Merged
ArthurZucker merged 1 commit into
huggingface:mainfrom
soyoung97:soyoung97-t5x-v1.0-conversion_bug
Oct 4, 2023
Merged

Fix encoder->decoder typo bug in convert_t5x_checkpoint_to_pytorch.py#26587
ArthurZucker merged 1 commit into
huggingface:mainfrom
soyoung97:soyoung97-t5x-v1.0-conversion_bug

Conversation

@soyoung97

Copy link
Copy Markdown
Contributor

What does this PR do?

The convert_t5x_checkpoint_to_pytorch is used to convert t5x models into pytorch models.
However, it contains a typo at line 142: The wi weights in decoder is converted to weights in encoder, and it makes the following errors when we run the script for t5 v1.0 models (where Split MLP layers is false and uses T5DenseActDense(wi) instead of T5DenseGatedActDense(wi_0, wi_1) is run:

 File "/opt/conda/envs/myenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for T5ForConditionalGeneration:
        Missing key(s) in state_dict: "decoder.block.0.layer.2.DenseReluDense.wi.weight", "decoder.block.1.layer.2.DenseReluDense.wi.weight", "decoder.block.2.layer.2.DenseReluDense.wi.weight", "decoder.block.3.layer.2.DenseReluDense.wi.weight", "decoder.block.4.layer.2.DenseReluDense.wi.weight", "decoder.block.5.layer.2.DenseReluDense.wi.weight", "decoder.block.6.layer.2.DenseReluDense.wi.weight", "decoder.block.7.layer.2.DenseReluDense.wi.weight", "decoder.block.8.layer.2.DenseReluDense.wi.weight", "decoder.block.9.layer.2.DenseReluDense.wi.weight", "decoder.block.10.layer.2.DenseReluDense.wi.weight", "decoder.block.11.layer.2.DenseReluDense.wi.weight".
        Unexpected key(s) in state_dict: "encoder.block.0.layer.2.DenseReluDense.wi.weight", "encoder.block.1.layer.2.DenseReluDense.wi.weight", "encoder.block.2.layer.2.DenseReluDense.wi.weight", "encoder.block.3.layer.2.DenseReluDense.wi.weight", "encoder.block.4.layer.2.DenseReluDense.wi.weight", "encoder.block.5.layer.2.DenseReluDense.wi.weight", "encoder.block.6.layer.2.DenseReluDense.wi.weight", "encoder.block.7.layer.2.DenseReluDense.wi.weight", "encoder.block.8.layer.2.DenseReluDense.wi.weight", "encoder.block.9.layer.2.DenseReluDense.wi.weight", "encoder.block.10.layer.2.DenseReluDense.wi.weight", "encoder.block.11.layer.2.DenseReluDense.wi.weight".

The following is the changed part:

            if split_mlp_wi:

                new[f"decoder.block.{i}.layer.2.DenseReluDense.wi_0.weight"] = wi[0].T

                new[f"decoder.block.{i}.layer.2.DenseReluDense.wi_1.weight"] = wi[1].T

            else:

                new[f"encoder.block.{i}.layer.2.DenseReluDense.wi.weight"] = wi.T

When changing the following line from

new[f"encoder.block.{i}.layer.2.DenseReluDense.wi.weight"] = wi.T

to

new[f"decoder.block.{i}.layer.2.DenseReluDense.wi.weight"] = wi.T

the code works fine.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Related pull request seems to be this one,
so tagging the original author @basting and the one mentioned in that PR:
@patrickvonplaten
@sanchit-gandhi
@ArthurZucker
@younesbelkada

@sanchit-gandhi sanchit-gandhi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it looks like this should be setting the decoder weights - thanks @soyoung97 for the fix!

@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@ArthurZucker ArthurZucker left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit surprised this went unnoticed for such a long time 😅 Thanks a lot

@ArthurZucker ArthurZucker merged commit ca7912d into huggingface:main Oct 4, 2023
@soyoung97

Copy link
Copy Markdown
Contributor Author

I think it went unnoticed because v1.0 model conversion is not used frequently than v1.1 models. Thanks a lot for the fast review!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants