Fix encoder->decoder typo bug in convert_t5x_checkpoint_to_pytorch.py by soyoung97 · Pull Request #26587 · huggingface/transformers

soyoung97 · 2023-10-04T11:11:17Z

What does this PR do?

The convert_t5x_checkpoint_to_pytorch is used to convert t5x models into pytorch models.
However, it contains a typo at line 142: The wi weights in decoder is converted to weights in encoder, and it makes the following errors when we run the script for t5 v1.0 models (where Split MLP layers is false and uses T5DenseActDense(wi) instead of T5DenseGatedActDense(wi_0, wi_1) is run:

 File "/opt/conda/envs/myenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for T5ForConditionalGeneration:
        Missing key(s) in state_dict: "decoder.block.0.layer.2.DenseReluDense.wi.weight", "decoder.block.1.layer.2.DenseReluDense.wi.weight", "decoder.block.2.layer.2.DenseReluDense.wi.weight", "decoder.block.3.layer.2.DenseReluDense.wi.weight", "decoder.block.4.layer.2.DenseReluDense.wi.weight", "decoder.block.5.layer.2.DenseReluDense.wi.weight", "decoder.block.6.layer.2.DenseReluDense.wi.weight", "decoder.block.7.layer.2.DenseReluDense.wi.weight", "decoder.block.8.layer.2.DenseReluDense.wi.weight", "decoder.block.9.layer.2.DenseReluDense.wi.weight", "decoder.block.10.layer.2.DenseReluDense.wi.weight", "decoder.block.11.layer.2.DenseReluDense.wi.weight".
        Unexpected key(s) in state_dict: "encoder.block.0.layer.2.DenseReluDense.wi.weight", "encoder.block.1.layer.2.DenseReluDense.wi.weight", "encoder.block.2.layer.2.DenseReluDense.wi.weight", "encoder.block.3.layer.2.DenseReluDense.wi.weight", "encoder.block.4.layer.2.DenseReluDense.wi.weight", "encoder.block.5.layer.2.DenseReluDense.wi.weight", "encoder.block.6.layer.2.DenseReluDense.wi.weight", "encoder.block.7.layer.2.DenseReluDense.wi.weight", "encoder.block.8.layer.2.DenseReluDense.wi.weight", "encoder.block.9.layer.2.DenseReluDense.wi.weight", "encoder.block.10.layer.2.DenseReluDense.wi.weight", "encoder.block.11.layer.2.DenseReluDense.wi.weight".

The following is the changed part:

            if split_mlp_wi:

                new[f"decoder.block.{i}.layer.2.DenseReluDense.wi_0.weight"] = wi[0].T

                new[f"decoder.block.{i}.layer.2.DenseReluDense.wi_1.weight"] = wi[1].T

            else:

                new[f"encoder.block.{i}.layer.2.DenseReluDense.wi.weight"] = wi.T

When changing the following line from

new[f"encoder.block.{i}.layer.2.DenseReluDense.wi.weight"] = wi.T

to

new[f"decoder.block.{i}.layer.2.DenseReluDense.wi.weight"] = wi.T

the code works fine.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Related pull request seems to be this one,
so tagging the original author @basting and the one mentioned in that PR:
@patrickvonplaten
@sanchit-gandhi
@ArthurZucker
@younesbelkada

sanchit-gandhi

Indeed it looks like this should be setting the decoder weights - thanks @soyoung97 for the fix!

HuggingFaceDocBuilderDev · 2023-10-04T15:31:36Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

ArthurZucker

I'm a bit surprised this went unnoticed for such a long time 😅 Thanks a lot

soyoung97 · 2023-10-04T22:28:39Z

I think it went unnoticed because v1.0 model conversion is not used frequently than v1.1 models. Thanks a lot for the fast review!!

Fix bug in convert_t5x_checkpoint_to_pytorch.py

a7ce7cc

sanchit-gandhi approved these changes Oct 4, 2023

View reviewed changes

sanchit-gandhi requested a review from ArthurZucker October 4, 2023 15:20

ArthurZucker approved these changes Oct 4, 2023

View reviewed changes

ArthurZucker merged commit ca7912d into huggingface:main Oct 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix encoder->decoder typo bug in convert_t5x_checkpoint_to_pytorch.py#26587

Fix encoder->decoder typo bug in convert_t5x_checkpoint_to_pytorch.py#26587
ArthurZucker merged 1 commit into
huggingface:mainfrom
soyoung97:soyoung97-t5x-v1.0-conversion_bug

soyoung97 commented Oct 4, 2023

Uh oh!

sanchit-gandhi left a comment

Uh oh!

HuggingFaceDocBuilderDev commented Oct 4, 2023

Uh oh!

ArthurZucker left a comment

Uh oh!

soyoung97 commented Oct 4, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

soyoung97 commented Oct 4, 2023

What does this PR do?

Before submitting

Who can review?

Uh oh!

sanchit-gandhi left a comment

Choose a reason for hiding this comment

Uh oh!

HuggingFaceDocBuilderDev commented Oct 4, 2023

Uh oh!

ArthurZucker left a comment

Choose a reason for hiding this comment

Uh oh!

soyoung97 commented Oct 4, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants