Patch for FlanT5-XXL 8bit support #20760

Merged

sgugger merged 4 commits into huggingface:main from larsmennen:workaround-for-20287-flant5-8bit on Dec 15, 2022

Conversation

@larsmennen (Contributor)

What does this PR do?

Fixes #20287.

In #20287, three patches were proposed; see #20287 (comment).
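
For context, the change that ultimately landed is the T5-only patch: under 8-bit loading, the feed-forward output projection `wo` can end up in a different dtype than the incoming activations, so the forward pass casts the activations to `wo`'s dtype before the projection. A minimal sketch of the idea, simplified from `modeling_t5.py` (the module shapes and the GELU variant here are illustrative, not the exact library code):

```python
import torch
from torch import nn


class T5DenseGatedActDense(nn.Module):
    """Sketch of the patched gated feed-forward block (simplified)."""

    def __init__(self, d_model: int, d_ff: int, dropout_rate: float = 0.1):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)
        self.wo = nn.Linear(d_ff, d_model, bias=False)
        self.dropout = nn.Dropout(dropout_rate)
        self.act = nn.GELU()  # the real model uses T5's gelu_new activation

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        hidden_gelu = self.act(self.wi_0(hidden_states))
        hidden_linear = self.wi_1(hidden_states)
        hidden_states = self.dropout(hidden_gelu * hidden_linear)

        # Core of the fix: under 8-bit loading, `wo` may be kept in float32
        # while the surrounding activations are (b)float16, which corrupts
        # the output. Cast the activations to `wo`'s dtype, but never cast
        # to int8 (a quantized bitsandbytes weight).
        if (
            isinstance(self.wo.weight, torch.Tensor)
            and hidden_states.dtype != self.wo.weight.dtype
            and self.wo.weight.dtype != torch.int8
        ):
            hidden_states = hidden_states.to(self.wo.weight.dtype)

        return self.wo(hidden_states)
```

Per the review below, the analogous changes to longt5/perceiver/switch were reverted so the PR stays T5-only.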

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • [n/a] Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@younesbelkada @sgugger

@larsmennen changed the title from "Workaround for #20287: FlanT5-XXL 8bit support" to "Patch for FlanT5-XXL 8bit support" on Dec 14, 2022
@larsmennen force-pushed the workaround-for-20287-flant5-8bit branch from f90b269 to 55f8fcc on December 14, 2022 02:45
@HuggingFaceDocBuilderDev commented Dec 14, 2022

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada (Contributor) left a comment

Thanks so much for the fix @larsmennen!
I would personally advocate focusing only on T5; we can add these patches later on if we figure out that the same issue occurs for all subsidiary models. Can you revert the changes for longt5/perceiver & switch? (Ideally also keep the copy mechanism, so maybe add the # Copied from statements but use another model than t5 as reference, e.g. for perceiver # Copied from transformers.src.models.longt5. ...)
Also don't forget to run the styling changes ;) (make fixup)
Thanks again!
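
For readers unfamiliar with the copy mechanism mentioned above: `# Copied from` markers let the repo's consistency tooling (e.g. `make fix-copies`) keep duplicated model code in sync with a reference implementation. A hedged illustration of the convention; the class body is a simplified T5-style RMSNorm, not the exact library code:

```python
import torch
from torch import nn


# Copied from transformers.models.t5.modeling_t5.T5LayerNorm with T5->LongT5
class LongT5LayerNorm(nn.Module):
    # The marker above tells the consistency check to diff this class
    # against T5LayerNorm, applying the T5->LongT5 name substitution.
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # T5-style RMSNorm: scale by root-mean-square, no mean subtraction.
        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states.to(self.weight.dtype)
```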

@sgugger (Collaborator) left a comment

Thanks a lot! This looks good to me apart from the unrelated change in perceiver.

  ... trainable_position_encoding_kwargs=dict(
  ...     num_channels=256,
- ...     index_dims=config.image_size**2,
+ ...     index_dims=config.image_size ** 2,
Collaborator

Let's leave this as is, it's not linked to this PR.

@larsmennen (Contributor, Author)

Reverted, thanks. (I think that somehow came from one of the make scripts, but my env may not have been fully set up properly.)

@larsmennen (Contributor, Author)

> Thanks so much for the fix @larsmennen! I would personally advocate focusing only on T5 [...] Can you revert the changes for longt5/perceiver & switch [...] Also don't forget to run the styling changes ;) (make fixup)

That makes sense! Done.

@sgugger merged commit b9b70b0 into huggingface:main on Dec 15, 2022
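
With the patch merged, loading the XXL checkpoint in 8-bit should again produce sensible generations. A minimal usage sketch, assuming `bitsandbytes` and `accelerate` are installed (the `load_in_8bit` argument reflects the API as of this PR's era):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google/flan-t5-xxl"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# load_in_8bit quantizes most linear layers with bitsandbytes; the patched
# T5 forward pass casts activations to match any layer kept in float32.
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_8bit=True,
)

inputs = tokenizer("Translate to German: How old are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```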
gsarti added a commit to gsarti/transformers that referenced this pull request Dec 16, 2022
… add_get_encoder_decoder_fsmt

* 'main' of ssh://github.com/huggingface/transformers: (1433 commits)
  Add Universal Segmentation class + mapping (huggingface#20766)
  Stop calling expand_1d on newer TF versions (huggingface#20786)
  Fix object detection2 (huggingface#20798)
  [Pipeline] skip feature extraction test if in `IMAGE_PROCESSOR_MAPPING` (huggingface#20790)
  Recompile `apex` in `DeepSpeed` CI image (huggingface#20788)
  Move convert_to_rgb to image_transforms module (huggingface#20784)
  Generate: use `GenerationConfig` as the basis for `.generate()` parametrization (huggingface#20388)
  Install video dependency for pipeline CI (huggingface#20777)
  Fixing object detection with `layoutlm` (huggingface#20776)
  [Pipeline] fix failing bloom `pipeline` test (huggingface#20778)
  Patch for FlanT5-XXL 8bit support (huggingface#20760)
  Install vision for TF pipeline tests (huggingface#20771)
  Even more validation. (huggingface#20762)
  Add Swin backbone (huggingface#20769)
  Install `torch-tensorrt 1.3.0` for DeepSpeed CI (huggingface#20764)
  Replaces xxx_required with requires_backends (huggingface#20715)
  [CI-Test] Fixes but also skips the mT5 tests (huggingface#20755)
  Fix attribute error problem  (huggingface#20765)
  [Tests] Improve test_attention_outputs (huggingface#20701)
  Fix missing `()` in some usage of `is_flaky` (huggingface#20749)
  ...

Development

Successfully merging this pull request may close these issues.

Flan-T5-XXL generates non-sensical text when load_in_8bit=True

4 participants