[core / gradient_checkpointing] Refactor GC - part 2 #27073

younesbelkada merged 12 commits into huggingface:main

Conversation
```python
self.gradient_checkpointing = False

# Initialize weights and apply final processing
self.post_init()
```
Moving this before `post_init()`, because `post_init()` calls `gradient_checkpointing_enable()` if the config has a `gradient_checkpointing` attribute.
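As a torch-free illustration of this ordering constraint (all class and attribute names below are invented stand-ins, not the actual transformers code):

```python
# Minimal sketch of why `gradient_checkpointing` must be initialized before
# post_init(): post_init() may call gradient_checkpointing_enable() when the
# config carries a `gradient_checkpointing` flag, and that call flips the
# attribute. TinyConfig / TinyModel are hypothetical stand-ins.
class TinyConfig:
    def __init__(self, gradient_checkpointing=False):
        self.gradient_checkpointing = gradient_checkpointing


class TinyModel:
    def __init__(self, config):
        self.config = config
        # must exist before post_init() touches it
        self.gradient_checkpointing = False
        self.post_init()

    def post_init(self):
        # mirrors the behaviour described in the comment above
        if getattr(self.config, "gradient_checkpointing", False):
            self.gradient_checkpointing_enable()

    def gradient_checkpointing_enable(self):
        self.gradient_checkpointing = True
```

With this ordering, `TinyModel(TinyConfig(gradient_checkpointing=True)).gradient_checkpointing` ends up `True`; initializing the flag after `post_init()` would silently overwrite it back to `False`.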
```diff
  if self.gradient_checkpointing and self.training:
-     layer_outputs = checkpoint(
+     layer_outputs = self.gradient_checkpointing_func(
```
Some models were calling `torch.utils.checkpoint.checkpoint` directly instead of `self.gradient_checkpointing_func`, so I fixed that here.
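A sketch of the fixed call pattern (in transformers the installed function would wrap `torch.utils.checkpoint.checkpoint`; here an identity wrapper stands in so the example runs without torch, and `TinyLayer` is an invented name):

```python
# The point of the fix: the layer calls whatever function the model installed
# (`self.gradient_checkpointing_func`) instead of hard-coding `checkpoint(...)`,
# so enable/disable and checkpointing kwargs are controlled in one place.
class TinyLayer:
    def __init__(self):
        self.gradient_checkpointing = True
        self.training = True
        # installed by the model; identity wrapper for demonstration only
        self.gradient_checkpointing_func = lambda fn, *args: fn(*args)

    def block(self, hidden_states):
        return hidden_states * 2

    def forward(self, hidden_states):
        if self.gradient_checkpointing and self.training:
            # was: layer_outputs = checkpoint(self.block, hidden_states)
            return self.gradient_checkpointing_func(self.block, hidden_states)
        return self.block(hidden_states)
```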
… previous behaviour

@ArthurZucker @LysandreJik - as discussed offline, this PR now reverts to the previous behaviour (i.e. if a user sets …)
ArthurZucker left a comment:
Thanks a lot. Maybe if the checkpointing function were an attribute it would be more accessible and would allow us to document it, WDYT?
```python
# Apply it on the top-level module in case the top-level module supports it,
# for example, LongT5Stack inherits from `PreTrainedModel`.
if hasattr(self, "gradient_checkpointing"):
    self._gradient_checkpointing_func = gradient_checkpointing_func
```

Suggested change:

```diff
- self._gradient_checkpointing_func = gradient_checkpointing_func
+ self._checkpoint = gradient_checkpointing_func
```
No, what would appear best for users, so they know that it's basically just `torch.utils.checkpoint`?
Hmmm, what I like about `_gradient_checkpointing_func` is that it tells users it is a function; `_checkpoint` seems a bit ambiguous to me (it can sound like a model checkpoint?).
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
## Describe your changes

The latest version of transformers (>= 4.35.0) is not compatible with the model. PRs huggingface/transformers#27020 and huggingface/transformers#27073 change the expected signature of `_set_gradient_checkpointing`, which now doesn't match the model's: https://huggingface.co/microsoft/phi-1_5/blob/main/modeling_mixformer_sequential.py#L802

## Checklist before requesting a review

- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Format your code by running `pre-commit run --all-files`
- [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

## (Optional) Issue link
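A hedged sketch of the breakage: remote modeling code that defines the old hook signature fails with a `TypeError` once the caller switches to the new keyword arguments. Both signatures below are approximations for illustration, not copied from transformers:

```python
# Old-style hook, as custom modeling code might have defined it (approximate):
class OldStyleModel:
    def _set_gradient_checkpointing(self, module, value=False):
        module.gradient_checkpointing = value


model = OldStyleModel()
caught = False
try:
    # approximate new-style call site after #27020 / #27073
    model._set_gradient_checkpointing(
        enable=True, gradient_checkpointing_func=lambda fn, *a: fn(*a)
    )
except TypeError:
    caught = True  # the old signature does not accept the new keywords
```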
…27073)

* fix
* more fixes
* fix other models
* fix long t5
* use `gradient_checkpointing_func` instead
* fix copies
* set `gradient_checkpointing_func` as a private attribute and retrieve previous behaviour
* Update src/transformers/modeling_utils.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* replace it with `is_gradient_checkpointing_set`
* remove default
* Update src/transformers/modeling_utils.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* fixup

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
What does this PR do?

Extends #27020 by further simplifying the GC enable / disable mechanism. We can simply iterate over all submodules of the `PreTrainedModel` and check for the attribute `gradient_checkpointing`.

Some models had the `supports_gradient_checkpointing` attribute set to `True` whereas they actually don't support it, so this PR fixes that as well.

Some models were also calling `torch.utils.checkpoint.checkpoint` instead of `self.gradient_checkpointing_func`; this PR fixes that too.

Also, `gradient_checkpointing_func` is now private to avoid exposing it as a public attribute.

cc @ArthurZucker
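The iterate-over-submodules mechanism can be sketched without torch as follows. In transformers the installed callable would be built from `torch.utils.checkpoint.checkpoint` (e.g. via `functools.partial`); everything here (`fake_checkpoint`, `TinyStack`, `TinyPreTrainedModel`) is a simplified hypothetical stand-in:

```python
from functools import partial


def fake_checkpoint(fn, *args, use_reentrant=True):
    # stand-in for torch.utils.checkpoint.checkpoint: just runs the function
    return fn(*args)


class TinyStack:
    # only modules declaring this attribute opt in to checkpointing
    gradient_checkpointing = False


class TinyPreTrainedModel:
    def __init__(self):
        self.stack = TinyStack()

    def modules(self):
        # stand-in for nn.Module.modules(): yields self and all submodules
        return [self, self.stack]

    def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
        kwargs = gradient_checkpointing_kwargs or {}
        func = partial(fake_checkpoint, **kwargs)
        # iterate over all submodules and flip the flag wherever it exists;
        # the callable is stored under the private name
        for module in self.modules():
            if hasattr(module, "gradient_checkpointing"):
                module.gradient_checkpointing = True
                module._gradient_checkpointing_func = func


model = TinyPreTrainedModel()
model.gradient_checkpointing_enable({"use_reentrant": False})
```

After the call, only `model.stack` (which declares `gradient_checkpointing`) gets the flag and the private function; the top-level wrapper is left untouched.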