Introduce GradientCheckpointingLayer by qubvel · Pull Request #37223 · huggingface/transformers

qubvel · 2025-04-02T22:14:16Z

What does this PR do?

A super minimal abstraction for a layer with gradient checkpointing that keeps the logic for enabling and disabling gradient checkpointing within PreTrainedModel for backward compatibility. It allows for a gradual rollout of the feature by supporting both checkpointing mechanisms: the current wrapping of layer calls with _gradient_checkpointing_func, and inheritance from GradientCheckpointingLayer.
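
The thread shows only the class header of GradientCheckpointingLayer, so the following is a minimal, torch-free sketch of how the inheritance-based mechanism could work, not the PR's actual code. `Module` is a hypothetical stand-in for `nn.Module`, `CheckpointingLayerSketch` and `Block` are illustrative names, and `fake_checkpoint` stands in for whatever function the model assigns as `_gradient_checkpointing_func`. The assumption is that the model only flips a flag on the layer and the layer's own `__call__` decides whether to route through the checkpointing function.

```python
from functools import partial


class Module:
    """Hypothetical stand-in for torch.nn.Module: calling the object runs forward()."""

    training = True

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


class CheckpointingLayerSketch(Module):
    """Illustrative base class; the flag and function are set from outside by the model."""

    gradient_checkpointing = False

    def __call__(self, *args, **kwargs):
        if self.gradient_checkpointing and self.training:
            # Bind keyword arguments up front so the checkpointing function
            # only has to forward positional arguments.
            bound_call = partial(super().__call__, **kwargs)
            return self._gradient_checkpointing_func(bound_call, *args)
        return super().__call__(*args, **kwargs)


calls = []


def fake_checkpoint(fn, *args):
    # Records that the call was routed here, then runs the layer normally.
    calls.append(fn)
    return fn(*args)


class Block(CheckpointingLayerSketch):
    def forward(self, x, scale=1):
        return x * scale


block = Block()
plain = block(3, scale=2)  # flag off: ordinary call path

block.gradient_checkpointing = True
block._gradient_checkpointing_func = fake_checkpoint
routed = block(3, scale=2)  # flag on: goes through fake_checkpoint
```

With this shape, a model's forward loop can call each layer the same way whether or not checkpointing is enabled, which is what would let the older explicit wrapping and the inheritance approach coexist during a gradual rollout.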

I've applied this to Llama, but it's just a PoC for discussion. Perhaps it's better to start with another, less popular model that has fewer dependent models, to see how it goes and to check whether it breaks custom code on the Hub.

Who can review?

qubvel · 2025-04-02T22:17:10Z

run-slow: llama

github-actions · 2025-04-02T22:18:33Z

This comment contains run-slow, running the specified jobs:

models: ['models/llama']
quantizations: [] ...

HuggingFaceDocBuilderDev · 2025-04-02T22:42:55Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

Nice! Let's put it in another file to separate things a bit! Otherwise marvelous.

qubvel · 2025-04-08T15:42:38Z

Ended up applying this to all llama-based models and SigLIP/2 for the first iteration; all relevant tests pass:

RUN_SLOW=1 pytest -k "gradient" tests/models/

CI error is unrelated

cc @ArthurZucker to merge if OK for you

ArthurZucker

🧼 thanks a lot!

ArthurZucker · 2025-04-11T12:21:04Z

+
+
+class GradientCheckpointingLayer(nn.Module):
+    """Base class for layers with gradient checkpointing.


thanks for documenting as well!

sfc-gh-sbekman · 2025-05-06T00:57:33Z

This is very neat, Pavel!

Might be useful for us users to add to the doc:

that "use_reentrant": True is the default, perhaps pointing to where it gets set when the user does not override it, and
a pointer to gradient_checkpointing_enable, whose documentation explains how to override the use_reentrant value.

Perhaps even add redundancy/shortcut by adding to the doc:

gradient_checkpointing_kwargs = {"use_reentrant": True}
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs=gradient_checkpointing_kwargs)

It's not very intuitive; I first tried to call:

model.gradient_checkpointing_enable(use_reentrant=True)

and it failed with: TypeError: PreTrainedModel.gradient_checkpointing_enable() got an unexpected keyword argument 'use_reentrant'
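
To make the difference between the two calls concrete, here is a small sketch. `ModelSketch` is a hypothetical stand-in, not the real PreTrainedModel; it only mirrors the call shape discussed in this comment (a single `gradient_checkpointing_kwargs` dict argument), and the `{"use_reentrant": True}` fallback is taken from the comment rather than from the library source.

```python
class ModelSketch:
    """Hypothetical stand-in; only the method signature mirrors the discussion."""

    def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
        # Assumed fallback, per the comment: reentrant checkpointing by default.
        if gradient_checkpointing_kwargs is None:
            gradient_checkpointing_kwargs = {"use_reentrant": True}
        self.checkpoint_kwargs = dict(gradient_checkpointing_kwargs)


model = ModelSketch()

# No argument: the assumed default applies.
model.gradient_checkpointing_enable()
default_kwargs = model.checkpoint_kwargs

# Options travel inside one dict argument.
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}
)
overridden_kwargs = model.checkpoint_kwargs

# Passing the option as a bare keyword is rejected by Python itself,
# because the method has no parameter with that name.
try:
    model.gradient_checkpointing_enable(use_reentrant=True)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

The design choice of a single dict keeps the method signature stable while letting arbitrary options be forwarded to the underlying checkpoint function, at the cost of the less discoverable call shape described above.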

GradientCheckpointingLayer

486f155

qubvel marked this pull request as ready for review April 2, 2025 22:15

github-actions Bot requested review from ArthurZucker and Cyrilvallez April 2, 2025 22:16

trigger

e5d326a

ArthurZucker approved these changes Apr 4, 2025

qubvel added 9 commits April 4, 2025 16:01

Move GC layer to a separate file

70dc32e

Update import

9f8c1ce

Expose and document GC layer

fc96fad

Merge branch 'main' into gradient-checkpointing-layer

657c538

Merge branch 'main' into gradient-checkpointing-layer

5a7dd6b

Fix dummy

72baa13

Apply to llama-based models

31f6720

Update modulars

334043a

Update a few more models for consistency

d43dfd4

qubvel requested a review from ArthurZucker April 8, 2025 15:42

qubvel added 2 commits April 11, 2025 09:47

Merge branch 'main' into gradient-checkpointing-layer

da0de60

Update glm4

23e6e24

ArthurZucker approved these changes Apr 11, 2025

qubvel added 3 commits April 18, 2025 17:18

Merge branch 'main' into gradient-checkpointing-layer

2093264

Merge branch 'main' into gradient-checkpointing-layer

2e913fd

Update Janus

4640e46

qubvel merged commit 9167fad into huggingface:main Apr 22, 2025

efsotr mentioned this pull request Jun 11, 2025

chore: support passing kwargs when using gradient checkpointing #38744

Closed

2 tasks

qubvel merged 16 commits into huggingface:main from qubvel:gradient-checkpointing-layer


4 participants


