
torch.utils.checkpoint.checkpoint + torch.cuda.amp #40221

Closed
tano297 wants to merge 1 commit into pytorch:master from tano297:autocast_grad_checkpoint

Conversation

@tano297

@tano297 tano297 commented Jun 18, 2020

A simple two-line workaround to make gradient checkpointing work with amp autocast.
In the same way PyTorch stores the "has_cuda" state in the context, we store a "has_autocast" flag during the first forward pass, so that autocast can be re-enabled when the forward pass is re-run during the backward pass.

For anybody hitting this problem before the fix is merged, a workaround can be found in #37730, or you can simply copy this version of the file into your own codebase with the two added lines.
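
For reference, here is a minimal sketch of the idea, assuming a simplified CheckpointFunction (the real torch/utils/checkpoint.py also preserves RNG state and does extra bookkeeping, and this is not the exact diff from the PR); the two added lines are marked:

```python
import torch

# Simplified sketch of torch.utils.checkpoint.CheckpointFunction with the
# proposed two-line change; not the exact upstream code.
class CheckpointFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, run_function, *args):
        ctx.run_function = run_function
        # Added line 1: remember whether autocast was active during the
        # original forward pass.
        ctx.had_autocast_in_fwd = torch.is_autocast_enabled()
        ctx.save_for_backward(*args)
        with torch.no_grad():
            # Run the forward without building a graph; it will be
            # recomputed during backward.
            outputs = run_function(*args)
        return outputs

    @staticmethod
    def backward(ctx, *grad_outputs):
        inputs = [x.detach().requires_grad_(x.requires_grad)
                  for x in ctx.saved_tensors]
        # Added line 2: replay the forward pass under the same autocast
        # state it originally ran with, so dtypes match the first pass.
        with torch.enable_grad(), torch.cuda.amp.autocast(enabled=ctx.had_autocast_in_fwd):
            outputs = ctx.run_function(*inputs)
        if isinstance(outputs, torch.Tensor):
            outputs = (outputs,)
        torch.autograd.backward(outputs, grad_outputs)
        return (None,) + tuple(x.grad for x in inputs)
```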

@dr-ci

dr-ci Bot commented Jun 18, 2020

💊 CI failures summary and remediations

As of commit a9cca95:


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed



@zou3519
Contributor

zou3519 commented Jun 23, 2020

@mcarilli I requested review from you because of the mention of amp; please let me know if that's not right.

@zou3519 zou3519 added the "triaged" label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jun 23, 2020
@tano297
Author

tano297 commented Jul 29, 2020

Any update on this? Let me know how I can help

@ZhichengHuang

Has this bug been fixed? I've run into the same issue.

Collaborator

@mcarilli mcarilli left a comment


Looks good to me, thanks!

I think the usual custom autograd function decorators aren't preferable here, because CheckpointFunction.backward runs a nested forward and backward. The autocast API recommends running only forward under autocast, but globally enabling autocast for all of CheckpointFunction.backward (as @custom_bwd might do) would include the nested backward as well.

Definitely needs a test though.
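
For illustration, here is a sketch of the kind of check such a test could make, assuming a CUDA device is available (this is not the test that eventually landed with #49757; the model, shapes, and comparison are assumptions): with the fix, a checkpointed forward under autocast should match the non-checkpointed reference.

```python
import torch
from torch.utils.checkpoint import checkpoint

def run(use_checkpoint):
    # Re-seed so both runs build identical weights and inputs.
    torch.manual_seed(0)
    model = torch.nn.Linear(8, 8).cuda()
    x = torch.randn(4, 8, device="cuda", requires_grad=True)
    with torch.cuda.amp.autocast():
        out = checkpoint(model, x) if use_checkpoint else model(x)
    out.float().sum().backward()
    return out, x.grad

if torch.cuda.is_available():
    out_ckpt, grad_ckpt = run(True)
    out_ref, grad_ref = run(False)
    # Without the fix, the recomputed forward runs in float32, so the
    # checkpointed run's dtypes and gradients diverge from the reference.
    assert out_ckpt.dtype == out_ref.dtype
    assert torch.allclose(grad_ckpt, grad_ref)
```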

@mcarilli
Collaborator

PR appears orphaned, moving to #49757.

@mcarilli mcarilli closed this Dec 22, 2020
facebook-github-bot pushed a commit that referenced this pull request Dec 23, 2020
Summary:
Adds a test to orphaned original PR (#40221).

Should fix #49738 and #47183

Pull Request resolved: #49757

Reviewed By: mruberry

Differential Revision: D25689609

Pulled By: ngimel

fbshipit-source-id: 0a6adc11eb98382048ef9a9775e185dcdeff6010

Labels

open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


5 participants