[compiled autograd] support Tensor Subclasses in AOTBackward #144115

Closed
zou3519 wants to merge 15 commits into gh/zou3519/1112/base from gh/zou3519/1112/head

@zou3519 zou3519 commented Jan 3, 2025

Compiled autograd's initial trace traces through the AOTBackward
epilogue, where the Tensor Subclass code is not traceable. This PR
changes the trace so that when we see Tensor Subclass constructors, we
proxy nodes for their construction into the graph.
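
To make "proxy nodes for their construction into the graph" concrete, here is a minimal pure-Python sketch of the idea. It is a toy model, not the code in this PR: `ToyGraph`, `Node`, `Proxy`, `Placeholder`, `TwoValue`, and `epilogue` are all hypothetical names invented for illustration, and `TwoValue` only stands in for a wrapper subclass that holds two inner values. Instead of running the constructor during tracing, the tracer records a call node and returns a proxy; the constructor runs only when the graph is replayed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Placeholder:
    """Hypothetical graph input, identified by its position."""
    position: int


@dataclass(frozen=True)
class Proxy:
    """Hypothetical handle to the result of a recorded node."""
    index: int


@dataclass
class Node:
    """One recorded call: the callable and its (possibly symbolic) args."""
    target: Callable[..., Any]
    args: tuple


@dataclass
class ToyGraph:
    nodes: list = field(default_factory=list)

    def call(self, target, *args):
        # Record the call instead of executing it, and return a proxy.
        self.nodes.append(Node(target, args))
        return Proxy(len(self.nodes) - 1)

    def run(self, output, *inputs):
        # Replay the recorded nodes in order with concrete inputs.
        results = []

        def resolve(arg):
            if isinstance(arg, Placeholder):
                return inputs[arg.position]
            if isinstance(arg, Proxy):
                return results[arg.index]
            return arg

        for node in self.nodes:
            results.append(node.target(*(resolve(a) for a in node.args)))
        return resolve(output)


class TwoValue:
    """Hypothetical stand-in for a wrapper subclass with two inner values."""

    def __init__(self, a, b):
        self.a = a
        self.b = b


def epilogue(graph, grad_a, grad_b):
    # The constructor is not executed here; only a node is recorded.
    return graph.call(TwoValue, grad_a, grad_b)


graph = ToyGraph()
out = epilogue(graph, Placeholder(0), Placeholder(1))
wrapped = graph.run(out, 1.5, 2.5)  # the constructor runs only at replay
```

In this sketch the trace never looks inside `TwoValue.__init__`, which mirrors the motivation stated above: the constructor is treated as an opaque call that a later stage can handle.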

We also change a DTensor constructor call to be DTensor.from_local. This
is because Dynamo cannot handle raw DTensor constructor calls but is
able to handle DTensor.from_local.

Test Plan:
- More compiled autograd x Tensor subclass tests pass now. The other
  failures need investigation but a lot of them are because Dynamo
  isn't able to trace DTensor internals that are now exposed (due to
  autograd.Function).
- Existing tests

[ghstack-poisoned]
pytorch-bot bot commented Jan 3, 2025

✅ No Failures

As of commit b4dd186 with merge base 54e2f4b (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the ciflow/inductor, module: compiled autograd, module: dynamo, module: inductor, and oncall: distributed labels Jan 3, 2025
zou3519 added a commit that referenced this pull request Jan 3, 2025
ghstack-source-id: 3ab2945
Pull Request resolved: #144115
@zou3519 zou3519 added the release notes: composability release notes category label Jan 3, 2025
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov xmfan

[ghstack-poisoned]
Compiled autograd's initial trace traces through the AOTBackward
epilogue. The Tensor Subclass code is not traceable. This PR changes it
so that when we see Tensor Subclass constructors, we proxy nodes for
their construction into the graph.

NB: I tried to run this on the DTensor x compiled autograd tests, but it
didn't improve anything. The main problem there was that the DTensor
internal code now needs to be Dynamo-able and it is not (there are
things like DTensor constructor calls and attribute accesses that are
not yet Dynamo-able).

Test Plan:
- New basic test with TwoTensor
- Existing tests

@zou3519 zou3519 added the ci-no-td Do not run TD on this PR label Jan 3, 2025
zou3519 added a commit that referenced this pull request Jan 3, 2025
ghstack-source-id: 7660405
Pull Request resolved: #144115
zou3519 added a commit that referenced this pull request Jan 3, 2025
ghstack-source-id: 7a14d87
Pull Request resolved: #144115
zou3519 added a commit that referenced this pull request Jan 9, 2025
ghstack-source-id: 7d128cf
Pull Request resolved: #144115
@zou3519 zou3519 requested review from jansel and xmfan January 9, 2025 21:55
zou3519 added a commit that referenced this pull request Jan 23, 2025
Pull Request resolved: #144115
Approved by: https://github.com/jansel, https://github.com/xmfan, https://github.com/bdhirsh
ghstack dependencies: #143296, #143304, #143387, #143405, #143417
zou3519 added a commit that referenced this pull request Jan 24, 2025
…144115)"

This reverts commit 082c28c.

Reverted #144115 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](#143296 (comment)))

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Jan 24, 2025
Pull Request resolved: #144115
Approved by: https://github.com/jansel, https://github.com/xmfan, https://github.com/bdhirsh
ghstack dependencies: #143296, #143304, #143387, #143405, #143417
ghstack-source-id: b79847c
zou3519 added a commit that referenced this pull request Jan 24, 2025
Pull Request resolved: #144115
Approved by: https://github.com/jansel, https://github.com/xmfan, https://github.com/bdhirsh
ghstack dependencies: #143296, #143304, #143387, #143405, #143417
ghstack-source-id: 8876f26
facebook-github-bot pushed a commit that referenced this pull request Jan 24, 2025
Summary:
This PR squashes together the following commits:

#144115
#143417
#143405
#143387
#143304
#143296

This is a refactor of compiled autograd to use "functional autograd". The end goal is that it gets compiled autograd's initial capture to stop specializing on Tensor metadata, therefore allowing compiled autograd to better handle Tensor subclasses.

For more information, please read the commit messages for each PR.
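
As a rough illustration of what "specializing on Tensor metadata" means in general, the sketch below contrasts two toy capture strategies in plain Python. It is not how compiled autograd or "functional autograd" is implemented: `SpecializedCapture` and `GenericCapture` are hypothetical names, and a shape tuple plus a flat list of floats stands in for a tensor. The specialized version bakes the metadata into each captured callable and therefore needs a new capture per distinct shape, while the generic version takes the metadata as an input and captures once.

```python
import math


class SpecializedCapture:
    """Toy capture that bakes the shape into each captured callable."""

    def __init__(self):
        self.graphs = {}

    def __call__(self, shape, values):
        if shape not in self.graphs:
            # Metadata is read once, at capture time, and frozen.
            numel = math.prod(shape)
            self.graphs[shape] = lambda vals, numel=numel: sum(vals) / numel
        return self.graphs[shape](values)


class GenericCapture:
    """Toy capture that receives the shape as a runtime input."""

    def __init__(self):
        self.graphs = {}

    def __call__(self, shape, values):
        if "any" not in self.graphs:
            # Metadata is an argument, so one capture serves every shape.
            self.graphs["any"] = lambda meta, vals: sum(vals) / math.prod(meta)
        return self.graphs["any"](shape, values)


specialized = SpecializedCapture()
generic = GenericCapture()
calls = [((2,), [1.0, 3.0]), ((2, 2), [1.0, 2.0, 3.0, 4.0])]
means = [(specialized(s, v), generic(s, v)) for s, v in calls]

# Number of captures each strategy needed for the two calls above.
specialized_captures = len(specialized.graphs)
generic_captures = len(generic.graphs)
```

Both strategies compute the same means here; the difference is only in how many captures they accumulate as the metadata varies.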

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov xmfan

bypass-github-export-checks failing CI test is bypassable on OSS but cannot be retried.


Reviewed By: jansel, xmfan, bdhirsh

Differential Revision: D68120850

Pulled By: zou3519
zou3519 added a commit that referenced this pull request Jan 26, 2025
pytorchmergebot pushed a commit that referenced this pull request Jan 27, 2025
Pull Request resolved: #144707
Approved by: https://github.com/bdhirsh, https://github.com/xmfan, https://github.com/jansel
nWEIdia pushed a commit to nWEIdia/pytorch that referenced this pull request Jan 27, 2025
Pull Request resolved: pytorch#144707
Approved by: https://github.com/bdhirsh, https://github.com/xmfan, https://github.com/jansel
@zou3519 zou3519 closed this Feb 12, 2025
@github-actions github-actions bot deleted the gh/zou3519/1112/head branch March 23, 2025 02:15

Labels

ci-no-td, ciflow/inductor, ciflow/trunk, Merged, module: compiled autograd, module: dynamo, module: inductor, oncall: distributed, release notes: composability, Reverted
