[compiled autograd] Proxy opaque nodes for built-in autograd nodes #143296
Closed
zou3519 wants to merge 19 commits into gh/zou3519/1106/base
Conversation
This PR is on the way to getting compiled autograd's initial capture to stop specializing on Tensor metadata. It changes compiled autograd's initial capture to proxy an opaque (w.r.t. Dynamo) function into the graph for all built-in codegen'ed autograd nodes and for validate_outputs.

We changed each codegen'ed apply_with_saved (e.g. MulBackward0::apply_with_saved) to call into Python to proxy a function (compiled_autograd.ops.MulBackward0) into the graph. Then, we use the node's InputMetadata to "guess" at the properties of the output Tensors and create new FakeTensors from those guesses.

Some details:
- MulBackward0::apply_with_saved lives in libtorch_cpu but needs to call into Python via libtorch_python. There is an indirection (PyCompilerInterface) to do this.
- MulBackward0::apply_with_saved passes a C++ function to Python. To make our lives easier, every codegen'ed apply_with_saved passes a C++ function with the same signature `(variable_list, ivalue_list) -> variable_list`.
- We define how to pack arbitrary C++ types into IValue via a helper IValuePacker struct, and codegen functional variants of each built-in C++ autograd node (e.g. MulBackward0_apply_functional_ivalue).

MulBackward0 before this PR: https://gist.github.com/zou3519/a80381d5fa38e970e413fcd91b0530de
MulBackward0 after this PR: https://gist.github.com/zou3519/0c2eee8b3d8d96232b51ef430b53c5b0

Test Plan:
- existing tests
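To picture what "opaque" buys here, below is a minimal sketch using torch.fx symbolic tracing rather than the actual Dynamo/compiled autograd machinery: wrapping a function makes the tracer record a single call_function node instead of tracing into its body, so none of the Tensor metadata used inside gets baked into the graph. mul_backward0 is an illustrative stand-in for the real compiled_autograd.ops.MulBackward0 binding, with a made-up argument layout.

```python
import torch
import torch.fx as fx

# Illustrative stand-in for compiled_autograd.ops.MulBackward0. fx.wrap
# marks it as a leaf, so symbolic tracing records one opaque node rather
# than tracing into the two multiplies inside.
@fx.wrap
def mul_backward0(grad_out, x, y):
    return [grad_out * y, grad_out * x]

def backward_slice(grad_out, x, y):
    grads = mul_backward0(grad_out, x, y)
    return grads[0], grads[1]

gm = fx.symbolic_trace(backward_slice)
print(gm.graph)
```

Printing the graph shows a single opaque mul_backward0 node (plus getitem calls to unpack its result), not the underlying multiplies.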
This was referenced Dec 16, 2024
🔗 See artifacts and rendered test results at hud.pytorch.org/pr/143296.
✅ No failures as of commit 5067edc with merge base 54e2f4b (Dr. CI).
zou3519 commented on Dec 16, 2024
This was referenced Dec 17, 2024
This was referenced Jan 3, 2025
zou3519 added a commit that referenced this pull request on Jan 23, 2025
[compiled autograd] Proxy opaque nodes for built-in autograd nodes (#143296)
ghstack-source-id: b6b3779
Pull Request resolved: #143296
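The InputMetadata-to-FakeTensor step described in the PR description can be sketched in a few lines, assuming the metadata boils down to a (shape, dtype, device) triple per output; the triples below are made up for illustration rather than read from a real node.

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

# Made-up stand-in for a node's InputMetadata: one (shape, dtype, device)
# triple per expected output gradient.
output_metadata = [((2, 3), torch.float32, "cpu"),
                   ((2, 3), torch.float32, "cpu")]

# Inside a FakeTensorMode, factory calls like torch.empty produce
# FakeTensors: tensors with the right metadata but no real storage, which
# is all the capture needs in order to keep proxying subsequent nodes.
with FakeTensorMode():
    fake_grads = [torch.empty(shape, dtype=dtype, device=device)
                  for shape, dtype, device in output_metadata]

print(fake_grads[0].shape, fake_grads[0].dtype)
```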
zou3519 added a commit that referenced this pull request on Jan 23, 2025
…43304)
CopyBackwards is a manual C++ torch::autograd::Node; we update its apply_with_saved to proxy a functional version of it into the graph instead of inlining into it.

Test Plan:
- existing tests

Pull Request resolved: #143304
Approved by: https://github.com/xmfan, https://github.com/jansel
ghstack dependencies: #143296
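In Python terms, the "functional version" move looks roughly like the sketch below: state that apply_with_saved would read off the node becomes an explicit argument list, mirroring the stack's uniform `(variable_list, ivalue_list) -> variable_list` signature. Both names are illustrative; the real CopyBackwards is C++ and also handles cross-device copies.

```python
from typing import Any, List
import torch

# Stateful form: apply() reads fields saved on the node, which forces a
# tracer to either inline the body or specialize on that state.
class StatefulCopyBackwardsSketch:
    def __init__(self, src_dtype: torch.dtype):
        self.src_dtype = src_dtype

    def apply(self, grad: torch.Tensor) -> torch.Tensor:
        return grad.to(self.src_dtype)

# Functional form: same computation, but all saved state arrives as an
# explicit list, so the call can be proxied as one graph node.
def copy_backwards_functional(
    grads: List[torch.Tensor], saved: List[Any]
) -> List[torch.Tensor]:
    (grad,) = grads
    (src_dtype,) = saved
    return [grad.to(src_dtype)]
```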
zou3519 added a commit that referenced this pull request on Jan 23, 2025
…backwards (#143405)
We now always proxy autograd.Function nodes in compiled autograd's initial graph capture (previously there was an option to either proxy or trace into the autograd.Function).

The AOTBackward needs special handling: Compiled Autograd runs accumulate-grad reordering passes on the AOTBackward graph directly after the initial graph capture, so we can't just proxy a single node for it. Instead, we:
- proxy the AOTBackward prologue function into the CA graph,
- copy-paste the AOTBackward graph into the CA graph,
- trace directly through the epilogue (the traced nodes go into the CA graph).

Tracing through the epilogue is safe (assuming no Tensor subclasses) because all the epilogue does is drop some outputs. The Tensor subclass situation was already broken, so this doesn't regress anything, but this PR sets it up to be fixed in a followup, where we will proxy "make_subclass" calls into the graph from the epilogue.

Test Plan:
- existing tests

Pull Request resolved: #143405
Approved by: https://github.com/jansel, https://github.com/xmfan
ghstack dependencies: #143296, #143304, #143387
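The "copy-paste the AOTBackward graph into the CA graph" step can be pictured with fx.Graph.graph_copy, which splices one graph's nodes into another. This is only a structural sketch under the assumption that both graphs are ordinary fx.Graphs; prologue and aot_backward are stand-in functions, not the real compiled autograd implementation.

```python
import torch.fx as fx

def prologue(x):        # stand-in for the proxied AOTBackward prologue
    return x + 1

def aot_backward(x):    # stand-in for the AOT backward graph's body
    return x * 2

outer_g = fx.symbolic_trace(prologue).graph
inner_g = fx.symbolic_trace(aot_backward).graph

# Build a combined graph: copy the prologue's nodes, then splice the
# "AOTBackward" graph in, wiring its placeholder to the prologue's result.
combined = fx.Graph()
val_map: dict = {}
prologue_out = combined.graph_copy(outer_g, val_map)

inner_map = {next(iter(inner_g.nodes)): prologue_out}  # placeholder -> value
spliced_out = combined.graph_copy(inner_g, inner_map)
combined.output(spliced_out)
print(combined)
```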
zou3519 added a commit that referenced this pull request on Jan 23, 2025
… (#143417)
The previous PRs built up to this. We change compiled autograd's initial trace to stop baking in metadata. While tracing, we allocate some weirdly shaped tensors that we can put proxies on; the initial trace should not access any metadata of these tensors (it will likely error out if it does, because of how weird the shapes are).

This involved fixing various sites where we did specialize on the metadata:
- we change CopySlices's apply_with_saved to proxy some calls into the graph (this change is fairly hard to split out by itself),
- we stop calling InputBuffer::add,
- we delete the weird metadata from the graph so that no graph passes can make use of it.

Test Plan:
- tests

Pull Request resolved: #143417
Approved by: https://github.com/jansel, https://github.com/xmfan
ghstack dependencies: #143296, #143304, #143387, #143405
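A toy version of the "weirdly shaped tensors" trick: give placeholder inputs metadata no real workload would have, so any pass that reads it produces an obviously wrong result (or errors) instead of silently specializing. The sentinel size below is made up; the real placeholder shapes are internal details.

```python
import torch

# Made-up sentinel size, deliberately implausible.
SENTINEL = 123457

# Meta tensors carry metadata only (no storage), so an absurd size costs
# nothing to "allocate".
placeholder = torch.empty(SENTINEL, device="meta")

def leaky_pass(t: torch.Tensor) -> int:
    # A pass like this bakes input metadata into its result; with the
    # sentinel placeholder, the leak shows up as the sentinel value.
    return t.shape[0]

assert leaky_pass(placeholder) == SENTINEL
```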
zou3519 added a commit that referenced this pull request on Jan 23, 2025
Compiled autograd's initial trace traces through the AOTBackward epilogue, but the Tensor Subclass code there is not traceable. This PR changes it so that when we see Tensor Subclass constructors, we proxy nodes for their construction into the graph.

Test Plan:
- New basic test with TwoTensor
- Existing tests

Pull Request resolved: #144115
Approved by: https://github.com/jansel, https://github.com/xmfan, https://github.com/bdhirsh
ghstack dependencies: #143296, #143304, #143387, #143405, #143417
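The constructor-proxying idea can be sketched with the same wrapping trick as earlier: instead of tracing into a subclass's __new__, the capture records one opaque rebuild call. rebuild_two_tensor is an illustrative name (not the real binding), and a plain tuple stands in for a real TwoTensor wrapper.

```python
import torch
import torch.fx as fx

# Illustrative stand-in for a proxied subclass constructor; a tuple of the
# two dense components stands in for the wrapper object.
@fx.wrap
def rebuild_two_tensor(a: torch.Tensor, b: torch.Tensor):
    return (a, b)

def epilogue(a, b):
    # Instead of tracing into a subclass constructor, the graph records a
    # single rebuild_two_tensor node.
    return rebuild_two_tensor(a, b)

gm = fx.symbolic_trace(epilogue)
print(gm.graph)  # one opaque rebuild_two_tensor node
```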
zou3519 added a commit that referenced this pull request on Jan 24, 2025
…144115)
This reverts commit 082c28c. Reverted #144115 on behalf of https://github.com/izaitsevfb due to breaking internal tests (T213390054).