
[compiled autograd] Proxy nodes for user-defined C++ torch::autograd::Function #143387

Closed
zou3519 wants to merge 20 commits into gh/zou3519/1108/base from gh/zou3519/1108/head

Conversation

@zou3519
Contributor

@zou3519 zou3519 commented Dec 17, 2024

Stack from ghstack (oldest at bottom):

We define a functional version of a C++ torch::autograd::Function. The
functional version reconstructs the ctx object and then calls
backward with it.
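
For concreteness, here is a minimal user-defined C++ autograd function of the kind this change targets. MyScale is a hypothetical example following the public torch::autograd::Function API, not code from this PR; its functional version has to restore the "scale" entry of ctx->saved_data before calling backward:

```cpp
#include <torch/torch.h>

using torch::autograd::AutogradContext;
using torch::autograd::Function;
using torch::autograd::tensor_list;

// Hypothetical user-defined C++ autograd function. Everything stashed in
// ctx->saved_data lives in a Dict[str, IValue]; that dict is what the
// functional version must pack, carry through the capture, and unpack to
// reconstruct the ctx before invoking backward.
struct MyScale : public Function<MyScale> {
  static torch::Tensor forward(
      AutogradContext* ctx, torch::Tensor x, double scale) {
    ctx->saved_data["scale"] = scale;  // one entry in the Dict[str, IValue]
    return x * scale;
  }

  static tensor_list backward(AutogradContext* ctx, tensor_list grad_outputs) {
    double scale = ctx->saved_data["scale"].toDouble();
    // Gradient w.r.t. x; an undefined Tensor for the non-tensor `scale`.
    return {grad_outputs[0] * scale, torch::Tensor()};
  }
};
```

A caller would invoke it as `auto y = MyScale::apply(x, 2.0);`.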

Some more details:

  • We define how to pack/unpack ctx.saved_data into an IValue. Since
    saved_data is a Dict[str, IValue], this was straightforward (see the
    sketch after this list).
  • Every call to CppNode::apply_with_saved binds a new function to
    Python. We can't reuse a previously bound function because the schema
    may change depending on what the user actually puts into their
    Dict[str, IValue].
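
A minimal sketch of the pack/unpack idea and of the "functional" calling convention, assuming the hypothetical MyScale from the example above. This is illustrative only; the PR's actual code (and the exact saved_data map type) may differ:

```cpp
#include <torch/torch.h>

using torch::autograd::AutogradContext;
using torch::autograd::tensor_list;

// Pack a saved_data-style map into a single IValue holding a
// Dict[str, IValue]. Illustrative stand-in for the PR's pack logic.
at::IValue pack_saved_data(
    const std::unordered_map<std::string, at::IValue>& saved_data) {
  c10::impl::GenericDict dict(c10::StringType::get(), c10::AnyType::get());
  for (const auto& kv : saved_data) {
    dict.insert(kv.first, kv.second);
  }
  return dict;
}

// The functional version: rebuild a fresh ctx from the packed state, then
// call backward with it -- no state hidden on the autograd Node itself.
tensor_list my_scale_backward_functional(
    const at::IValue& packed_saved_data, tensor_list grad_outputs) {
  AutogradContext ctx;
  for (const auto& kv : packed_saved_data.toGenericDict()) {
    ctx.saved_data[kv.key().toStringRef()] = kv.value();
  }
  return MyScale::backward(&ctx, std::move(grad_outputs));
}
```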

Test Plan:

  • existing tests

cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @xmfan

@pytorch-bot

pytorch-bot bot commented Dec 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/143387

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b25bacf with merge base 54e2f4b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Dec 17, 2024
@zou3519 zou3519 added the keep-going Don't stop on first failure, keep running tests until the end label Dec 17, 2024
@zou3519 zou3519 marked this pull request as draft December 17, 2024 15:59
@zou3519 zou3519 added the topic: not user facing topic category label Dec 19, 2024
zou3519 added a commit that referenced this pull request Dec 20, 2024
…:Function

ghstack-source-id: 47c2262
Pull Request resolved: #143387
zou3519 added a commit that referenced this pull request Jan 23, 2025
…:Function

ghstack-source-id: 7051f00
Pull Request resolved: #143387
zou3519 added a commit that referenced this pull request Jan 23, 2025
…backwards (#143405)

We will always proxy autograd.Function nodes in compiled autograd's
initial graph capture (previously there was an option to either proxy
the node or trace into the autograd.Function).

The AOTBackward imposes extra requirements: compiled autograd runs
accumulate-grad reordering passes on the AOTBackward graph directly
after the initial graph capture, so we can't proxy it as a single node.

Instead, we:
- proxy the AOTBackward prologue function into the CA graph
- copy-paste the AOTBackward graph into the CA graph
- trace directly through the epilogue (the traced nodes go into the CA
  graph).

Tracing through the epilogue is safe (assuming no Tensor subclasses)
because the only thing the epilogue does is drop some outputs. The
Tensor subclass situation was already broken, so this doesn't regress
anything, but this PR sets it up to be fixed in a follow-up, where we
will proxy "make_subclass" calls into the graph from the epilogue.

Test Plan:
- existing tests
Pull Request resolved: #143405
Approved by: https://github.com/jansel, https://github.com/xmfan
ghstack dependencies: #143296, #143304, #143387
zou3519 added a commit that referenced this pull request Jan 23, 2025
#143417)

The previous PRs built up to this. We change compiled autograd's initial
trace to stop baking in metadata.

While tracing, we allocate some weirdly shaped tensors that we can
attach proxies to. The initial trace should not access any metadata of
these tensors (and will likely error out if it does, because of how
weird the shapes are).
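
As a toy illustration of that idea (hypothetical code, not the PR's actual placeholder mechanism): a tensor whose shape is deliberately absurd holds no data, so any pass that reads and branches on its metadata surfaces the bug early instead of silently specializing the graph.

```cpp
#include <ATen/ATen.h>
#include <c10/util/Exception.h>

int main() {
  // Hypothetical stand-in for the "weirdly shaped" placeholder tensors:
  // a huge dimension times a zero dimension allocates no storage.
  at::Tensor placeholder = at::zeros({123456789, 0});
  // Any code that reads placeholder.sizes() and branches on them during
  // the initial trace would bake metadata into the graph; the absurd
  // shape makes such accesses fail loudly rather than go unnoticed.
  TORCH_CHECK(placeholder.numel() == 0, "placeholder should hold no data");
  return 0;
}
```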

This involved fixing various sites where we previously specialized on
the metadata:
- we change CopySlices's apply_with_saved to proxy some calls
  into the graph (this change is fairly hard to split out by itself).
- we stop calling InputBuffer::add
- we delete the weird metadata from the graph so that no graph passes
  can make use of it.

Test Plan:
- tests
Pull Request resolved: #143417
Approved by: https://github.com/jansel, https://github.com/xmfan
ghstack dependencies: #143296, #143304, #143387, #143405
zou3519 added a commit that referenced this pull request Jan 23, 2025
Compiled autograd's initial trace runs through the AOTBackward
epilogue, and the Tensor subclass code there is not traceable. This PR
changes it so that when we encounter Tensor subclass constructors, we
proxy nodes for their construction into the graph.

Test Plan:
- New basic test with TwoTensor
- Existing tests

Pull Request resolved: #144115
Approved by: https://github.com/jansel, https://github.com/xmfan, https://github.com/bdhirsh
ghstack dependencies: #143296, #143304, #143387, #143405, #143417
zou3519 added a commit that referenced this pull request Jan 24, 2025
…:Function

ghstack-source-id: 1f47dd4
Pull Request resolved: #143387
zou3519 added a commit that referenced this pull request Jan 24, 2025
…backwards (#143405)

ghstack-source-id: 82c8362
zou3519 added a commit that referenced this pull request Jan 24, 2025
#143417)

ghstack-source-id: f0bdd63
zou3519 added a commit that referenced this pull request Jan 24, 2025
ghstack-source-id: b79847c
zou3519 added a commit that referenced this pull request Jan 24, 2025
ghstack-source-id: 8876f26
facebook-github-bot pushed a commit that referenced this pull request Jan 24, 2025
Summary:
This PR squashes together the following commits:

#144115
#143417
#143405
#143387
#143304
#143296

This is a refactor of compiled autograd to use "functional autograd". The end goal is to get compiled autograd's initial capture to stop specializing on Tensor metadata, thereby allowing compiled autograd to better handle Tensor subclasses.

For more information, please read the commit messages for each PR.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov xmfan

bypass-github-export-checks: the failing CI test is bypassable on OSS but cannot be retried.


Reviewed By: jansel, xmfan, bdhirsh

Differential Revision: D68120850

Pulled By: zou3519
zou3519 added a commit that referenced this pull request Jan 26, 2025
pytorchmergebot pushed a commit that referenced this pull request Jan 27, 2025
Pull Request resolved: #144707
Approved by: https://github.com/bdhirsh, https://github.com/xmfan, https://github.com/jansel
nWEIdia pushed a commit to nWEIdia/pytorch that referenced this pull request Jan 27, 2025
@zou3519 zou3519 closed this Feb 12, 2025
@github-actions github-actions bot deleted the gh/zou3519/1108/head branch March 23, 2025 02:15

Labels

ci-no-td, ciflow/inductor, keep-going, Merged, module: compiled autograd, compiled_autograd, module: dynamo, module: inductor, oncall: jit, Reverted, topic: not user facing
