[JIT] Propagate profiled information to DifferentiableGraph outputs #79498

davidberard98 wants to merge 5 commits into gh/davidberard98/115/base
Conversation
[JIT] Propagate profiled information to DifferentiableGraph outputs

Without profiled outputs, autodiff can't tell whether the outputs of a DifferentiableGraph should require grad. With no profiled information, autodiff would default to requires_grad=True, causing it to mark tensors as requiring grad when they shouldn't. This change adds requires_grad info onto the type of an output when it can be recovered from later uses of that output.

Adds a test for correct autodiff requires_grad behavior, and a test that the output type is correctly annotated in create_autodiff_subgraphs.

[ghstack-poisoned]
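The aggregation rule the commit describes can be sketched in plain Python. This is a hypothetical model of the pass, not the actual C++ implementation; the function name is invented for illustration:

```python
def aggregate_requires_grad(profiled_uses: list) -> bool:
    """Decide requires_grad for a DifferentiableGraph output.

    profiled_uses: requires_grad values observed at later uses of
    the output. With no observations, fall back to the conservative
    default of True (the old behavior); otherwise the output requires
    grad if *any* profiled use observed requires_grad=True.
    """
    if not profiled_uses:
        return True  # no profiled info: conservative default
    return any(profiled_uses)

print(aggregate_requires_grad([]))             # no info: defaults to True
print(aggregate_requires_grad([False, False])) # all uses saw False
print(aggregate_requires_grad([False, True]))  # one use saw True
```

The `any`-over-observations shape mirrors the `std::any_of` call in the C++ diff below.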
✅ No Failures (0 Pending) as of commit 3b1521a (automatically generated by Dr. CI).
@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
```cpp
  }

  if (!requiresGradResults.empty()) {
    bool requires_grad = std::any_of(
```
@eellison does this seem like a safe fix to the nograd issue?

I can think of one case where an output should require grad but this won't mark it as such: if every use of a differentiable-graph output that requires grad occurs inside no_grad sections, then all the profiled values will record requires_grad=False. But I think that's probably fine, because in that case it doesn't matter that the output requires grad: no gradient flows through those uses anyway.
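The edge case above can be modeled with a small sketch (hypothetical names, not JIT code): under no_grad, each use observes the value as requires_grad=False, so the aggregated flag comes out False, which is consistent with the fact that no gradient flows through those uses.

```python
def observe(grad_mode_enabled: bool, value_requires_grad: bool) -> bool:
    # A profile node records what it sees at the use site: inside a
    # no_grad section, any tensor is observed as requires_grad=False.
    return value_requires_grad and grad_mode_enabled

# Every use of a genuinely grad-requiring value sits inside no_grad:
uses = [
    observe(grad_mode_enabled=False, value_requires_grad=True),
    observe(grad_mode_enabled=False, value_requires_grad=True),
]
requires_grad = any(uses)  # False: no use can propagate a gradient
```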
@pytorchbot merge

@pytorchbot successfully started a merge job.
|
@davidberard98 your PR has been successfully merged.
[JIT] Propagate profiled information to DifferentiableGraph outputs (#79498)

Summary: Without profiled outputs, autodiff can't tell whether the outputs of a DifferentiableGraph should require grad. Autodiff would default to requires_grad=True if there was no profiled information, causing it to mark tensors as requiring grad when they shouldn't. This adds requires_grad info onto the type of the output, if it can be found in later uses of the output. Adds a test for correct autodiff requires_grad behavior and also a test to make sure the output type is correctly annotated in create_autodiff_subgraphs.

Pull Request resolved: #79498
Approved by: https://github.com/eellison
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/268ced51045f59c4a36c32edcad736a754c61288
Reviewed By: eellison
Differential Revision: D37131946
Pulled By: davidberard98
fbshipit-source-id: 7dcfa99f65048d59eaf0628f89ecabe123f16a77
Stack from ghstack:
Without profiled outputs, autodiff can't tell whether the outputs of a DifferentiableGraph should require grad. With no profiled information, autodiff would default to requires_grad=True, causing it to mark tensors as requiring grad when they shouldn't. This change adds requires_grad info onto the type of an output when it can be recovered from later uses of that output.

Adds a test for correct autodiff requires_grad behavior, and a test that the output type is correctly annotated in create_autodiff_subgraphs.
Changes from the previous revision: previously, an output node would sometimes be profiled within a no_grad region, so the profiled value would carry the wrong requires_grad flag (i.e. it would be profiled as `requires_grad=False` inside the no_grad region, even though at the time of output it would have been profiled as `requires_grad=True`). To fix this, we keep track of which context each node is part of, and only consider profile nodes that are within the same context region.

Tested on both of the internal models (the memory-leak model and the no_grad model).

Differential Revision: [D37131946](https://our.internmc.facebook.com/intern/diff/D37131946)
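A minimal sketch of the context-aware aggregation described above, assuming each observation is tagged with an invented context id (this models the idea, not the actual pass):

```python
from collections import defaultdict

def aggregate_by_context(observations, output_context):
    """Aggregate profiled requires_grad values per grad-mode context.

    observations: list of (context_id, observed_requires_grad) pairs,
    one per profiled use. Only observations recorded in the same
    context as the graph output are trusted; with none available,
    fall back to the conservative default of True.
    """
    by_ctx = defaultdict(list)
    for ctx, rg in observations:
        by_ctx[ctx].append(rg)
    same_ctx = by_ctx.get(output_context, [])
    if not same_ctx:
        return True  # no usable info: conservative default
    return any(same_ctx)

# A use profiled as False inside a no_grad context (ctx 1) no longer
# masks the True observation from the output's own context (ctx 0):
print(aggregate_by_context([(0, True), (1, False)], output_context=0))
```

Filtering by context is what prevents the wrong-`requires_grad` profiles from no_grad regions from contaminating the output annotation.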