[aotd] capture rrelu_with_noise noise mutation #138503
IvanKobzarev wants to merge 13 commits into gh/IvanKobzarev/78/base from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138503
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Unrelated Failure as of commit 10c764e with merge base 5f287df.
NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Attention! native_functions.yaml was changed. If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs: one that adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.
Issues: #136784 #135083

rrelu_with_noise mutates noise. This was not reflected in the schema.
1/ Change the schema so that noise is marked as mutated
2/ Turn on generation of the functional variant of the op
3/ Add a derivatives rule for rrelu_with_noise_functional, as it has a different schema
4/ Remove rrelu_with_noise from BatchRulesBinaryOps:
- rrelu_with_noise is not compatible as a mutation op.
- rrelu_with_noise_functional returns Tuple[Tensor, Tensor], while the BatchRulesBinaryOps macros support only a Tensor return type.

Testing:
```
python test/functorch/test_aotdispatch.py -k rrelu_noise_mutation
```
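For context, the mutation being captured can be modeled with a small pure-Python sketch. This is a hypothetical stand-in for the real aten kernel (the function below is a toy re-implementation over lists, not the PyTorch op): in training mode, the randomly sampled negative slopes are written back into the caller's `noise` buffer in place, which is the side effect the schema previously did not declare.

```python
import random

def rrelu_with_noise(xs, noise, lower=1/8, upper=1/3, training=True):
    """Toy model of aten::rrelu_with_noise semantics (not the real kernel).

    In training mode, each negative element gets a slope sampled uniformly
    from [lower, upper]; that slope is stored IN PLACE into `noise` (1.0 is
    stored for non-negative elements). This in-place write is the mutation
    the PR records in the schema.
    """
    out = []
    for i, x in enumerate(xs):
        if training and x < 0:
            a = random.uniform(lower, upper)  # random negative slope
            noise[i] = a                      # in-place mutation of `noise`
            out.append(x * a)
        else:
            noise[i] = 1.0                    # identity slope
            out.append(x)
    return out

xs = [-2.0, 3.0]
noise = [0.0, 0.0]
out = rrelu_with_noise(xs, noise)
# after the call, `noise` has been overwritten: a sampled slope in [1/8, 1/3]
# for xs[0], and 1.0 for xs[1]
```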
auto [noise_value, noise_bdim] = unwrapTensorAtLevel(noise, cur_level);
TORCH_CHECK(!noise_bdim.has_value(), "vmap: Attempted to vmap over 'noise' in torch.rrelu_with_noise. This is not supported.");
auto res = rrelu_with_noise_batch_rule(self_value, self_bdim, noise_value, noise_bdim, lower, upper, training, std::move(generator));
return makeBatched(std::get<0>(res), std::get<1>(res), cur_level);
cc @zou3519 does this updated vmap rule look ok?
Why did we need to change the vmap rule?
bdhirsh left a comment:
lgtm, although:
(1) I'll wait for Richard to comment on the batching rule
(2) there's an XLA failure, so you're probably going to need an XLA-side patch. @qihqi @miladm who from the XLA side can help coordinate?
From the error, it looks like you need to remove const from the noise argument in the XLA lowering here: https://github.com/pytorch/xla/blob/master/torch_xla/csrc/aten_xla_type.cpp#L3022
xla PR: pytorch/xla#8309
return std::make_tuple(ret, 0, noise_, 0);
}

static Tensor rrelu_with_noise_batch(
Is this operation deterministic?
No, the noise is random uniform on every call.
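Because the slopes are freshly sampled on every call and written into the `noise` buffer, the updated plumbing rejects a batched `noise` outright. A pure-Python sketch of that guard (a toy model of the C++ `TORCH_CHECK` in the snippet above; `check_noise_not_batched` and its argument convention are hypothetical):

```python
def check_noise_not_batched(noise_bdim):
    """Toy model of the vmap guard: `noise_bdim` is the batch-dim index of
    `noise`, or None if `noise` is not being vmapped over. Batched noise is
    rejected because each batch element would need its own freshly sampled
    slopes written into a shared buffer."""
    if noise_bdim is not None:
        raise RuntimeError(
            "vmap: Attempted to vmap over 'noise' in torch.rrelu_with_noise. "
            "This is not supported.")

check_noise_not_batched(None)  # unbatched noise passes the guard
```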
zou3519 left a comment:
batching rule looks fine to me
This commit sets the PyTorch and TorchVision version to nightly release 2024-10-29. It also fixes the CI failure after commit 54d9e24 got merged. The issue was that the CI checks in the PR were run before the previous roll-pytorch update, but the PR was actually merged after the roll-pytorch update; hence the failure was not caught before merging the PR.

While exporting the fx_graph through fx_importer for the `rrelu` and `rrelu_with_noise` ops in train mode, the `aten.rrelu_with_noise` op is decomposed based on the PyTorch decomposition, which is the default behavior. However, the decomposition contains an input mutation (specifically here: https://github.com/pytorch/pytorch/blob/9bbe4a67ad137032add6a3b0b74bda66f5ef83d2/torch/_decomp/decompositions.py#L325), resulting in the runtime failure. This issue would probably be fixed by pytorch/pytorch#138503. Until then, the failing tests are added to the xfail set.

Also, after the roll-pytorch update the following tests started passing for the fx_importer and fx_importer_stablehlo configs:
- "ElementwiseRreluTrainModule_basic"
- "ElementwiseRreluTrainStaticModule_basic"
- "ElementwiseRreluWithNoiseTrainModule_basic"
- "ElementwiseRreluWithNoiseTrainStaticModule_basic"

This commit also updates the dtype check for the `aten.linear` op, since the op now expects both input tensors to have the same dtype.

Signed-off-by: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
Rebase-copy of long-standing, already-approved PR #138503 that was blocked on landing by xla build issues. Got a new PR with the same content (ghstack checkout was failing due to changed submodules).

Corresponding xla PR: pytorch/xla#8363

Pull Request resolved: #141867
Approved by: https://github.com/bdhirsh
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Stack from ghstack (oldest at bottom):
Issues:
#136784
#135083
rrelu_with_noise mutates noise. This was not reflected in the schema.
1/ Changing the schema so that noise is marked as mutated
2/ Turning on generation of the functional variant of the op
3/ Adding a derivatives rule for rrelu_with_noise_functional, as it has a different schema
4/ Removing rrelu_with_noise from BatchRulesBinaryOps:
- rrelu_with_noise is not compatible as a mutation op.
- rrelu_with_noise_functional returns Tuple[Tensor, Tensor], while the BatchRulesBinaryOps macros support only a Tensor return type.

Testing:
```
python test/functorch/test_aotdispatch.py -k rrelu_noise_mutation
```
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov
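Step 2/ above can be sketched in pure Python. This is a hypothetical model (a toy over lists, not the generated aten op): the functional variant leaves the caller's `noise` untouched and instead returns a fresh copy alongside the output, which is exactly the Tuple[Tensor, Tensor]-shaped result that the single-Tensor BatchRulesBinaryOps macros in point 4/ cannot express.

```python
import random

def rrelu_with_noise_functional(xs, noise, lower=1/8, upper=1/3, training=True):
    # Toy model of the generated functional variant: no in-place write into
    # `noise`; the updated slopes come back as a second return value.
    new_noise = list(noise)  # caller's buffer is left untouched
    out = []
    for i, x in enumerate(xs):
        if training and x < 0:
            a = random.uniform(lower, upper)  # per-element negative slope
            new_noise[i] = a
            out.append(x * a)
        else:
            new_noise[i] = 1.0
            out.append(x)
    # Tuple return: this is what the single-Tensor BatchRulesBinaryOps
    # macros cannot handle, hence the op's removal from that file.
    return out, new_noise
```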