
[aotd] capture rrelu_with_noise noise mutation #138503

Closed
IvanKobzarev wants to merge 13 commits into gh/IvanKobzarev/78/base from gh/IvanKobzarev/78/head

Conversation

@IvanKobzarev
Contributor

@IvanKobzarev IvanKobzarev commented Oct 21, 2024

Stack from ghstack (oldest at bottom):

Issues:
#136784
#135083

rrelu_with_noise mutates noise.
This was not reflected in its schema.

1/ Change the schema so that noise is marked as mutated

2/ Turn on generation of a functional variant of the op

3/ Add a derivatives rule for rrelu_with_noise_functional, since it has a different schema

4/ Remove rrelu_with_noise from BatchRulesBinaryOps:

  • rrelu_with_noise is no longer compatible there, as it is a mutation op.
  • rrelu_with_noise_functional returns Tuple[Tensor, Tensor], while the BatchRulesBinaryOps macros support only a Tensor return type.
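The functional variant can be pictured as a thin wrapper that clones the noise tensor instead of mutating the caller's one. This is an illustrative sketch only: the real `rrelu_with_noise_functional` is generated by the native_functions.yaml machinery, and the wrapper name and exact signature below are assumptions.

```python
import torch

def rrelu_with_noise_functional_sketch(x, noise, lower=0.125, upper=1 / 3, training=False):
    # Clone so the caller's tensor stays untouched; the in-place fill of
    # `noise` performed by the kernel lands on the copy instead.
    noise = noise.clone()
    out = torch.ops.aten.rrelu_with_noise(x, noise, lower, upper, training)
    # Return the updated noise alongside the output, mirroring the
    # Tuple[Tensor, Tensor] schema of the functional variant.
    return out, noise

x = torch.randn(8)
noise = torch.zeros(8)
out, new_noise = rrelu_with_noise_functional_sketch(x, noise, training=True)
assert torch.equal(noise, torch.zeros(8))  # original noise untouched
```

Returning the mutated operand as an extra output is what lets functionalization replay the mutation after tracing.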

Testing:

python test/functorch/test_aotdispatch.py -k rrelu_noise_mutation
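The underlying mutation is easy to observe from Python. A minimal repro sketch using the raw ATen op (exact values vary per run):

```python
import torch

x = torch.randn(8)
noise = torch.zeros(8)
before = noise.clone()

# In training mode the kernel fills `noise` in place with the sampled
# per-element scale factors, even though the old schema did not declare
# the argument as mutated.
torch.ops.aten.rrelu_with_noise(x, noise, 0.125, 1 / 3, True)

print(torch.equal(noise, before))  # False: `noise` was overwritten in place
```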

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot

pytorch-bot Bot commented Oct 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138503

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 10c764e with merge base 5f287df:

NEW FAILURE - The following job has failed:

  • pull / linux-focal-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge) (gh)
    /var/lib/jenkins/workspace/xla/torch_xla/csrc/runtime/BUILD:458:14: Compiling torch_xla/csrc/runtime/xla_util_test.cc failed: (Exit 1): gcc failed: error executing command (from target //torch_xla/csrc/runtime:xla_util_test) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 233 arguments skipped)

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

IvanKobzarev added a commit that referenced this pull request Oct 21, 2024
ghstack-source-id: a7017a8
Pull Request resolved: #138503
@github-actions
Contributor

Attention! native_functions.yaml was changed

If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.


Caused by:

IvanKobzarev added a commit that referenced this pull request Oct 22, 2024
ghstack-source-id: 80759a8
Pull Request resolved: #138503
@IvanKobzarev IvanKobzarev added the `topic: not user facing` label Oct 22, 2024
IvanKobzarev added a commit that referenced this pull request Oct 23, 2024
ghstack-source-id: 8a433ff
Pull Request resolved: #138503
```
auto [noise_value, noise_bdim] = unwrapTensorAtLevel(noise, cur_level);
TORCH_CHECK(!noise_bdim.has_value(), "vmap: Attempted to vmap over 'noise' in torch.rrelu_with_noise. This is not supported.");
auto res = rrelu_with_noise_batch_rule(self_value, self_bdim, noise_value, noise_bdim, lower, upper, training, std::move(generator));
return makeBatched(std::get<0>(res), std::get<1>(res), cur_level);
```
Collaborator


cc @zou3519 does this updated vmap rule look ok?

Contributor


Why did we need to change the vmap rule?

Contributor


Okay I see the mutation

Collaborator

@bdhirsh bdhirsh left a comment


lgtm, although:

(1) I'll wait for Richard to comment on the batching rule

(2) there's an XLA failure, so you're probably going to need an XLA-side patch. @qihqi @miladm who from the XLA side can help coordinate?

From the error, it looks like you need to remove const from the noise argument in the XLA lowering here: https://github.com/pytorch/xla/blob/master/torch_xla/csrc/aten_xla_type.cpp#L3022

@IvanKobzarev
Contributor Author

xla PR pytorch/xla#8309
b2430c10302a82bc56060f0f6944cf28afabd1a7

IvanKobzarev added a commit that referenced this pull request Oct 28, 2024
ghstack-source-id: ff5acf0
Pull Request resolved: #138503
```
return std::make_tuple(ret, 0, noise_, 0);
}

static Tensor rrelu_with_noise_batch(
```
Contributor


Is this operation deterministic?

Contributor Author


No, the noise is random uniform on every call.
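For context, the sampling can be pinned with an explicit generator, which makes both the output and the written-back noise reproducible (a quick illustration, not part of this PR):

```python
import torch

x = torch.randn(8)

g = torch.Generator().manual_seed(0)
n1 = torch.zeros(8)
out1 = torch.ops.aten.rrelu_with_noise(x, n1, 0.125, 1 / 3, True, generator=g)

# Re-seed and repeat: the same uniform samples are drawn, so both the
# output and the mutated noise tensor match the first call.
g.manual_seed(0)
n2 = torch.zeros(8)
out2 = torch.ops.aten.rrelu_with_noise(x, n2, 0.125, 1 / 3, True, generator=g)

assert torch.equal(out1, out2) and torch.equal(n1, n2)
```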

Contributor

@zou3519 zou3519 left a comment


batching rule looks fine to me

vivekkhandelwal1 added a commit to llvm/torch-mlir that referenced this pull request Oct 30, 2024
This commit sets the PyTorch and TorchVision version to nightly release
2024-10-29.

This commit also fixes the CI failure after this commit
54d9e24
got merged. The issue was that the CI checks in the PR were run before
the previous roll pytorch update but the PR was actually merged after
the roll pytorch update. Hence, the failure was not caught before
merging the PR.

While exporting the fx_graph through fx_importer for `rrelu` and
`rrelu_with_noise` op for train mode, it decomposes the
`aten.rrelu_with_noise` op based on the PyTorch decomposition which is
the default behavior. However, the decomposition contains an input
mutation specifically here
https://github.com/pytorch/pytorch/blob/9bbe4a67ad137032add6a3b0b74bda66f5ef83d2/torch/_decomp/decompositions.py#L325,
resulting in the runtime failure. This issue would probably be fixed by
pytorch/pytorch#138503. Until then, the failing
tests are added to the xfail set.

Also, after the roll pytorch update following tests started passing for
fx_importer, and fx_importer_stablehlo config.

- "ElementwiseRreluTrainModule_basic"
- "ElementwiseRreluTrainStaticModule_basic"
- "ElementwiseRreluWithNoiseTrainModule_basic"
- "ElementwiseRreluWithNoiseTrainStaticModule_basic"

This commit also updates the dtype check for the `aten.linear` op since
the op now expects both the input tensors to have the same dtype.

Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
@IvanKobzarev IvanKobzarev added the `ciflow/periodic` label Nov 7, 2024
IvanKobzarev added a commit that referenced this pull request Nov 8, 2024
ghstack-source-id: 4d16e93
Pull Request resolved: #138503
IvanKobzarev added a commit that referenced this pull request Dec 2, 2024
Rebase copy of the long-standing, already approved PR #138503, which was blocked from landing by XLA build issues.

Opened a new PR with the same content.



cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Dec 4, 2024
Rebase copy of the long-standing, already approved PR #138503, which was blocked from landing by XLA build issues.

Opened a new PR with the same content (ghstack checkout was failing due to changed submodules).

Corresponding xla PR:
pytorch/xla#8363

Pull Request resolved: #141867
Approved by: https://github.com/bdhirsh
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
@github-actions
Contributor

github-actions Bot commented Jan 7, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions Bot added the Stale label Jan 7, 2025
mgehre-amd pushed a commit to Xilinx/torch-mlir that referenced this pull request Jan 16, 2025
@github-actions github-actions Bot closed this Feb 6, 2025
TimAtGoogle pushed a commit to llvm/torch-mlir that referenced this pull request Feb 12, 2025
@github-actions github-actions Bot deleted the gh/IvanKobzarev/78/head branch March 9, 2025 01:55

Labels

ciflow/inductor · ciflow/periodic · module: inductor · Stale · topic: not user facing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants