
[aotd] capture rrelu_with_noise noise mutation #138503

Closed
IvanKobzarev wants to merge 13 commits into gh/IvanKobzarev/78/base from gh/IvanKobzarev/78/head

Conversation

@IvanKobzarev
Contributor

@IvanKobzarev IvanKobzarev commented Oct 21, 2024

Stack from ghstack (oldest at bottom):

Issues:
#136784
#135083

rrelu_with_noise mutates noise.
This was not reflected in its schema.

1/ Change the schema so that noise is marked as mutated

2/ Turn on generation of a functional variant of the op

3/ Add a derivatives rule for rrelu_with_noise_functional, since it has a different schema

4/ Remove rrelu_with_noise from BatchRulesBinaryOps:

  • rrelu_with_noise is no longer compatible there, as it is a mutation op.
  • rrelu_with_noise_functional returns Tuple[Tensor, Tensor], while the BatchRulesBinaryOps macros support only a Tensor return type.
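The functional variant can be pictured as a thin wrapper that clones the noise tensor instead of mutating the caller's one. This is an illustrative sketch only: the real `rrelu_with_noise_functional` is generated by the native_functions.yaml machinery, and the wrapper name and exact signature below are assumptions.

```python
import torch

def rrelu_with_noise_functional_sketch(x, noise, lower=0.125, upper=1 / 3, training=False):
    # Clone so the caller's tensor stays untouched; the in-place fill of
    # `noise` performed by the kernel lands on the copy instead.
    noise = noise.clone()
    out = torch.ops.aten.rrelu_with_noise(x, noise, lower, upper, training)
    # Return the updated noise alongside the output, mirroring the
    # Tuple[Tensor, Tensor] schema of the functional variant.
    return out, noise

x = torch.randn(8)
noise = torch.zeros(8)
out, new_noise = rrelu_with_noise_functional_sketch(x, noise, training=True)
assert torch.equal(noise, torch.zeros(8))  # original noise untouched
```

Returning the mutated operand as an extra output is what lets functionalization replay the mutation after tracing.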

Testing:

python test/functorch/test_aotdispatch.py -k rrelu_noise_mutation
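The underlying mutation is easy to observe from Python. A minimal repro sketch using the raw ATen op (exact values vary per run):

```python
import torch

x = torch.randn(8)
noise = torch.zeros(8)
before = noise.clone()

# In training mode the kernel fills `noise` in place with the sampled
# per-element scale factors, even though the old schema did not declare
# the argument as mutated.
torch.ops.aten.rrelu_with_noise(x, noise, 0.125, 1 / 3, True)

print(torch.equal(noise, before))  # False: `noise` was overwritten in place
```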

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot

pytorch-bot Bot commented Oct 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138503

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 10c764e with merge base 5f287df:

NEW FAILURE - The following job has failed:

  • pull / linux-focal-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge) (gh)
    /var/lib/jenkins/workspace/xla/torch_xla/csrc/runtime/BUILD:458:14: Compiling torch_xla/csrc/runtime/xla_util_test.cc failed: (Exit 1): gcc failed: error executing command (from target //torch_xla/csrc/runtime:xla_util_test) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 233 arguments skipped)

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

IvanKobzarev added a commit that referenced this pull request Oct 21, 2024
ghstack-source-id: a7017a8
Pull Request resolved: #138503
@github-actions
Contributor

Attention! native_functions.yaml was changed

If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.


Caused by:

IvanKobzarev added a commit that referenced this pull request Oct 22, 2024
ghstack-source-id: 80759a8
Pull Request resolved: #138503
@IvanKobzarev IvanKobzarev added the `topic: not user facing` label Oct 22, 2024
IvanKobzarev added a commit that referenced this pull request Oct 23, 2024
ghstack-source-id: 8a433ff
Pull Request resolved: #138503
```
auto [noise_value, noise_bdim] = unwrapTensorAtLevel(noise, cur_level);
TORCH_CHECK(!noise_bdim.has_value(), "vmap: Attempted to vmap over 'noise' in torch.rrelu_with_noise. This is not supported.");
auto res = rrelu_with_noise_batch_rule(self_value, self_bdim, noise_value, noise_bdim, lower, upper, training, std::move(generator));
return makeBatched(std::get<0>(res), std::get<1>(res), cur_level);
```
Collaborator


cc @zou3519 does this updated vmap rule look ok?

Contributor


Why did we need to change the vmap rule?

Contributor


Okay I see the mutation

Collaborator

@bdhirsh bdhirsh left a comment


lgtm, although:

(1) I'll wait for Richard to comment on the batching rule

(2) there's an XLA failure, so you're probably going to need an XLA-side patch. @qihqi @miladm who from the XLA side can help coordinate?

From the error, it looks like you need to remove const from the noise argument in the XLA lowering here: https://github.com/pytorch/xla/blob/master/torch_xla/csrc/aten_xla_type.cpp#L3022

@IvanKobzarev
Contributor Author

xla PR pytorch/xla#8309
b2430c10302a82bc56060f0f6944cf28afabd1a7

IvanKobzarev added a commit that referenced this pull request Oct 28, 2024
ghstack-source-id: ff5acf0
Pull Request resolved: #138503
```
return std::make_tuple(ret, 0, noise_, 0);
}

static Tensor rrelu_with_noise_batch(
```
Contributor


Is this operation deterministic?

Contributor Author


No, the noise is random uniform on every call.
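For context, the sampling can be pinned with an explicit generator, which makes both the output and the written-back noise reproducible (a quick illustration, not part of this PR):

```python
import torch

x = torch.randn(8)

g = torch.Generator().manual_seed(0)
n1 = torch.zeros(8)
out1 = torch.ops.aten.rrelu_with_noise(x, n1, 0.125, 1 / 3, True, generator=g)

# Re-seed and repeat: the same uniform samples are drawn, so both the
# output and the mutated noise tensor match the first call.
g.manual_seed(0)
n2 = torch.zeros(8)
out2 = torch.ops.aten.rrelu_with_noise(x, n2, 0.125, 1 / 3, True, generator=g)

assert torch.equal(out1, out2) and torch.equal(n1, n2)
```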

Contributor

@zou3519 zou3519 left a comment


batching rule looks fine to me

vivekkhandelwal1 added a commit to llvm/torch-mlir that referenced this pull request Oct 30, 2024
This commit sets the PyTorch and TorchVision version to nightly release
2024-10-29.

This commit also fixes the CI failure after this commit
54d9e24
got merged. The issue was that the CI checks in the PR were run before
the previous roll pytorch update but the PR was actually merged after
the roll pytorch update. Hence, the failure was not caught before
merging the PR.

While exporting the fx_graph through fx_importer for `rrelu` and
`rrelu_with_noise` op for train mode, it decomposes the
`aten.rrelu_with_noise` op based on the PyTorch decomposition which is
the default behavior. However, the decomposition contains an input
mutation specifically here
https://github.com/pytorch/pytorch/blob/9bbe4a67ad137032add6a3b0b74bda66f5ef83d2/torch/_decomp/decompositions.py#L325,
resulting in the runtime failure. This issue would probably be fixed by
pytorch/pytorch#138503. Until then, the failing
tests are added to the xfail set.

Also, after the roll pytorch update following tests started passing for
fx_importer, and fx_importer_stablehlo config.

- "ElementwiseRreluTrainModule_basic"
- "ElementwiseRreluTrainStaticModule_basic"
- "ElementwiseRreluWithNoiseTrainModule_basic"
- "ElementwiseRreluWithNoiseTrainStaticModule_basic"

This commit also updates the dtype check for the `aten.linear` op since
the op now expects both the input tensors to have the same dtype.

Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
@IvanKobzarev IvanKobzarev added the `ciflow/periodic` label Nov 7, 2024
IvanKobzarev added a commit that referenced this pull request Nov 8, 2024
ghstack-source-id: 4d16e93
Pull Request resolved: #138503
IvanKobzarev added a commit that referenced this pull request Dec 2, 2024
Rebase copy of the long-standing, already approved PR #138503, which was blocked from landing by XLA build issues.

Opened a new PR with the same content.



cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Dec 4, 2024
Rebase copy of the long-standing, already approved PR #138503, which was blocked from landing by XLA build issues.

Opened a new PR with the same content (ghstack checkout was failing due to changed submodules).

Corresponding xla PR:
pytorch/xla#8363

Pull Request resolved: #141867
Approved by: https://github.com/bdhirsh
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
@github-actions
Contributor

github-actions Bot commented Jan 7, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions Bot added the Stale label Jan 7, 2025
mgehre-amd pushed a commit to Xilinx/torch-mlir that referenced this pull request Jan 16, 2025
@github-actions github-actions Bot closed this Feb 6, 2025
TimAtGoogle pushed a commit to llvm/torch-mlir that referenced this pull request Feb 12, 2025
@github-actions github-actions Bot deleted the gh/IvanKobzarev/78/head branch March 9, 2025 01:55

Labels

ciflow/inductor · ciflow/periodic · module: inductor · Stale · topic: not user facing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants