build: manually update PyTorch version and fix CI failure #3830

Merged
vivekkhandelwal1 merged 4 commits into llvm:main from vivekkhandelwal1:roll-pytorch
Oct 30, 2024
Collaborator

@vivekkhandelwal1 vivekkhandelwal1 commented Oct 30, 2024

This commit sets the PyTorch and TorchVision versions to the nightly release 2024-10-29.

This commit also fixes the CI failure that appeared after commit 54d9e24 was merged. The CI checks for that PR ran before the previous PyTorch roll, but the PR was merged after it, so the failure was not caught before merging.

When the fx_graph for the rrelu and rrelu_with_noise ops is exported through fx_importer in train mode, the aten.rrelu_with_noise op is decomposed using the PyTorch decomposition, which is the default behavior. However, that decomposition contains an input mutation, specifically at https://github.com/pytorch/pytorch/blob/9bbe4a67ad137032add6a3b0b74bda66f5ef83d2/torch/_decomp/decompositions.py#L325, which results in a runtime failure. This issue will probably be fixed by pytorch/pytorch#138503. Until then, the failing tests are added to the xfail set.
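To make "input mutation" concrete, here is a minimal plain-Python sketch. It is not the PyTorch decomposition: both function names are hypothetical, and lists stand in for tensors. The first variant writes the sampled slopes back into its `noise` argument, which is the kind of side effect described above; the second returns the new noise as a value and leaves its inputs untouched.

```python
import random


def rrelu_with_noise_mutating(x, noise, lower, upper, rng):
    """Hypothetical stand-in: overwrites `noise` in place (input mutation)."""
    out = []
    for i, v in enumerate(x):
        if v <= 0:
            slope = rng.uniform(lower, upper)  # random slope for non-positive inputs
            noise[i] = slope                   # side effect on the caller's argument
            out.append(v * slope)
        else:
            noise[i] = 1.0                     # positive inputs pass through unchanged
            out.append(v)
    return out


def rrelu_with_noise_functional(x, noise, lower, upper, rng):
    """Hypothetical functional variant: returns (out, new_noise), mutates nothing."""
    new_noise = list(noise)  # work on a copy so the caller's list is untouched
    out = rrelu_with_noise_mutating(x, new_noise, lower, upper, rng)
    return out, new_noise
```

A graph exporter that expects side-effect-free ops can handle the second form directly, whereas the first needs the write to `noise` to be modeled explicitly.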

Also, after the PyTorch roll, the following tests started passing for the fx_importer and fx_importer_stablehlo configs:

  • "ElementwiseRreluTrainModule_basic"
  • "ElementwiseRreluTrainStaticModule_basic"
  • "ElementwiseRreluWithNoiseTrainModule_basic"
  • "ElementwiseRreluWithNoiseTrainStaticModule_basic"
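The xfail bookkeeping mentioned in this description can be pictured with a small sketch. `classify`, the status labels, and the sample test name are hypothetical and are not torch-mlir's test-runner API; the sketch only illustrates why a PyTorch roll that changes test outcomes has to be accompanied by xfail set updates.

```python
def classify(test_name, passed, xfail_set):
    """Return a status label for one test result, given a set of expected failures."""
    if test_name in xfail_set:
        # An expected failure that passes is flagged so the entry can be removed.
        return "XPASS" if passed else "XFAIL"
    # A test outside the set must pass; otherwise CI fails.
    return "PASS" if passed else "FAIL"


# Hypothetical sample data, not taken from the repository.
xfail_set = {"SomeModule_basic"}
statuses = [
    classify("SomeModule_basic", passed=False, xfail_set=xfail_set),
    classify("SomeModule_basic", passed=True, xfail_set=xfail_set),
    classify("OtherModule_basic", passed=False, xfail_set=xfail_set),
]
```

Under this scheme, a newly failing test is either fixed or added to the set, and a newly passing test is removed from it.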

This commit also updates the dtype check for the aten.linear op, since the op now expects both input tensors to have the same dtype.
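As a rough illustration of such a check, the sketch below rejects mismatched dtypes. It assumes "both input tensors" refers to the input and the weight, uses plain strings in place of real dtype objects, and the helper name `check_linear_dtypes` is hypothetical rather than the function changed in this PR.

```python
def check_linear_dtypes(input_dtype: str, weight_dtype: str) -> str:
    """Return the common dtype, or raise if the two operands disagree."""
    if input_dtype != weight_dtype:
        raise ValueError(
            f"linear expects matching dtypes, got {input_dtype} and {weight_dtype}"
        )
    return input_dtype
```

For example, `check_linear_dtypes("float32", "float32")` returns `"float32"`, while a `"float32"` input with a `"float16"` weight raises `ValueError`.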

Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>

@vivekkhandelwal1 vivekkhandelwal1 changed the title from "Roll PyTorch draft" to "build: manually update PyTorch version and fix CI failure" Oct 30, 2024
@vivekkhandelwal1 vivekkhandelwal1 marked this pull request as ready for review October 30, 2024 12:01
Member

@pashu123 pashu123 left a comment

LGTM!

Contributor

@Max191 Max191 left a comment

Awesome, thanks Vivek!

@vivekkhandelwal1 vivekkhandelwal1 merged commit 16b3bd6 into llvm:main Oct 30, 2024
@vivekkhandelwal1 vivekkhandelwal1 deleted the roll-pytorch branch November 4, 2024 05:07
rahuls-cerebras added a commit that referenced this pull request Jan 3, 2025
mgehre-amd pushed a commit to Xilinx/torch-mlir that referenced this pull request Jan 16, 2025
TimAtGoogle pushed a commit that referenced this pull request Feb 12, 2025