[inductor] fix crash issue when input is a view tensor by blzheng · Pull Request #90150 · pytorch/pytorch

blzheng · 2022-12-05T03:25:50Z

Fix the crash failure mentioned in #93460

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @Guobing-Chen @chunyuan-w @zhuhaozhe @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire


jgong5 · 2022-12-10T03:33:26Z

test/inductor/test_torchinductor.py

                fn(a, b)
            assert "kernel_cpp_0" in (e.name for e in prof.profiler.function_events)

+        def test_input_is_view(self):


To be more accurate, this only happens with an in-place view op and should be fine with a non-in-place view, right? Would naming it test_input_is_inplace_view be more appropriate?

Right. The function name has been updated.
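The distinction drawn in this exchange can be illustrated with a small sketch. `ToyTensor` below is a hypothetical pure-Python stand-in, not a PyTorch class. It only mimics how `unsqueeze` returns a new object while `unsqueeze_` changes the shape of the object it is called on, which is why metadata recorded before the call can go stale only in the in-place case.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ToyTensor:
    """Hypothetical stand-in that tracks only a shape (not a real tensor)."""

    shape: Tuple[int, ...]

    def unsqueeze(self, dim: int) -> "ToyTensor":
        # Out-of-place: build a new object and leave self untouched.
        return ToyTensor(self.shape[:dim] + (1,) + self.shape[dim:])

    def unsqueeze_(self, dim: int) -> "ToyTensor":
        # In-place: mutate self, so every holder of this object sees the new shape.
        self.shape = self.shape[:dim] + (1,) + self.shape[dim:]
        return self


def shape_is_stale(op_name: str) -> bool:
    """Record the input shape, run the op, and report whether the record went stale."""
    x = ToyTensor((2, 3))
    recorded = x.shape  # metadata captured before the op runs
    getattr(x, op_name)(0)
    return recorded != x.shape


out_of_place_stale = shape_is_stale("unsqueeze")
in_place_stale = shape_is_stale("unsqueeze_")
```

In this toy model only the in-place variant invalidates the recorded shape, which matches the renamed test's focus on in-place view ops.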

torch/_inductor/graph.py

test/inductor/test_torchinductor.py

jgong5 · 2023-01-06T03:49:28Z

torch/_dynamo/variables/builder.py

+                not config.dynamic_shapes
+                and self.fake_tensor.shape != self.example.shape
+            ):
+                self.fake_tensor = self.fake_tensor.reshape(self.example.shape)


Does reshape always work?

I replaced the reshape with a conversion from the real tensor in commit a6ac94d.
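One possible reason the question matters: a reshape can reproduce a shape without reproducing the strides. The sketch below is plain Python arithmetic rather than PyTorch code, and `contiguous_strides` is a hypothetical helper. It assumes a row-major layout and that an in-place transpose swaps both the sizes and the strides of the two dimensions.

```python
from typing import Tuple


def contiguous_strides(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Row-major strides (in elements) for a densely packed array of this shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


# Metadata recorded before the in-place op: a contiguous 2x3 input.
shape = (2, 3)
strides = contiguous_strides(shape)

# An in-place transpose of dims 0 and 1 swaps sizes and strides (assumption above).
real_shape = (shape[1], shape[0])
real_strides = (strides[1], strides[0])

# Reshaping the stale contiguous metadata to the new shape yields contiguous strides.
reshaped_strides = contiguous_strides(real_shape)

shapes_match = real_shape == (3, 2)
strides_match = reshaped_strides == real_strides
```

Here the reshaped metadata has the right shape but different strides, so rebuilding the metadata from the real tensor avoids that mismatch in this toy setting.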

jgong5 · 2023-01-06T03:49:49Z

torch/_dynamo/variables/builder.py

            )
+            # For inplace ops changing the input's shape (unsqueeze_)
+            if (
+                not config.dynamic_shapes


What happens with dynamic shapes?

I added support for dynamic shapes in 953e13c

jgong5 · 2023-01-17T02:43:14Z

test/inductor/test_torchinductor.py

To avoid code duplication, consider testing both dynamic_shapes=True and dynamic_shapes=False inside the same test instead of specifying it as a decorator.

Updated in 5999b38

jgong5 · 2023-01-17T02:44:41Z

test/inductor/test_torchinductor.py

These are from another PR. I guess you need to rebase?

jgong5 · 2023-01-17T05:08:05Z

torch/_dynamo/variables/builder.py

+                self.fake_tensor = converter.from_real_tensor(
+                    self.fake_tensor.fake_mode, self.example
+                )
+            elif config.dynamic_shapes and self.fake_tensor.dim() != self.example.dim():


Is this check sufficient to confirm that an in-place op was applied to the inputs?

This check is updated in 5999b38

blzheng · 2023-01-17T08:00:11Z

@pytorchbot rebase

pytorchmergebot · 2023-01-17T08:01:58Z

@pytorchbot successfully started a rebase job. Check the current status here

pytorchmergebot · 2023-01-17T08:02:01Z

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/90150/head returned non-zero exit code 1

Rebasing (1/9)
Auto-merging test/inductor/test_torchinductor.py
CONFLICT (content): Merge conflict in test/inductor/test_torchinductor.py
Auto-merging torch/_inductor/graph.py
Auto-merging torch/_inductor/scheduler.py
error: could not apply b296431bde... fix bug in hf_BigBird
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply b296431bde... fix bug in hf_BigBird

Raised by https://github.com/pytorch/pytorch/actions/runs/3937177663

enhance check condition

pytorchmergebot · 2023-02-03T00:33:36Z

Successfully rebased beilei/fix_hf_BigBird onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout beilei/fix_hf_BigBird && git pull --rebase)


blzheng · 2023-02-03T04:52:28Z

@pytorchbot merge

pytorchmergebot · 2023-02-03T04:54:09Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


Had to provide a merge conflict resolution due to conflicts with #94118 This reverts commit a71395d.

@jansel

…" (#94329)

Had to provide a merge conflict resolution due to conflicts with #94118

This was causing issues with internal tests that look similar to:

```
in clone_preserve_strides
    x.size(), x.stride(), x.storage_offset()
AttributeError: 'KeyedJaggedTensor' object has no attribute 'size'
```

See https://fburl.com/testinfra/nc0du2sp for more information

This reverts commit #90150

@jansel can you help @blzheng with re-landing this as a co-development diff?

Pull Request resolved: #94329
Approved by: https://github.com/jansel

seemethere · 2023-02-07T20:52:09Z

Sorry to revert again, but it appears that this introduces issues for users of TorchRec; see #94329 for more information.

seemethere · 2023-02-07T20:53:14Z

torch/fx/passes/shape_prop.py

        """
-        return super().run(*args)
+        # clone inputs to avoid side effects caused by inplace ops during run_node
+        new_args = [torch._prims_common.clone_preserve_strides(x) for x in args]


This line in particular is problematic when used with torchrec's KeyedJaggedTensor, resulting in errors like:

    in clone_preserve_strides
        x.size(), x.stride(), x.storage_offset()
    AttributeError: 'KeyedJaggedTensor' object has no attribute 'size'

Hi @seemethere, I am curious which case triggers this issue. I added clone_preserve_strides in two places.

The first is in this function. As the docstring says, *args should be Tensors, and I would expect every Tensor to have a 'size' attribute, right?

    def propagate(self, *args):
        """
        Run `module` via interpretation and return the result and
        record the shape and type of each node.

        Args:
            *args (Tensor): the sample input.

        Returns:
            Any: The value returned from executing the Module
        """

The second is in aot_dispatch_base, where flat_args should likewise be List[Tensor].

    def aot_dispatch_base(flat_fn, flat_args: List[Tensor], aot_config: AOTConfig):
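The failure mode under discussion can be reproduced in miniature. The sketch below uses hypothetical stand-in classes (`ToyTensor`, `ToyJagged`) rather than real PyTorch or TorchRec types, and `clone_tensor_args` is an illustrative helper, not something this PR added. It shows one possible defensive pattern: clone only the arguments that are of the tensor type and pass everything else through unchanged.

```python
from typing import Any, List


class ToyTensor:
    """Hypothetical stand-in for a tensor: has size() and clone()."""

    def __init__(self, data: List[float]):
        self.data = list(data)

    def size(self) -> int:
        return len(self.data)

    def clone(self) -> "ToyTensor":
        return ToyTensor(self.data)


class ToyJagged:
    """Hypothetical stand-in for a container type that has no size() method."""

    def __init__(self, values: ToyTensor):
        self.values = values


def clone_all(args: List[Any]) -> List[Any]:
    # Assumes every argument is tensor-like, as the annotated signatures suggest.
    cloned = []
    for x in args:
        x.size()  # raises AttributeError for objects without size()
        cloned.append(x.clone())
    return cloned


def clone_tensor_args(args: List[Any]) -> List[Any]:
    # Guarded variant: clone tensors, pass other objects through untouched.
    return [x.clone() if isinstance(x, ToyTensor) else x for x in args]


args = [ToyTensor([1.0, 2.0]), ToyJagged(ToyTensor([3.0]))]

try:
    clone_all(args)
    unguarded_failed = False
except AttributeError:
    unguarded_failed = True

guarded = clone_tensor_args(args)
```

Note that objects passed through by the guard are not protected from in-place side effects, so this pattern trades coverage for compatibility with callers that pass non-tensor inputs despite the annotations.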

blzheng marked this pull request as draft December 5, 2022 03:25

github-actions bot added ciflow/inductor module: inductor labels Dec 5, 2022

pytorchbot added the open source label Dec 5, 2022

blzheng requested a review from jgong5 December 5, 2022 06:37

jgong5 requested changes Dec 10, 2022


jgong5 mentioned this pull request Dec 10, 2022

[Inductor] [CPU] Crash failure in torchbench model hf_BigBird #93460

Closed

jgong5 approved these changes Dec 10, 2022


blzheng force-pushed the beilei/fix_hf_BigBird branch from 1b99e85 to adcdd91 Compare December 12, 2022 01:28

blzheng marked this pull request as ready for review December 14, 2022 09:22

blzheng requested review from bdhirsh and jansel December 14, 2022 09:23

jansel requested changes Dec 14, 2022


torch/_inductor/graph.py

test/inductor/test_torchinductor.py (outdated)

zou3519 added the triaged label Dec 15, 2022

blzheng marked this pull request as draft December 15, 2022 01:29

blzheng force-pushed the beilei/fix_hf_BigBird branch from adcdd91 to 68894fd Compare January 4, 2023 05:28

github-actions bot added the module: dynamo label Jan 4, 2023

blzheng requested a review from jgong5 January 4, 2023 10:33

jgong5 reviewed Jan 6, 2023


atalman added this to the 2.0.0 milestone Jan 11, 2023

EikanWang mentioned this pull request Jan 13, 2023

[PT2.0 Feature Proposal] TorchInductor CPU FP32 Inference Optimization #92135

Closed

blzheng requested a review from jgong5 January 16, 2023 23:59

jgong5 reviewed Jan 17, 2023


blzheng force-pushed the beilei/fix_hf_BigBird branch from ee6e855 to 01c8f4c Compare January 17, 2023 07:57

blzheng force-pushed the beilei/fix_hf_BigBird branch 2 times, most recently from 953e13c to 5999b38 Compare January 17, 2023 08:16

blzheng added 13 commits February 3, 2023 00:33

fix lint

7edccb0

fix ut

1213f04

replace reshape with convert from real tensor

94443cd

support dynamic shapes for unsqueeze_

893f4d4

simplify code

233871e

enhance check condition

add stride check

29c8792

clone inputs to avoid side effects caused by inplace ops during run_node

c1a4612

fix spelling mistake

a121a0d

add ut and update metadata in codegen

3e0b841

add ut

eaf83b3

add mutate for ReinterpretView

c645276

rename mutate as codegen_reference_mutation

9d263ce

replace clone with clone_preserve_strides

8420392

pytorchmergebot force-pushed the beilei/fix_hf_BigBird branch from 012142e to 8420392 Compare February 3, 2023 00:33

remove torch._inductor.config.dynamic_shapes in testcases since this config was deleted by pytorch#93076

b8b02af

pytorchmergebot closed this in a71395d Feb 3, 2023

seemethere added a commit that referenced this pull request Feb 7, 2023

Revert "[inductor] fix crash issue when input is a view tensor (#90150)"

d6b123c

Had to provide a merge conflict resolution due to conflicts with #94118 This reverts commit a71395d.

seemethere mentioned this pull request Feb 7, 2023

Revert "[inductor] fix crash issue when input is a view tensor (#90150)" #94329

Closed

seemethere reviewed Feb 7, 2023


blzheng requested a review from EikanWang February 9, 2023 03:25

blzheng reopened this Feb 9, 2023

blzheng marked this pull request as draft February 9, 2023 03:25

chunyuan-w mentioned this pull request Feb 9, 2023

[Inductor] [CPU] Accuracy failure in torchbench model hf_BigBird #94268

Closed

blzheng closed this Feb 20, 2023
