Skip to content

[inductor] fix crash issue when input is a view tensor#90150

Closed
blzheng wants to merge 18 commits intopytorch:masterfrom
blzheng:beilei/fix_hf_BigBird
Closed

[inductor] fix crash issue when input is a view tensor#90150
blzheng wants to merge 18 commits intopytorch:masterfrom
blzheng:beilei/fix_hf_BigBird

Conversation

@blzheng
Copy link
Copy Markdown
Collaborator

@blzheng blzheng commented Dec 5, 2022

@pytorch-bot
Copy link
Copy Markdown

pytorch-bot bot commented Dec 5, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90150

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit b8b02af:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

fn(a, b)
assert "kernel_cpp_0" in (e.name for e in prof.profiler.function_events)

def test_input_is_view(self):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more accurate, it only happens with an in-place view op and should be fine non-place view, right? Naming it as test_input_is_inplace_view is more appropriate?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. This function name is updated.

@blzheng blzheng force-pushed the beilei/fix_hf_BigBird branch from 1b99e85 to adcdd91 Compare December 12, 2022 01:28
@blzheng blzheng marked this pull request as ready for review December 14, 2022 09:22
@blzheng blzheng requested review from bdhirsh and jansel December 14, 2022 09:23
@zou3519 zou3519 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Dec 15, 2022
@blzheng blzheng marked this pull request as draft December 15, 2022 01:29
@blzheng blzheng force-pushed the beilei/fix_hf_BigBird branch from adcdd91 to 68894fd Compare January 4, 2023 05:28
@blzheng blzheng requested a review from jgong5 January 4, 2023 10:33
not config.dynamic_shapes
and self.fake_tensor.shape != self.example.shape
):
self.fake_tensor = self.fake_tensor.reshape(self.example.shape)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does reshape always work?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the reshape with convert from real tensor in commit a6ac94d

)
# For inplace ops changing the input's shape (unsqueeze_)
if (
not config.dynamic_shapes
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with dynamic shapes?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added support for dynamic shapes in 953e13c

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid code duplicate, consider to test dynamic_shapes is True and False inside same test code instead of specifying it as a decorator.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 5999b38

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are from another PR. I guess you need to rebase?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebased

self.fake_tensor = converter.from_real_tensor(
self.fake_tensor.fake_mode, self.example
)
elif config.dynamic_shapes and self.fake_tensor.dim() != self.example.dim():
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a complete check to make sure there was inplace op on the inputs?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is updated in 5999b38

@blzheng blzheng force-pushed the beilei/fix_hf_BigBird branch from ee6e855 to 01c8f4c Compare January 17, 2023 07:57
@blzheng
Copy link
Copy Markdown
Collaborator Author

blzheng commented Jan 17, 2023

@pytorchbot rebase

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/90150/head returned non-zero exit code 1

Rebasing (1/9)
Auto-merging test/inductor/test_torchinductor.py
CONFLICT (content): Merge conflict in test/inductor/test_torchinductor.py
Auto-merging torch/_inductor/graph.py
Auto-merging torch/_inductor/scheduler.py
error: could not apply b296431bde... fix bug in hf_BigBird
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply b296431bde... fix bug in hf_BigBird

Raised by https://github.com/pytorch/pytorch/actions/runs/3937177663

@blzheng blzheng force-pushed the beilei/fix_hf_BigBird branch 2 times, most recently from 953e13c to 5999b38 Compare January 17, 2023 08:16
@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Successfully rebased beilei/fix_hf_BigBird onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout beilei/fix_hf_BigBird && git pull --rebase)

@blzheng
Copy link
Copy Markdown
Collaborator Author

blzheng commented Feb 3, 2023

@pytorchbot merge

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

seemethere added a commit that referenced this pull request Feb 7, 2023
Had to provide a merge conflict resolution due to conflicts with #94118

This reverts commit a71395d.
pytorchmergebot pushed a commit that referenced this pull request Feb 7, 2023
…" (#94329)

Had to provide a merge conflict resolution due to conflicts with #94118

This was causing issues with internal tests that look similar to:
```
in clone_preserve_strides
    x.size(), x.stride(), x.storage_offset()
AttributeError: 'KeyedJaggedTensor' object has no attribute 'size'
```

See https://fburl.com/testinfra/nc0du2sp for more information

This reverts commit #90150

@jansel can you help @blzheng with re-landing this as a co-development diff?

Pull Request resolved: #94329
Approved by: https://github.com/jansel
@seemethere
Copy link
Copy Markdown
Member

Sorry to revert again but it appears as though this introduces issues with users using TorchRec, see #94329 for more information

"""
return super().run(*args)
# clone inputs to avoid side effects caused by inplace ops during run_node
new_args = [torch._prims_common.clone_preserve_strides(x) for x in args]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line in particular is problematic when attempting to use with torchrec's KeyedJaggedTensor, resulting in errors like:

in clone_preserve_strides
    x.size(), x.stride(), x.storage_offset()
AttributeError: 'KeyedJaggedTensor' object has no attribute 'size'

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @seemethere I am curious about what case will trigger this issue. I added clone_preserve_strides in two places.

  1. In this function. As the function description said, *args should be Tensor. I think Tensor must have attribute 'size', right?
def propagate(self, *args):
        """
        Run `module` via interpretation and return the result and
        record the shape and type of each node.
        Args:
            *args (Tensor): the sample input.
        Returns:
            Any: The value returned from executing the Module
        """
  1. Similarly, in function aot_dispatch_base, flat_args should be List[Tensor].

def aot_dispatch_base(flat_fn, flat_args: List[Tensor], aot_config: AOTConfig):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request intel This tag is for PR from Intel Merged module: dynamo module: inductor open source release notes: AO frontend Reverted triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

9 participants