[inductor] [cpp] fix the input contiguous check in max-autotune by chunyuan-w · Pull Request #135561 · pytorch/pytorch

chunyuan-w · 2024-09-10T03:10:11Z

Cherry-pick #134982 to the release/2.5 branch.
This is a critical correctness issue fix for inductor max-autotune on CPU, which is a new prototype feature that will be introduced in the PyTorch 2.5 release.

Description

Fixes the FP32 accuracy failure of resmlp_12_224 and BF16 accuracy failure of volo_d1_224 in timm.

This PR checks whether the input is contiguous as follows. If the input has a FixedLayout, its strides are known exactly. If it has a FlexibleLayout and its data is a ComputedBuffer, we use the fill order of the buffer to decide whether it is contiguous. In all other cases we do not use the GEMM template, because we cannot infer whether the input is contiguous.
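The three-way decision above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in the PR: the `FixedLayout`, `FlexibleLayout` and `Buffer` classes below are hypothetical stand-ins for the inductor IR classes, and the convention that `fill_order` lists dimensions from fastest-varying to slowest is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple, Union


@dataclass
class FixedLayout:
    """Hypothetical stand-in: strides are final."""
    size: Tuple[int, ...]
    stride: Tuple[int, ...]


@dataclass
class FlexibleLayout:
    """Hypothetical stand-in: strides are tentative and may still change."""
    size: Tuple[int, ...]
    stride: Tuple[int, ...]


@dataclass
class Buffer:
    """Hypothetical buffer. fill_order is set only when the data is computed,
    and (assumption of this sketch) lists dims from fastest-varying to slowest."""
    layout: Union[FixedLayout, FlexibleLayout]
    fill_order: Optional[Sequence[int]] = None


def last_dim_contiguous(buf: Buffer) -> bool:
    """Return True only when the last dim is known to have stride 1."""
    layout = buf.layout
    if isinstance(layout, FixedLayout):
        # Strides are exact, so they can be trusted directly.
        return layout.stride[-1] == 1
    if isinstance(layout, FlexibleLayout) and buf.fill_order is not None:
        # Tentative strides are not trusted; use the fill order instead.
        return buf.fill_order[0] == len(layout.size) - 1
    # Anything else: contiguity cannot be inferred, so be conservative.
    return False
```

With this sketch, a flexible layout whose tentative strides look contiguous is still rejected unless its fill order confirms that the last dimension varies fastest.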

Additional context

The current GEMM template only supports the case input.get_stride()[-1] == 1. In resmlp_12_224, when this check runs, the layout of the input is a FlexibleLayout. The reason is that when the input, which is a View IR, is realized, the convert_to_reinterpret_view call fails:

pytorch/torch/_inductor/ir.py

Lines 4712 to 4715 in d14fe3f

    
    try:
        return cls.convert_to_reinterpret_view(x)
    except NotImplementedError:
        pass

Execution then falls through to this copy_input call, whose result has a FlexibleLayout.

pytorch/torch/_inductor/ir.py

Line 4722 in d14fe3f

return cls.copy_input(x)
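The control flow of the two quoted snippets can be sketched as a standalone example. Everything here is hypothetical: `View`, `convert_to_reinterpret_view`, `copy_input` and `realize_input` are simplified stand-ins written for this illustration, and only the try/except-then-copy shape is taken from the quoted lines.

```python
class View:
    """Hypothetical view node; `reinterpretable` says whether it can be
    expressed as a plain reinterpretation of existing storage."""

    def __init__(self, name: str, reinterpretable: bool) -> None:
        self.name = name
        self.reinterpretable = reinterpretable


def convert_to_reinterpret_view(x: View):
    # Stand-in: raises when the view cannot be reinterpreted in place.
    if not x.reinterpretable:
        raise NotImplementedError(f"cannot reinterpret {x.name}")
    return ("reinterpret_view", x.name)


def copy_input(x: View):
    # Stand-in: the copy is tagged as flexible because its strides
    # have not been decided yet.
    return ("copy_with_flexible_layout", x.name)


def realize_input(x: View):
    try:
        return convert_to_reinterpret_view(x)
    except NotImplementedError:
        pass  # fall through to the copy path
    return copy_input(x)


print(realize_input(View("a", reinterpretable=True)))
print(realize_input(View("b", reinterpretable=False)))
```

The second call mirrors the resmlp_12_224 situation described here: the reinterpret path fails, so the input ends up as a copy whose layout is still open.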

When its stride is checked, this FlexibleLayout does satisfy input.get_stride()[-1] == 1, but the layout is later decided as a FixedLayout with size = (3072, 196), stride = (1, 3072). The GEMM template does not support that layout, which causes the accuracy issue in this model. The FlexibleLayout is converted to a FixedLayout during CppPackedGemmTemplate.add_choices, which calls slice_nd when rendering the kernel (slice_nd(X)). When the SliceView IR is created, as_storage_and_layout invokes decide_layout, which converts the layout to a FixedLayout with size = (3072, 196), stride = (1, 3072).
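The stride arithmetic behind this mismatch can be reproduced with plain Python. This is a sketch under assumptions: the helper names are invented for illustration, the tentative strides are assumed to be the row-major ones, and the fill order `[0, 1]` (dimension 0 varies fastest) is chosen because it yields the stride (1, 3072) reported above.

```python
def contiguous_strides(size):
    """Row-major strides: the last dim varies fastest."""
    strides = []
    running = 1
    for dim in reversed(size):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def strides_for_fill_order(size, order):
    """Strides when dims are filled in `order`, fastest-varying dim first."""
    strides = [0] * len(size)
    running = 1
    for dim in order:
        strides[dim] = running
        running *= size[dim]
    return tuple(strides)


size = (3072, 196)

# Tentative strides seen by the early check (assumed row-major here).
tentative = contiguous_strides(size)
# Strides after the layout is decided with dim 0 as the fastest dim.
decided = strides_for_fill_order(size, order=[0, 1])

print(tentative, tentative[-1] == 1)  # early check passes
print(decided, decided[-1] == 1)      # the decided layout violates it
```

The early check sees a last-dimension stride of 1, while the layout that is eventually fixed has a last-dimension stride of 3072, which is the case the GEMM template does not handle.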

Pull Request resolved: #134982
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jansel

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @rec @LucasLLC @MeetVadakkanchery @mhorowitz @pradeepfn

pytorch-bot · 2024-09-10T03:10:14Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135561


❌ 1 New Failure

As of commit d6c6981 with merge base b7eb725:

NEW FAILURE - The following job has failed:

pull / linux-focal-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge)
ModuleNotFoundError: No module named 'torch.version'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

chunyuan-w · 2024-09-10T07:23:28Z

Other PRs submitted to release/2.5 hit the same CI failure.

chunyuan-w · 2024-09-11T01:18:26Z

@pytorchbot rebase

pytorchmergebot · 2024-09-11T01:19:56Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.

pytorchmergebot · 2024-09-11T01:20:00Z

Successfully rebased chunyuan/cherry-pick-134982 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout chunyuan/cherry-pick-134982 && git pull --rebase)

pytorch-bot bot added ciflow/inductor module: inductor labels Sep 10, 2024

chunyuan-w changed the title ~~[inductor] [cpp] fix the input contiguous check in max-autotune (#134…~~ [inductor] [cpp] fix the input contiguous check in max-autotune Sep 10, 2024

pytorchbot added the open source label Sep 10, 2024

chunyuan-w marked this pull request as ready for review September 10, 2024 07:23

chunyuan-w requested a review from jgong5 September 10, 2024 07:23

pytorchmergebot force-pushed the chunyuan/cherry-pick-134982 branch from f09b719 to 404abcb Compare September 11, 2024 01:20

pytorchmergebot requested review from a team, aaronenyeshi, angelayi, avikchaudhuri, eqy, jerryzh168, justinchuby, kulinseth, malfet, mikaylagawarecki, shubhambhokare1, sraikund16, syed-ahmed, titaiwangms, tugsbayasgalan, wschin, ydwu4 and zhxchen17 as code owners September 11, 2024 01:20

chunyuan-w removed the open source label Sep 11, 2024

chunyuan-w mentioned this pull request Sep 13, 2024

[v.2.5.0] Release Tracker #135522

Closed

kit1980 approved these changes Sep 20, 2024


[inductor] [cpp] fix the input contiguous check in max-autotune #135561
kit1980 merged 1 commit into pytorch:release/2.5 from chunyuan-w:chunyuan/cherry-pick-134982
