[inductor] [cpp] fix the input contiguous check in max-autotune#135561

Merged
kit1980 merged 1 commit into pytorch:release/2.5 from
chunyuan-w:chunyuan/cherry-pick-134982
Sep 20, 2024

Conversation

@chunyuan-w
Collaborator

@chunyuan-w chunyuan-w commented Sep 10, 2024

Cherry-pick #134982 to the release/2.5 branch.
This is a critical correctness issue fix for inductor max-autotune on CPU, which is a new prototype feature that will be introduced in the PyTorch 2.5 release.

Description

Fixes the FP32 accuracy failure of resmlp_12_224 and BF16 accuracy failure of volo_d1_224 in timm.

In this PR, we check whether the input is contiguous as follows. If it has a FixedLayout, its strides are known exactly. If it has a FlexibleLayout and its data is a ComputedBuffer, we use the buffer's fill order to decide whether it is contiguous. In all other cases we do not use the GEMM template, because we cannot infer whether the input is contiguous.
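The three cases above can be sketched with a small standalone model. This is an illustration only, not the Inductor code: the `FixedLayout` and `FlexibleLayout` classes here are simplified stand-ins, and `fill_order`, `strides_from_fill_order`, and `last_dim_is_contiguous` are hypothetical names. It assumes a fill order lists dimensions from fastest-varying to slowest-varying.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FixedLayout:
    """Stand-in for a layout whose strides are already decided."""
    size: Tuple[int, ...]
    stride: Tuple[int, ...]


@dataclass
class FlexibleLayout:
    """Stand-in for a layout whose strides are not decided yet.

    fill_order is set only when the producer is a computed buffer
    (assumption: dims are listed fastest-varying first).
    """
    size: Tuple[int, ...]
    fill_order: Optional[Tuple[int, ...]] = None


def strides_from_fill_order(size, fill_order):
    # Assign strides in the order the dimensions are filled.
    stride = [0] * len(size)
    running = 1
    for dim in fill_order:
        stride[dim] = running
        running *= size[dim]
    return tuple(stride)


def last_dim_is_contiguous(layout) -> bool:
    # Case 1: strides are known exactly.
    if isinstance(layout, FixedLayout):
        return layout.stride[-1] == 1
    # Case 2: strides can be inferred from the fill order.
    if isinstance(layout, FlexibleLayout) and layout.fill_order is not None:
        return strides_from_fill_order(layout.size, layout.fill_order)[-1] == 1
    # Case 3: contiguity cannot be inferred, so be conservative.
    return False
```

In this model, returning `False` in the last case corresponds to not selecting the GEMM template, which trades a possible missed optimization for correctness.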

Additional context

The current GEMM template only supports inputs with input.get_stride()[-1] == 1. In resmlp_12_224, when this check runs, the layout of the input is a FlexibleLayout. The reason is that the input is a View IR, and when it is realized the convert_to_reinterpret_view call fails:

pytorch/torch/_inductor/ir.py

Lines 4712 to 4715 in d14fe3f

try:
    return cls.convert_to_reinterpret_view(x)
except NotImplementedError:
    pass

Execution then falls through to copy_input, which returns a FlexibleLayout:

return cls.copy_input(x)

At the time of the check, this FlexibleLayout does satisfy input.get_stride()[-1] == 1, but it is later fixed to a FixedLayout with size = (3072, 196), stride = (1, 3072), which the GEMM template does not support. This causes the accuracy issue in this model. The FlexibleLayout is converted to a FixedLayout during CppPackedGemmTemplate.add_choices, which calls slice_nd when rendering the kernel (slice_nd(X)). When the SliceView IR is created, as_storage_and_layout invokes decide_layout, which converts the layout to a FixedLayout with size = (3072, 196), stride = (1, 3072).
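To see why a layout with stride = (1, 3072) breaks a kernel written for a unit last-dimension stride, here is a minimal sketch in plain Python. It uses a 3 x 2 stand-in for the (3072, 196) tensor, and `flat_offset` and `read_matrix` are hypothetical helpers, not Inductor APIs.

```python
def flat_offset(index, stride):
    # Offset of a multi-dimensional index in flat storage.
    return sum(i * s for i, s in zip(index, stride))


def read_matrix(storage, size, stride):
    rows, cols = size
    return [
        [storage[flat_offset((r, c), stride)] for c in range(cols)]
        for r in range(rows)
    ]


size = (3, 2)
actual_stride = (1, 3)   # same pattern as stride (1, 3072) for size (3072, 196)
assumed_stride = (2, 1)  # what a kernel expecting last-dim stride 1 would use

# Write element (r, c) = 10 * r + c using the actual strides.
storage = [0] * (size[0] * size[1])
for r in range(size[0]):
    for c in range(size[1]):
        storage[flat_offset((r, c), actual_stride)] = 10 * r + c

correct = read_matrix(storage, size, actual_stride)
wrong = read_matrix(storage, size, assumed_stride)
```

Reading the same storage with the assumed strides returns values from the wrong positions without raising any error, which is consistent with the failure surfacing as an accuracy mismatch rather than a crash.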

Pull Request resolved: #134982
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jansel

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @rec @LucasLLC @MeetVadakkanchery @mhorowitz @pradeepfn

@pytorch-bot

pytorch-bot bot commented Sep 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135561

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit d6c6981 with merge base b7eb725:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@chunyuan-w chunyuan-w changed the title [inductor] [cpp] fix the input contiguous check in max-autotune (#134… [inductor] [cpp] fix the input contiguous check in max-autotune Sep 10, 2024
@chunyuan-w
Collaborator Author

Other PRs submitted to release/2.5 hit the same CI failure.

@chunyuan-w chunyuan-w marked this pull request as ready for review September 10, 2024 07:23
@chunyuan-w chunyuan-w requested a review from jgong5 September 10, 2024 07:23
@chunyuan-w
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased chunyuan/cherry-pick-134982 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout chunyuan/cherry-pick-134982 && git pull --rebase)


4 participants