[Inductor] support masked vectorization for the tail_loop for dynamic shapes #131745
jiayisunx wants to merge 27 commits into gh/jiayisunx/16/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/131745
✅ No Failures as of commit fcfebe5 with merge base 1754850. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```diff
-template <typename T, int M, int N,
-          typename std::enable_if_t<std::is_same<T, BFloat16>::value && ((M < 32 && M != 16) || (N < 32 && N != 16)), int> = 0>
-inline void transpose_mxn(const BFloat16* src, int64_t ld_src, BFloat16* dst, int64_t ld_dst) {
+inline void transpose_mxn<BFloat16>(const BFloat16* src, int64_t ld_src, BFloat16* dst, int64_t ld_dst, int M, int N) {
```
I'm concerned about the perf impact. Originally, M and N were template args and hence compile-time constants.
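To illustrate the concern, a minimal standalone sketch (the `transpose_ct`/`transpose_rt` names and the scalar body are made up for illustration, not the actual ATen vec code): with `M` and `N` as template parameters the trip counts are compile-time constants the optimizer can fully unroll, whereas runtime arguments leave them dynamic unless the compiler can prove them at the call site.

```c++
#include <cstdint>

// Compile-time dims: the trip counts are constants, so the compiler
// can fully unroll both loops and keep the tile in registers.
template <int M, int N>
void transpose_ct(const float* src, int64_t ld_src, float* dst, int64_t ld_dst) {
  for (int i = 0; i < M; i++)
    for (int j = 0; j < N; j++)
      dst[j * ld_dst + i] = src[i * ld_src + j];
}

// Runtime dims: same logic, but the loop bounds are opaque to the
// optimizer unless it can prove their values at the (inlined) call site.
void transpose_rt(const float* src, int64_t ld_src, float* dst, int64_t ld_dst, int M, int N) {
  for (int i = 0; i < M; i++)
    for (int j = 0; j < N; j++)
      dst[j * ld_dst + i] = src[i * ld_src + j];
}
```

Whether this costs anything in practice depends on whether the call sites are inlined with constant arguments, which is what the benchmarking below is meant to confirm.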
I will check it with TAS.
No performance regression in TAS. We also compared the assembly code before and after this PR; it doesn't appear to have any performance impact.
Thanks! So perhaps we don't have to keep the version with M and N as template args.
leslie-fang-intel left a comment
LGTM, please kindly address Jiong's comment.
A bug was introduced by the rebase; I have fixed it. Please help review this PR again, thanks!
```python
steps_str = (
    f"{self.var}+=({cexpr_index(self.steps)} == 0 ? "
    f"1 : {cexpr_index(self.steps)})"
)
```
Can you add a comment on why we need this trick here?
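For context, a sketch of what the guard protects against (the variable names and values are illustrative, not from the generated kernel): with dynamic shapes the step expression can simplify to 0 at runtime, and a zero increment would make any non-empty C++ loop spin forever. My reading — an assumption, not something stated in this thread — is that the loop range is empty whenever the step is 0, so clamping the increment to 1 keeps the generated code well-formed without changing the iteration count.

```c++
#include <cstdint>

int main() {
  // Illustrative values: a dynamic-shape step expression that
  // simplified to 0, together with an empty tail range (assumed).
  int64_t n = 0;
  int64_t step = 0;

  // Unguarded form: "x += 0" never advances, so a non-empty range
  // would loop forever. The emitted guard clamps the increment to 1.
  for (int64_t x = 0; x < n; x += (step == 0 ? 1 : step)) {
    // ... tail-loop body ...
  }
  return 0;
}
```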
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Recent PR #131745 brought new VLA (variable-length array) logic into the cpp codegen, which raises a build failure on MSVC with `Compiler Error C2131`: https://learn.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2131?view=msvc-170

Reproduce UT:
```cmd
pytest test\inductor\test_torchinductor_dynamic_shapes.py -v -k test_large_block_sizes_dynamic_shapes_cpu
```

Originally generated code:
```c++
alignas(16) float tmp1[static_cast<int64_t>(((-256LL)*(c10::div_floor_integer(static_cast<int64_t>(ks1), static_cast<int64_t>(16LL)))) + (16LL*ks1))];
```

Change: allocate a large-enough fixed-size buffer instead. The dynamic size above simplifies to `16*(ks1 % 16)`, which is at most `16*15`, so a fixed `16*16` buffer always suffices. Newly generated code:
```c++
alignas(16) float tmp1[16*16];
```

Pull Request resolved: #135307
Approved by: https://github.com/jgong5, https://github.com/jansel
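To reproduce the failure outside of Inductor, a minimal standalone sketch (the function name and sizes are illustrative): MSVC has no VLA extension, so any automatic array whose size depends on a runtime value trips C2131, while GCC and Clang accept it as a non-standard extension — which is why the break only surfaced on the MSVC build.

```c++
#include <cstdint>

void kernel_sketch(int64_t ks1) {
  // MSVC: error C2131 ("expression did not evaluate to a constant");
  // GCC/Clang compile it as a non-standard VLA extension.
  // float vla[16 * ks1];  // uncomment to reproduce C2131 on MSVC

  // The fix: a compile-time upper bound on the tail-tile size.
  alignas(16) float tmp1[16 * 16];
  (void)tmp1;
  (void)ks1;
}
```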
[Inductor] support masked vectorization for the tail_loop for dynamic shapes (pytorch#131745)
Pull Request resolved: pytorch#131745
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jansel
Stack from ghstack (oldest at bottom):
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang