[cpu] Modify inductor opt flag --- ftree-loop-vectorize #136827

Valentine233 wants to merge 5 commits into main from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136827
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 7f539db with merge base 565a794.
BROKEN TRUNK - The following job failed but was already present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
The two outliers are expected to be fixed by #136422. Would land this PR after that.
torch/_inductor/config.py (outdated)

    # Use ftree-loop-vectorize when compiling
    enable_tree_loop_vec_opt_flag = (
        os.environ.get("TORCHINDUCTOR_CPP_ENABLE_TREE_LOOP_VEC_OPT_FLAG", "0") == "1"
    )
When is it recommended to turn on this flag? Please add a meaningful note here. Or, would it work if we always disable this compiler flag without an option to turn it on?
This was kept in case vectorization via this compiler flag was still needed. But since most of the vectorizations are supported natively now, I think it is fine to remove the option.
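For context, a gated config flag like the one above would typically feed into how the C++ compile flags are assembled. Below is a minimal sketch of that pattern; the `cpp_optimization_flags` helper and its baseline flag list are hypothetical illustrations for this discussion, not Inductor's actual API:

```python
import os

# Hypothetical sketch (not Inductor's real API): read the opt-in from
# the environment once at import time, defaulting to off.
enable_tree_loop_vec_opt_flag = (
    os.environ.get("TORCHINDUCTOR_CPP_ENABLE_TREE_LOOP_VEC_OPT_FLAG", "0") == "1"
)

def cpp_optimization_flags():
    # Baseline flags always passed to the C++ compiler (illustrative).
    flags = ["-O3", "-ffast-math"]
    # Only add -ftree-loop-vectorize when explicitly opted in, since
    # GCC's tree loop vectorizer has caused functional issues here.
    if enable_tree_loop_vec_opt_flag:
        flags.append("-ftree-loop-vectorize")
    return flags

print(cpp_optimization_flags())
```

With the environment variable unset, the extra flag is never emitted, which matches the default-off behavior of the snippet under review.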
Force-pushed from 4e9434c to 599ddb0.
Force-pushed from 599ddb0 to fa384c1.
@jgong5 Please help review; the model regressions and CI failures are all resolved.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Force-pushed from a98efcc to 7f539db.
Thanks, Jason. I have made the modifications; please help review again.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Reopen pytorch#121782, as more optimizations have landed. Fixes pytorch#115261, pytorch#113017.

For the CPU inductor path, remove -ftree-loop-vectorize from the optimization flags to fix functional issues.

### Validation on 3 benchmark suites

#### FP32




Outlier models (speedup < 0.8, single socket): None.

#### BF16




Outlier models (speedup < 0.8, single socket, multi-threaded):

- functorch_dp_cifar10: 0.58
- opacus_cifar10: 0.57

Pull Request resolved: pytorch#136827
Approved by: https://github.com/jansel, https://github.com/jgong5
Revert "[cpu] Modify inductor opt flag --- ftree-loop-vectorize (pytorch#136827)"

This reverts commit cf0bb6c.

Reverted pytorch#136827 on behalf of https://github.com/ZainRizvi due to: Sorry but this breaks internally. See D65605094 for more details ([comment](pytorch#136827 (comment))).
Reopen #121782, as more optimizations have landed.
Fixes #115261, #113017.
For the CPU inductor path, remove `-ftree-loop-vectorize` from the optimization flags to fix functional issues.
### Validation on 3 benchmark suites

#### FP32

Outlier models (speedup < 0.8, single socket): None.

#### BF16

Outlier models (speedup < 0.8, single socket, multi-threaded):

- functorch_dp_cifar10: 0.58
- opacus_cifar10: 0.57
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov