[cpu] Modify inductor opt flag --- ftree-loop-vectorize #136827

Valentine233 wants to merge 5 commits into main from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136827
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 7f539db with merge base 565a794.
BROKEN TRUNK - The following job failed but was already present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
The two outliers are expected to be fixed by #136422. Would land this PR after that.
torch/_inductor/config.py (outdated)

    # Use ftree-loop-vectorize when compiling
    enable_tree_loop_vec_opt_flag = (
        os.environ.get("TORCHINDUCTOR_CPP_ENABLE_TREE_LOOP_VEC_OPT_FLAG", "0") == "1"
    )
When is it recommended to turn on this flag? Please add a meaningful note here. Or, would it work if we always disable this compiler flag without an option to turn it on?
This was kept in case vectorization via this compiler flag was still needed. But since most of the vectorizations are supported natively now, I think it is fine to remove the option.
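For context, a gated config flag like the one above would typically feed into how the C++ compile flags are assembled. Below is a minimal sketch of that pattern; the `cpp_optimization_flags` helper and its baseline flag list are hypothetical illustrations for this discussion, not Inductor's actual API:

```python
import os

# Hypothetical sketch (not Inductor's real API): read the opt-in from
# the environment once at import time, defaulting to off.
enable_tree_loop_vec_opt_flag = (
    os.environ.get("TORCHINDUCTOR_CPP_ENABLE_TREE_LOOP_VEC_OPT_FLAG", "0") == "1"
)

def cpp_optimization_flags():
    # Baseline flags always passed to the C++ compiler (illustrative).
    flags = ["-O3", "-ffast-math"]
    # Only add -ftree-loop-vectorize when explicitly opted in, since
    # GCC's tree loop vectorizer has caused functional issues here.
    if enable_tree_loop_vec_opt_flag:
        flags.append("-ftree-loop-vectorize")
    return flags

print(cpp_optimization_flags())
```

With the environment variable unset, the extra flag is never emitted, which matches the default-off behavior of the snippet under review.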
Force-pushed from 4e9434c to 599ddb0.
Force-pushed from 599ddb0 to fa384c1.
@jgong5 Please help review; the model regressions and CI failures are all resolved.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Force-pushed from a98efcc to 7f539db.
Thanks, Jason. I have made the modifications; please help review again.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Reopen pytorch#121782, as more optimizations have landed. Fixes pytorch#115261, pytorch#113017.

For the CPU inductor path, remove -ftree-loop-vectorize from the optimization flags to fix functional issues.

### Validation on 3 benchmark suites

#### FP32




Outlier models (speedup < 0.8, single socket): None.

#### BF16




Outlier models (speedup < 0.8, single socket, multi-threaded):

- functorch_dp_cifar10: 0.58
- opacus_cifar10: 0.57

Pull Request resolved: pytorch#136827
Approved by: https://github.com/jansel, https://github.com/jgong5
Revert "[cpu] Modify inductor opt flag --- ftree-loop-vectorize (pytorch#136827)"

This reverts commit cf0bb6c.

Reverted pytorch#136827 on behalf of https://github.com/ZainRizvi due to: Sorry but this breaks internally. See D65605094 for more details ([comment](pytorch#136827 (comment))).
Reopen #121782, as more optimizations have landed.
Fixes #115261, #113017.
For the CPU inductor path, remove `-ftree-loop-vectorize` from the optimization flags to fix functional issues.
### Validation on 3 benchmark suites

#### FP32

Outlier models (speedup < 0.8, single socket): None.

#### BF16

Outlier models (speedup < 0.8, single socket, multi-threaded):

- functorch_dp_cifar10: 0.58
- opacus_cifar10: 0.57
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov