Enable torch build with SLEEF on ARM by default #133339
aditew01 wants to merge 6 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133339
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (5 unrelated failures) As of commit 4fe7a60 with merge base 701ba52. FLAKY: the following jobs failed but were likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot drci

@pytorchbot label ciflow/linux-aarch64 module: arm

Can't add the following labels to PR: ciflow/linux-aarch64. Please ping one of the reviewers for help.

cc: @malfet

@pytorchbot label "ciflow/linux-aarch64"

@pytorchbot label "module:arm"

Didn't find the following labels among repository labels: module:arm
Force-pushed from 7712030 to 676c737
Please seek CI approval before scheduling CIFlow labels

@pytorchbot rebase

You don't have permissions to rebase this PR since you are a first-time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.

@pytorchbot label "module: arm"

Didn't find the following labels among repository labels: module:arm

@pytorchbot label "module: arm"
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed; the first few of them are: Check Labels / Check labels. Details for Dev Infra team: raised by workflow job.
Line 308 in e8ad508: Does this boolean do anything after your proposed change? Should it be removed?
I think this is a good suggestion, given the flags will now be activated by default. Thanks for pointing it out, @robert-hardwick.
Enable codegen kernel compilation with SLEEF on the ARM platform.
This reverts commit 7ce6726.
We have also tested the "SVE + Inductor" flow from @aditew01's PR (#134672) both without SLEEF and with SLEEF, and observed consistent performance improvements with the SLEEF build. This change will further enhance default performance on ARM CPUs. The results below are from torchbench on a 32-core Graviton 3 EC2 instance.

Changes LGTM!
@pytorchbot merge -f 'All related PR tests are green'

You are not authorized to force merges to this repository. Please use the regular
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@malfet, a naive question: do we re-trigger the mergebot?

@pytorchbot merge

Yes, you should be able to. I was waiting for the Android fix before issuing another merge command.
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot merge -f "No need to wait for torchbench runs"

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Hello! SLEEF maintainer speaking here. I have a few questions regarding this PR.
Scope: Enable PyTorch build with SLEEF on Arm by default. Enable codegen kernels compilation with SLEEF on ARM platform.
Enabling the build with SLEEF by default and setting AT_BUILD_ARM_VEC256_WITH_SLEEF as the default for Arm improves performance for some models. I have benchmarked several networks on Neoverse-V1 using torch.compile with the inductor backend. On models like hf_Bert_Large and hf_GPT_fast, we're seeing a ~1.2x speedup (with 16 threads). The results below are run with Batch_Size=1 and Cores=8, 16.

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal @mcarilli @ptrblck @leslie-fang-intel @malfet @milpuz01 @EikanWang @voznesenskym @penguinwu @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @rec @LucasLLC @MeetVadakkanchery @mhorowitz @pradeepfn