Enable torch build with SLEEF on ARM by default #133339
aditew01 wants to merge 6 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133339
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (5 unrelated failures) As of commit 4fe7a60 with merge base 701ba52. FLAKY: the following jobs failed but were likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot drci

@pytorchbot label ciflow/linux-aarch64 module: arm

Can't add the following labels to PR: ciflow/linux-aarch64. Please ping one of the reviewers for help.

cc: @malfet

@pytorchbot label "ciflow/linux-aarch64"

@pytorchbot label "module:arm"

Didn't find the following labels among repository labels: module:arm
Force-pushed from 7712030 to 676c737
Please seek CI approval before scheduling CIFlow labels

@pytorchbot rebase

You don't have permissions to rebase this PR since you are a first-time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.

@pytorchbot label "module: arm"

Didn't find the following labels among repository labels: module:arm

@pytorchbot label "module: arm"
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed; the first few of them are: Check Labels / Check labels. Details for Dev Infra team: raised by workflow job.
Line 308 in e8ad508: Does this boolean do anything after your proposed change? Should it be removed?
I think this is a good suggestion, given the flags will now be activated by default. Thanks for pointing it out, @robert-hardwick.
Enable codegen kernel compilation with SLEEF on the ARM platform.
This reverts commit 7ce6726.
We have also tested the "SVE + Inductor" flow from @aditew01's PR (#134672) both without SLEEF and with SLEEF, and observed consistent performance improvements with the SLEEF build. This change will further enhance default performance on ARM CPUs. The results below are from torchbench on a 32-core Graviton 3 EC2 instance.

Changes LGTM!
@pytorchbot merge -f 'All related PR tests are green'

You are not authorized to force merges to this repository. Please use the regular
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@malfet, a naive question: do we re-trigger the mergebot?

@pytorchbot merge

Yes, you should be able to. I was waiting for the Android fix before issuing another merge command.
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot merge -f "No need to wait for torchbench runs"

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Hello! SLEEF maintainer speaking here. I have a few questions regarding this PR.
Scope: Enable PyTorch build with SLEEF on Arm by default. Enable codegen kernels compilation with SLEEF on ARM platform.
Enabling the build with SLEEF by default and setting AT_BUILD_ARM_VEC256_WITH_SLEEF as the default for Arm improves performance for some models. I have benchmarked several networks on Neoverse-V1 using torch.compile with the inductor backend. On models like hf_Bert_Large and hf_GPT_fast, we're seeing a ~1.2x speedup (with 16 threads). The results below are run with Batch_Size=1 and Cores=8, 16.

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal @mcarilli @ptrblck @leslie-fang-intel @malfet @milpuz01 @EikanWang @voznesenskym @penguinwu @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @rec @LucasLLC @MeetVadakkanchery @mhorowitz @pradeepfn