[PyTorch] Check defined(__aarch64__) && !defined(CPU_CAPABILITY_SVE256) instead of defined(CPU_CAPABILITY_NEON)#137722
swolchok wants to merge 5 commits into gh/swolchok/652/base
The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor.

Fixes #137704

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)
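To make the proposed gate concrete, here is a minimal sketch of code guarded by `defined(__aarch64__) && !defined(CPU_CAPABILITY_SVE256)`. It assumes a compiler that enables Advanced SIMD (NEON) by default when targeting AArch64, as GCC and Clang do. The macro `SKETCH_USE_NEON` and the helper `add_floats` are hypothetical names for this illustration only and are not part of PyTorch.

```cpp
#include <cstddef>

// Gate from the PR title: rely on the target architecture rather than on a
// CPU_CAPABILITY_NEON build flag that only inductor was setting.
#if defined(__aarch64__) && !defined(CPU_CAPABILITY_SVE256)
#include <arm_neon.h>
#define SKETCH_USE_NEON 1  // hypothetical macro, for this example only
#else
#define SKETCH_USE_NEON 0
#endif

// Hypothetical helper: element-wise sum of two float arrays of length n.
void add_floats(const float* a, const float* b, float* out, std::size_t n) {
  std::size_t i = 0;
#if SKETCH_USE_NEON
  // NEON path: process four floats per iteration.
  for (; i + 4 <= n; i += 4) {
    float32x4_t va = vld1q_f32(a + i);
    float32x4_t vb = vld1q_f32(b + i);
    vst1q_f32(out + i, vaddq_f32(va, vb));
  }
#endif
  // Scalar path: handles the tail on AArch64 and everything elsewhere.
  for (; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}
```

Because the condition depends only on the target architecture and the SVE256 build variant, the same source takes the NEON path in any AArch64 translation unit, whether or not inductor compiled it.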
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137722
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 1f40b24 with merge base 0786b37.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D64197046
looks like this needs to be identical to the gating surrounding vec256_float_neon
specifically, 1) we just use
hooray for green tests
Hi @swolchok
it is CPU_CAPABILITY_SVE256 because that is the gating surrounding vec256_float_neon.h: https://www.internalfb.com/code/fbsource/[0660504a24502363bb08b1b37504bc3ef6d878eb]/fbcode/caffe2/aten/src/ATen/cpu/vec/vec256/vec256.h?lines=10-15
malfet left a comment:
This removes the ability to generate non-NEON-accelerated code in torch.compile, which feels like a regression...
I.e., please undo the changes to cpp_prefix.h/cpu_vec_isa.py
malfet left a comment:
Looks good to me, thank you for the update
…8014) This will break once we support 128-bit vectors, and there's no reason to do it. Differential Revision: [D64421982](https://our.internmc.facebook.com/intern/diff/D64421982/) Pull Request resolved: #138014 Approved by: https://github.com/malfet, https://github.com/Skylion007 ghstack dependencies: #137722
The ifdef as written just checks if the macOS 15.0-capable SDK is being used. You also need a runtime gate to make sure macOS 15 is in use. Differential Revision: [D64429453](https://our.internmc.facebook.com/intern/diff/D64429453/) Pull Request resolved: #138022 Approved by: https://github.com/Skylion007, https://github.com/malfet ghstack dependencies: #137722, #138014
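The distinction drawn in that commit message, between a compile-time SDK check and a runtime OS check, can be sketched as follows. This is a portable illustration under stated assumptions, not the code from #138022: `BUILT_WITH_NEW_SDK`, `OsVersion`, and `can_use_new_api` are hypothetical names, and the running OS version is passed in as a plain value rather than queried from the system. On Apple platforms, Clang offers `__builtin_available(macOS 15.0, *)` for the runtime half of such a check.

```cpp
// Hypothetical stand-in for "this build used a macOS 15.0-capable SDK".
// A real build would derive this from SDK headers, not define it by hand.
#ifndef BUILT_WITH_NEW_SDK
#define BUILT_WITH_NEW_SDK 1
#endif

// Hypothetical description of the OS the process is running on.
struct OsVersion {
  int major_version;
  int minor_version;
};

// The API may be used only if BOTH conditions hold:
//   1. compile time: the SDK declares the API, so the call compiles at all;
//   2. run time: the OS actually running the binary provides it.
bool can_use_new_api(OsVersion running) {
#if BUILT_WITH_NEW_SDK
  return running.major_version >= 15;
#else
  (void)running;  // API not declared in this build, so never use it
  return false;
#endif
}
```

With only the `#if`, a binary built against the newer SDK would attempt the new API even when launched on an older OS release, which is the gap the commit message describes.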
Stack from ghstack (oldest at bottom):
defined(__aarch64__) && !defined(CPU_CAPABILITY_SVE256) instead of defined(CPU_CAPABILITY_NEON) #137722

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor.

Fixes #137704
Differential Revision: D64197046
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang