[PyTorch] Use Half, not float16_t, in fp16 gemv fast path signatures#137913
[PyTorch] Use Half, not float16_t, in fp16 gemv fast path signatures#137913swolchok wants to merge 10 commits intogh/swolchok/661/basefrom
Conversation
float16_t is ARM-specific. Half is not. Differential Revision: [D64218427](https://our.internmc.facebook.com/intern/diff/D64218427/) [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137913
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit ca2264e with merge base b9618c9 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
This pull request was exported from Phabricator. Differential Revision: D64218427 |
…signatures" float16_t is ARM-specific. Half is not. Differential Revision: [D64218427](https://our.internmc.facebook.com/intern/diff/D64218427/) cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
|
This pull request was exported from Phabricator. Differential Revision: D64218427 |
…signatures" float16_t is ARM-specific. Half is not. Differential Revision: [D64218427](https://our.internmc.facebook.com/intern/diff/D64218427/) cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
|
This pull request was exported from Phabricator. Differential Revision: D64218427 |
malfet
left a comment
There was a problem hiding this comment.
Is this to make it accessible on x86? Otherwise float16_t feels like a reasonable type, isn't it?
…signatures" float16_t is ARM-specific. Half is not. Differential Revision: [D64218427](https://our.internmc.facebook.com/intern/diff/D64218427/) cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
|
This pull request was exported from Phabricator. Differential Revision: D64218427 |
|
This pull request was exported from Phabricator. Differential Revision: D64218427 |
…signatures" float16_t is ARM-specific. Half is not. Differential Revision: [D64218427](https://our.internmc.facebook.com/intern/diff/D64218427/) cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
|
This pull request was exported from Phabricator. Differential Revision: D64218427 |
…signatures" float16_t is ARM-specific. Half is not. Differential Revision: [D64218427](https://our.internmc.facebook.com/intern/diff/D64218427/) cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
|
This pull request was exported from Phabricator. Differential Revision: D64218427 |
…pu/ (#137914) This is in preparation for supporting x86 as well; we need to be in this directory so that we can get rebuilt with different CPU_CAPABILITY settings (AVX2/AVX-512). Also incidentally starts fulfilling request from @malfet to split the ARM64 fast path stuff into its own file. BFloat16 will be in a later diff. Differential Revision: [D64265755](https://our.internmc.facebook.com/intern/diff/D64265755/) Pull Request resolved: #137914 Approved by: https://github.com/Skylion007, https://github.com/malfet ghstack dependencies: #137661, #137911, #137912, #137913
In preparation for other vector instruction sets. (NEON and AVX512 have 32 registers, but AVX and AVX2 have only 16.) Differential Revision: [D64265759](https://our.internmc.facebook.com/intern/diff/D64265759/) Pull Request resolved: #137915 Approved by: https://github.com/Skylion007, https://github.com/malfet ghstack dependencies: #137661, #137911, #137912, #137913, #137914
…whole vector register instead of half (#137916) The fixup loop doesn't really need to vectorize the last 7 elements, and not doing so will make migrating to x86 simpler. Differential Revision: [D64280689](https://our.internmc.facebook.com/intern/diff/D64280689/) Pull Request resolved: #137916 Approved by: https://github.com/malfet ghstack dependencies: #137661, #137911, #137912, #137913, #137914, #137915
…s for non-ARM architectures too (#137917) Remove reasons to gate it on ARM. Differential Revision: [D64280687](https://our.internmc.facebook.com/intern/diff/D64280687/) Pull Request resolved: #137917 Approved by: https://github.com/malfet ghstack dependencies: #137661, #137911, #137912, #137913, #137914, #137915, #137916
…ytorch#137913) float16_t is ARM-specific. Half is not. Differential Revision: [D64218427](https://our.internmc.facebook.com/intern/diff/D64218427/) Pull Request resolved: pytorch#137913 Approved by: https://github.com/Skylion007, https://github.com/malfet ghstack dependencies: pytorch#137661, pytorch#137911, pytorch#137912
…pu/ (pytorch#137914) This is in preparation for supporting x86 as well; we need to be in this directory so that we can get rebuilt with different CPU_CAPABILITY settings (AVX2/AVX-512). Also incidentally starts fulfilling request from @malfet to split the ARM64 fast path stuff into its own file. BFloat16 will be in a later diff. Differential Revision: [D64265755](https://our.internmc.facebook.com/intern/diff/D64265755/) Pull Request resolved: pytorch#137914 Approved by: https://github.com/Skylion007, https://github.com/malfet ghstack dependencies: pytorch#137661, pytorch#137911, pytorch#137912, pytorch#137913
…137915) In preparation for other vector instruction sets. (NEON and AVX512 have 32 registers, but AVX and AVX2 have only 16.) Differential Revision: [D64265759](https://our.internmc.facebook.com/intern/diff/D64265759/) Pull Request resolved: pytorch#137915 Approved by: https://github.com/Skylion007, https://github.com/malfet ghstack dependencies: pytorch#137661, pytorch#137911, pytorch#137912, pytorch#137913, pytorch#137914
…whole vector register instead of half (pytorch#137916) The fixup loop doesn't really need to vectorize the last 7 elements, and not doing so will make migrating to x86 simpler. Differential Revision: [D64280689](https://our.internmc.facebook.com/intern/diff/D64280689/) Pull Request resolved: pytorch#137916 Approved by: https://github.com/malfet ghstack dependencies: pytorch#137661, pytorch#137911, pytorch#137912, pytorch#137913, pytorch#137914, pytorch#137915
…s for non-ARM architectures too (pytorch#137917) Remove reasons to gate it on ARM. Differential Revision: [D64280687](https://our.internmc.facebook.com/intern/diff/D64280687/) Pull Request resolved: pytorch#137917 Approved by: https://github.com/malfet ghstack dependencies: pytorch#137661, pytorch#137911, pytorch#137912, pytorch#137913, pytorch#137914, pytorch#137915, pytorch#137916
Stack from ghstack (oldest at bottom):
float16_t is ARM-specific. Half is not.
Differential Revision: D64218427
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10