[PyTorch] Add efficient isnan for NEON half #139083
swolchok wants to merge 8 commits into gh/swolchok/679/base
Conversation
Same as the efficient isnan for float; used when f16 hardware support is available. Differential Revision: [D65003321](https://our.internmc.facebook.com/intern/diff/D65003321/)
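For context, the underlying trick is the classic self-comparison NaN test, applied to 8 half lanes at a time. A minimal sketch of the idea (assuming ARMv8.2-A FP16 vector arithmetic; `isnan_f16` is an illustrative name, not the verbatim PyTorch source):

```cpp
// Hedged sketch, not the verbatim PyTorch source: one efficient way to
// test 8 half-precision lanes for NaN on ARMv8.2-A with FP16 vector
// arithmetic. NaN is the only value that compares unequal to itself.
#include <arm_neon.h>

#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
uint16x8_t isnan_f16(float16x8_t v) {
  // vceqq_f16 sets a lane to 0xFFFF where v == v (i.e. not NaN);
  // inverting with vmvnq_u16 leaves 0xFFFF exactly in the NaN lanes.
  return vmvnq_u16(vceqq_f16(v, v));
}
#endif
```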
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139083
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 2d1d999 with merge base 86602a6. This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D65003321
I have fixes for this but don't want to re-kick CI on ready-to-go diffs below it in the stack...
@pytorchbot merge -f "Lint + builds + relevant tests are green"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Same as the efficient one for float when f16 hardware support is available. Testing: Added exhaustive isnan test coverage. Differential Revision: [D65003321](https://our.internmc.facebook.com/intern/diff/D65003321/) Pull Request resolved: pytorch#139083 Approved by: https://github.com/malfet ghstack dependencies: pytorch#139082
This is the first big milestone we've been building towards! (Following rev also hooks this up to actual gemv.) Testing: To check perf, I ran `python torchchat.py generate stories110M --dtype fp16 --device cpu` on an x86 machine without AVX512FP16. Observed roughly 5x tokens/sec increase. Differential Revision: [D64280688](https://our.internmc.facebook.com/intern/diff/D64280688/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D64280688/)! Pull Request resolved: pytorch#137918 Approved by: https://github.com/malfet ghstack dependencies: pytorch#139082, pytorch#139083
…rchitectures (pytorch#138005) Following up on previous rev to use fp16_gemv_trans in gemv, not just gemm-used-for-gemv. Differential Revision: [D64351092](https://our.internmc.facebook.com/intern/diff/D64351092/) Pull Request resolved: pytorch#138005 Approved by: https://github.com/malfet ghstack dependencies: pytorch#139082, pytorch#139083, pytorch#137918
No real reason to have the zero-beta restriction, so let's lift it. Testing: intentionally broke new paths locally to verify test coverage existed. Differential Revision: [D64407752](https://our.internmc.facebook.com/intern/diff/D64407752/) Pull Request resolved: pytorch#138275 Approved by: https://github.com/malfet ghstack dependencies: pytorch#139082, pytorch#139083, pytorch#137918, pytorch#138005
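For readers unfamiliar with the restriction being lifted in pytorch#138275: a gemv fast path that requires beta == 0 only has to store alpha * A * x, while the general BLAS update is y = alpha * A * x + beta * y. A hedged scalar sketch of the difference (illustrative names and layout, assuming an ARM toolchain where `__fp16` is available; not the actual PyTorch kernel):

```cpp
// Hedged sketch, not the actual PyTorch kernel: what lifting a beta == 0
// restriction means for a gemv fast path. Assumes an ARM toolchain where
// the __fp16 type is available; names and layout are illustrative.
#include <cstdint>

void gemv_fp16_sketch(int64_t m, int64_t n, float alpha, const __fp16* a,
                      const __fp16* x, float beta, float* y) {
  for (int64_t i = 0; i < m; ++i) {
    float acc = 0.f;  // accumulate in float for accuracy
    for (int64_t j = 0; j < n; ++j) {
      acc += static_cast<float>(a[i * n + j]) * static_cast<float>(x[j]);
    }
    // A beta == 0 fast path may simply store alpha * acc; per BLAS
    // convention it must also ignore y's prior contents entirely (they
    // may be uninitialized or NaN), so the general case is not just
    // "multiply by zero".
    y[i] = (beta == 0.f) ? alpha * acc : alpha * acc + beta * y[i];
  }
}
```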
Stack from ghstack (oldest at bottom):
Same as the efficient isnan for float; used when f16 hardware support is available.
Testing: Added exhaustive isnan test coverage (fp16 has only 2^16 bit patterns, so checking every input is practical; see the sketch after this list)
Differential Revision: D65003321
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
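On the testing note above: fp16 makes exhaustive coverage practical because there are only 2^16 bit patterns. A hedged sketch of what such a test could look like (names are illustrative, not the actual PyTorch test; pairs with the `isnan_f16` sketch near the top):

```cpp
// Hedged sketch of an exhaustive fp16 isnan test, not the actual PyTorch
// test: walk all 2^16 bit patterns 8 lanes at a time and compare the
// vector result against the IEEE binary16 definition of NaN.
#include <arm_neon.h>
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
uint16x8_t isnan_f16(float16x8_t v) {  // sketch from the description above
  return vmvnq_u16(vceqq_f16(v, v));
}

int main() {
  for (uint32_t base = 0; base < 0x10000; base += 8) {
    uint16_t bits[8], mask[8];
    for (int i = 0; i < 8; ++i) bits[i] = static_cast<uint16_t>(base + i);
    float16x8_t v;
    std::memcpy(&v, bits, sizeof(v));  // reinterpret the raw bit patterns
    uint16x8_t m = isnan_f16(v);
    std::memcpy(mask, &m, sizeof(mask));
    for (int i = 0; i < 8; ++i) {
      // binary16 NaN: exponent bits all ones, mantissa nonzero.
      bool expected =
          (bits[i] & 0x7C00) == 0x7C00 && (bits[i] & 0x03FF) != 0;
      assert((mask[i] == 0xFFFF) == expected);
    }
  }
  return 0;
}
#endif
```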