Vsx initial support issue27678 #41541
Vsx initial support issue27678 #41541quickwritereader wants to merge 3 commits intopytorch:masterfrom
Conversation
💊 CI failures summary and remediationsAs of commit 97c8e17 (more details on the Dr. CI page):
Extra GitHub checks: 2 failed
This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group. This comment has been revised 162 times. |
4925d8f to
ccab406
Compare
19959fe to
84f7d05
Compare
|
@smessmer in the future, please don't mark an OSS PR as triaged without also assigning a reviewer for it |
|
Not to sure who should be the shepherd for this. @VitalyFedyunin do you have any ideas? |
|
New modifications on Vec256Tests:
|
|
For now AVX2 and VSX both fails on these tests Additionally VSX fails on Multiplication because of precision |
|
Failures above are precision related and tests can be checked within the domain and with low precision. |
|
@VitalyFedyunin if possible, could we discuss my Pr this week |
|
Do we know anyone at IBM who could help review this? |
|
@xuhdev / @zasdfgbnm is this the sort of thing either of you would be interested in? :) |
|
I know very little about CPU vectorization :) |
hello, I have little PowerPC experience, but given time I can potentially review and land this proposal. Several notes so far:
|
|
|
#41541 (comment) |
|
Hello @Flamefire, we are looking to rewrite (and remove massive duplication) Vec code (eventually to support AVX512 and various vec sizes and architectures). So far we are happy with the code as is. |
|
@quickwritereader please scroll up for CLA information, without it I cannot import your code to run internal tests. |
facebook-github-bot
left a comment
There was a problem hiding this comment.
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This fixes multiple bugs introduced by the VSX optimized code in #41541 - min/max/clamp now consistently return nan when any value is NaN as on other architectures - The non-complex angle functions return PI for negative values now - The complex angle functions have been corrected and optimized - The float32-log function implementation returned a wrong result when inf was passed (and maybe other inputs), replaced by the sleef function just as for float64 Fixes #59248 Fixes #57537 Pull Request resolved: #59382 Reviewed By: jbschlosser Differential Revision: D28944626 Pulled By: ezyang fbshipit-source-id: 1ae2782b9e34e458a19cec90617037654279e0e0
Summary: This fixes multiple bugs introduced by the VSX optimized code in pytorch#41541 - min/max/clamp now consistently return nan when any value is NaN as on other architectures - The non-complex angle functions return PI for negative values now - The complex angle functions have been corrected and optimized - The float32-log function implementation returned a wrong result when inf was passed (and maybe other inputs), replaced by the sleef function just as for float64 Fixes pytorch#59248 Fixes pytorch#57537 Pull Request resolved: pytorch#59382 Reviewed By: jbschlosser Differential Revision: D28944626 Pulled By: ezyang fbshipit-source-id: 1ae2782b9e34e458a19cec90617037654279e0e0
This fixes the remaining bug introduced by the VSX optimized code in #41541 Followup to #59382 ### Description The code currently returns wrong results on POWER9LE making e.g. the `test_binary_ufuncs` fail. ### Testing Build and ran tests on PPC Pull Request resolved: #82646 Approved by: https://github.com/ezyang
Summary: This fixes the remaining bug introduced by the VSX optimized code in #41541 Followup to #59382 ### Description The code currently returns wrong results on POWER9LE making e.g. the `test_binary_ufuncs` fail. ### Testing Build and ran tests on PPC Pull Request resolved: #82646 Approved by: https://github.com/ezyang Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/39ffad392c49aafcfeba05e2704bb1b666247471 Reviewed By: kit1980 Differential Revision: D38395263 fbshipit-source-id: c4c56af2d8e3b528b6418a4a32c63de77037e5cf
Replace the remaining hand-written code in vec256_float_vsx.h by calls to Sleef functions similar to what was done in pytorch#59382 & pytorch#82646 after pytorch#41541 Also remove some whitespace wrongly added in the above PRs. This fixes wrong results for e.g. `sin(1e20)`. Fixes pytorch#85978
Replace the remaining hand-written code in vec256_float_vsx.h by calls to Sleef functions similar to what was done in pytorch#59382 & pytorch#82646 after pytorch#41541 Also remove some whitespace wrongly added in the above PRs. This fixes wrong results for e.g. `sin(1e20)`. Fixes pytorch#85978
Replace the remaining hand-written code in vec256_float_vsx.h by calls to Sleef functions similar to what was done in #59382 & #82646 after #41541 This fixes wrong results for e.g. `sin(1e20)`. Fixes #85978 To fix #85978 I only needed to do the sin/cos functions to make the test pass but to not encounter the same issue again and again (see the previous PRs and issues) I checked the whole file for similar functions where a Sleef function could be used and changed those too. In the diff I've noticed the faulty whitespace so to make this complete I fixed that too, so it should now be done. Pull Request resolved: #86453 Approved by: https://github.com/malfet
Replace the remaining hand-written code in vec256_float_vsx.h by calls to Sleef functions similar to what was done in pytorch#59382 & pytorch#82646 after pytorch#41541 This fixes wrong results for e.g. `sin(1e20)`. Fixes pytorch#85978 To fix pytorch#85978 I only needed to do the sin/cos functions to make the test pass but to not encounter the same issue again and again (see the previous PRs and issues) I checked the whole file for similar functions where a Sleef function could be used and changed those too. In the diff I've noticed the faulty whitespace so to make this complete I fixed that too, so it should now be done. Pull Request resolved: pytorch#86453 Approved by: https://github.com/malfet
Summary: ### Pytorch Vec256 ppc64le support implemented types: - double - float - int16 - int32 - int64 - qint32 - qint8 - quint8 - complex_float - complex_double Notes: All basic vector operations are implemented: There are a few problems: - minimum maximum nan propagation for ppc64le is missing and was not checked - complex multiplication, division, sqrt, abs are implemented as PyTorch x86. they can overflow and have precision problems than std ones. That's why they were either excluded or tested in smaller domain range - precisions of the implemented float math functions ~~Besides, I added CPU_CAPABILITY for power. but as because of quantization errors for DEFAULT I had to undef and use vsx for DEFAULT too~~ #### Details ##### Supported math functions + plus sign means vectorized, - minus sign means missing, (implementation notes are added inside braces) (notes). Example: -(both ) means it was also missing on x86 side g( func_name) means vectorization is using func_name sleef - redirected to the Sleef unsupported function_name | float | double | complex float | complex double |-- | -- | -- | -- | --| acos | sleef | sleef | f(asin) | f(asin) asin | sleef | sleef | +(pytorch impl) | +(pytorch impl) atan | sleef | sleef | f(log) | f(log) atan2 | sleef | sleef | unsupported | unsupported cos | +((ppc64le:avx_mathfun) ) | sleef | -(both) | -(both) cosh | f(exp) | -(both) | -(both) | erf | sleef | sleef | unsupported | unsupported erfc | sleef | sleef | unsupported | unsupported erfinv | - (both) | - (both) | unsupported | unsupported exp | + | sleef | - (x86:f()) | - (x86:f()) expm1 | f(exp) | sleef | unsupported | unsupported lgamma | sleef | sleef | | log | + | sleef | -(both) | -(both) log10 | f(log) | sleef | f(log) | f(log) log1p | f(log) | sleef | unsupported | unsupported log2 | f(log) | sleef | f(log) | f(log) pow | + f(exp) | sleef | -(both) | -(both) sin | +((ppc64le:avx_mathfun) ) | sleef | -(both) | -(both) sinh | f(exp) | sleef | -(both) | -(both) tan | sleef | sleef | -(both) | -(both) tanh | f(exp) | sleef | -(both) | -(both) hypot | sleef | sleef | -(both) | -(both) nextafter | sleef | sleef | -(both) | -(both) fmod | sleef | sleef | -(both) | -(both) [Vec256 Test cases Pr https://github.com/pytorch/pytorch/issues/42685](https://github.com/pytorch/pytorch/pull/42685) Current list: - [x] Blends - [x] Memory: UnAlignedLoadStore - [x] Arithmetics: Plus,Minu,Multiplication,Division - [x] Bitwise: BitAnd, BitOr, BitXor - [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual - [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp - [x] SignManipulation: Absolute, Negate - [x] Interleave: Interleave, DeInterleave - [x] Rounding: Round, Ceil, Floor, Trunc - [x] Mask: ZeroMask - [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal - [x] Trigonometric: Sin, Cos, Tan - [x] Hyperbolic: Tanh, Sinh, Cosh - [x] InverseTrigonometric: Asin, ACos, ATan, ATan2 - [x] Logarithm: Log, Log2, Log10, Log1p - [x] Exponents: Exp, Expm1 - [x] ErrorFunctions: Erf, Erfc, Erfinv - [x] Pow: Pow - [x] LGamma: LGamma - [x] Quantization: quantize, dequantize, requantize_from_int - [x] Quantization: widening_subtract, relu, relu6 Missing: - [ ] Constructors, initializations - [ ] Conversion , Cast - [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex) #### Notes on tests and testing framework - some math functions are tested within domain range - mostly testing framework randomly tests against std implementation within the domain or within the implementation domain for some math functions. - some functions are tested against the local version. ~~For example, std::round and vector version of round differs. so it was tested against the local version~~ - round was tested against pytorch at::native::round_impl. ~~for double type on **Vsx vec_round failed for (even)+0 .5 values**~~ . it was solved by using vec_rint - ~~**complex types are not tested**~~ **After enabling complex testing due to precision and domain some of the complex functions failed for vsx and x86 avx as well. I will either test it against local implementation or check within the accepted domain** - ~~quantizations are not tested~~ Added tests for quantizing, dequantize, requantize_from_int, relu, relu6, widening_subtract functions - the testing framework should be improved further - ~~For now `-DBUILD_MOBILE_TEST=ON `will be used for Vec256Test too~~ Vec256 Test cases will be built for each CPU_CAPABILITY Pull Request resolved: pytorch#41541 Reviewed By: zhangguanheng66 Differential Revision: D23922049 Pulled By: VitalyFedyunin fbshipit-source-id: bca25110afccecbb362cea57c705f3ce02f26098
Summary: This fixes multiple bugs introduced by the VSX optimized code in pytorch#41541 - min/max/clamp now consistently return nan when any value is NaN as on other architectures - The non-complex angle functions return PI for negative values now - The complex angle functions have been corrected and optimized - The float32-log function implementation returned a wrong result when inf was passed (and maybe other inputs), replaced by the sleef function just as for float64 Fixes pytorch#59248 Fixes pytorch#57537 Pull Request resolved: pytorch#59382 Reviewed By: jbschlosser Differential Revision: D28944626 Pulled By: ezyang fbshipit-source-id: 1ae2782b9e34e458a19cec90617037654279e0e0
This fixes the remaining bug introduced by the VSX optimized code in pytorch#41541 Followup to pytorch#59382 ### Description The code currently returns wrong results on POWER9LE making e.g. the `test_binary_ufuncs` fail. ### Testing Build and ran tests on PPC Pull Request resolved: pytorch#82646 Approved by: https://github.com/ezyang
Replace the remaining hand-written code in vec256_float_vsx.h by calls to Sleef functions similar to what was done in pytorch#59382 & pytorch#82646 after pytorch#41541 This fixes wrong results for e.g. `sin(1e20)`. Fixes pytorch#85978 To fix pytorch#85978 I only needed to do the sin/cos functions to make the test pass but to not encounter the same issue again and again (see the previous PRs and issues) I checked the whole file for similar functions where a Sleef function could be used and changed those too. In the diff I've noticed the faulty whitespace so to make this complete I fixed that too, so it should now be done. Pull Request resolved: pytorch#86453 Approved by: https://github.com/malfet
Pytorch Vec256 ppc64le support
implemented types:
Notes:
All basic vector operations are implemented:
There are a few problems:
Besides, I added CPU_CAPABILITY for power. but as because of quantization errors for DEFAULT I had to undef and use vsx for DEFAULT tooDetails
Supported math functions
(notes). Example: -(both ) means it was also missing on x86 side
g( func_name) means vectorization is using func_name
sleef - redirected to the Sleef
unsupported
Vec256 Test cases Pr #42685
Current list:
Missing:
Notes on tests and testing framework
For example, std::round and vector version of round differs. so it was tested against the local versionfor double type on Vsx vec_round failed for (even)+0 .5 values. it was solved by using vec_rintcomplex types are not testedAfter enabling complex testing due to precision and domain some of the complex functions failed for vsx and x86 avx as well. I will either test it against local implementation or check within the accepted domainquantizations are not testedAdded tests for quantizing, dequantize, requantize_from_int, relu, relu6, widening_subtract functionsFor now-DBUILD_MOBILE_TEST=ONwill be used for Vec256Test tooVec256 Test cases will be built for each CPU_CAPABILITY