Vec256 Test cases #42685
Conversation
For now, AVX2 and VSX both fail on these tests. Additionally, VSX fails on Multiplication because of precision. The failures above are precision related, and the tests can be checked within the domain and with lower precision.
@colesbury @VitalyFedyunin let me know if you need help finding other people to review.
Thank you for separating this out, and for the valuable find about precision. I will make this review my highest priority on Monday.
Force-pushed from 5c1eff7 to f0328cd
facebook-github-bot
left a comment
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
VitalyFedyunin
left a comment
Overall, the test structure and testing approach look good to me, but it will take time to review the tests one by one.
…st isn't supported
glaringlee
left a comment
@quickwritereader @VitalyFedyunin
One thing I want to mention here: I remember that TYPED_TEST_CASE is going to be deprecated and TYPED_TEST_SUITE is the new one to use; I believe there is no syntax change. I think we should use TYPED_TEST_SUITE if possible.
I will start to review the tests within this code tomorrow.
@glaringlee I wanted to rename it, but the gtest in the third_party folder is old.
glaringlee
left a comment
@quickwritereader
Sorry for the late update; the code is a bit long here. I made some comments.
Let's keep TYPED_TEST_CASE for now (upgrading the googletest version for pytorch is not that trivial within facebook)
@VitalyFedyunin
std::cout is used within the test; should we comment out all the std::cout lines?
I don't think you need to add those AVX checks in the test. FB internal actually uses a different mechanism to build PyTorch, not CMake (it is similar to Bazel but not the same; you can see some TARGETS files, those are used internally). PyTorch relies on preprocessor-defined macros to detect AVX (CPU_CAPABILITY_DEFAULT/VSX/AVX2, etc.), and those flags are all undefined in the fb internal test. If no such flags are defined, all vec functions fall back to vec256_base.h, which uses
I will try to compare against std then, with the decreased domain.
Ah, haha, there might be some misunderstanding here. I think you don't need to decrease your domain.
Well, you are replicating std behavior, whereas I chose to test against the local PyTorch complex AVX behavior to make the tests correct (excluding the gcc fp-contract mess).
Actually, I can fix inf vs. big number easily, but not asin/acos; for those I need to decrease precision.
Ah, got you. Agree!
@glaringlee I will try to add an ifdef block to fall back when CPU_CAPABILITY_* are not defined.
…ts when CPU_CAPABILITY_* is missing
facebook-github-bot
left a comment
@glaringlee has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@quickwritereader
GMT+4
@quickwritereader This worked in fb internal. All tests passed. I am running a GitHub CI test now: #44712. I just have one question here: now the above piece will be hit only when CPU_CAPABILITY_DEFAULT is defined and the AVX/AVX2 macros are not defined under Clang or GNU. And
@glaringlee
Understood! Thanks a lot. The GitHub CI test is still running; so far so good.
glaringlee
left a comment
@quickwritereader GitHub CI finished. All passed. I think we are good now. I'll go ahead and land this.
Thank you so much for working on this huge test!
facebook-github-bot
left a comment
@glaringlee has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: [Tests for Vec256 classes #15676](https://github.com/pytorch/pytorch/issues/15676)

Testing

Current list:

- [x] Blends
- [x] Memory: UnAlignedLoadStore
- [x] Arithmetics: Plus, Minus, Multiplication, Division
- [x] Bitwise: BitAnd, BitOr, BitXor
- [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual
- [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp
- [x] SignManipulation: Absolute, Negate
- [x] Interleave: Interleave, DeInterleave
- [x] Rounding: Round, Ceil, Floor, Trunc
- [x] Mask: ZeroMask
- [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal
- [x] Trigonometric: Sin, Cos, Tan
- [x] Hyperbolic: Tanh, Sinh, Cosh
- [x] InverseTrigonometric: Asin, ACos, ATan, ATan2
- [x] Logarithm: Log, Log2, Log10, Log1p
- [x] Exponents: Exp, Expm1
- [x] ErrorFunctions: Erf, Erfc, Erfinv
- [x] Pow: Pow
- [x] LGamma: LGamma
- [x] Quantization: quantize, dequantize, requantize_from_int
- [x] Quantization: widening_subtract, relu, relu6

Missing:

- [ ] Constructors, initializations
- [ ] Conversion, Cast
- [ ] Additional: imag, conj, angle (note: imag and conj are only checked for float complex)

#### Notes on tests and testing framework

- Some math functions are tested within a domain range.
- Mostly, the testing framework randomly tests against the std implementation within the domain, or within the implementation domain for some math functions.
- Some functions are tested against the local version. ~~For example, std::round and the vector version of round differ, so it was tested against the local version.~~
- round was tested against PyTorch's at::native::round_impl. ~~For the double type on **VSX, vec_round failed for (even)+0.5 values**.~~ It was solved by using vec_rint.
- ~~**Complex types are not tested.**~~ **After enabling complex testing, due to precision and domain, some of the complex functions failed for VSX and x86 AVX as well. I will either test them against the local implementation or check within the accepted domain.**
- ~~Quantizations are not tested.~~ Added tests for the quantize, dequantize, requantize_from_int, relu, relu6, and widening_subtract functions.
- The testing framework should be improved further.
- ~~For now, `-DBUILD_MOBILE_TEST=ON` will be used for Vec256Test too.~~ Vec256 test cases will be built for each CPU_CAPABILITY.

Fixes: #15676
Pull Request resolved: #42685
Reviewed By: malfet
Differential Revision: D23034406
Pulled By: glaringlee
fbshipit-source-id: d1bf03acdfa271c88744c5d0235eeb8b77288ef8
This is to add vec256 test (introduced in #42685) into linux CI system. The whole test will last 50 to 60 seconds. Differential Revision: [D23772923](https://our.internmc.facebook.com/intern/diff/D23772923) [ghstack-poisoned]
Summary:

### PyTorch Vec256 ppc64le support

Implemented types:

- double
- float
- int16
- int32
- int64
- qint32
- qint8
- quint8
- complex_float
- complex_double

Notes: All basic vector operations are implemented. There are a few problems:

- minimum/maximum NaN propagation for ppc64le is missing and was not checked
- complex multiplication, division, sqrt, and abs are implemented as on PyTorch x86; they can overflow and have more precision problems than the std ones, which is why they were either excluded or tested in a smaller domain range
- precision of the implemented float math functions

~~Besides, I added CPU_CAPABILITY for power, but because of quantization errors for DEFAULT I had to undef and use vsx for DEFAULT too.~~

#### Details

##### Supported math functions

Legend: a plus sign (+) means vectorized; a minus sign (-) means missing; implementation notes are added inside parentheses. -(both) means it was also missing on the x86 side; f(func_name) means the vectorization uses func_name; sleef means redirected to the Sleef library; unsupported means the function is not defined for that type.

| function_name | float | double | complex float | complex double |
| -- | -- | -- | -- | -- |
| acos | sleef | sleef | f(asin) | f(asin) |
| asin | sleef | sleef | +(pytorch impl) | +(pytorch impl) |
| atan | sleef | sleef | f(log) | f(log) |
| atan2 | sleef | sleef | unsupported | unsupported |
| cos | +(ppc64le:avx_mathfun) | sleef | -(both) | -(both) |
| cosh | f(exp) | -(both) | -(both) | |
| erf | sleef | sleef | unsupported | unsupported |
| erfc | sleef | sleef | unsupported | unsupported |
| erfinv | -(both) | -(both) | unsupported | unsupported |
| exp | + | sleef | -(x86:f()) | -(x86:f()) |
| expm1 | f(exp) | sleef | unsupported | unsupported |
| lgamma | sleef | sleef | | |
| log | + | sleef | -(both) | -(both) |
| log10 | f(log) | sleef | f(log) | f(log) |
| log1p | f(log) | sleef | unsupported | unsupported |
| log2 | f(log) | sleef | f(log) | f(log) |
| pow | + f(exp) | sleef | -(both) | -(both) |
| sin | +(ppc64le:avx_mathfun) | sleef | -(both) | -(both) |
| sinh | f(exp) | sleef | -(both) | -(both) |
| tan | sleef | sleef | -(both) | -(both) |
| tanh | f(exp) | sleef | -(both) | -(both) |
| hypot | sleef | sleef | -(both) | -(both) |
| nextafter | sleef | sleef | -(both) | -(both) |
| fmod | sleef | sleef | -(both) | -(both) |

[Vec256 Test cases PR #42685](https://github.com/pytorch/pytorch/pull/42685)

Current list:

- [x] Blends
- [x] Memory: UnAlignedLoadStore
- [x] Arithmetics: Plus, Minus, Multiplication, Division
- [x] Bitwise: BitAnd, BitOr, BitXor
- [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual
- [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp
- [x] SignManipulation: Absolute, Negate
- [x] Interleave: Interleave, DeInterleave
- [x] Rounding: Round, Ceil, Floor, Trunc
- [x] Mask: ZeroMask
- [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal
- [x] Trigonometric: Sin, Cos, Tan
- [x] Hyperbolic: Tanh, Sinh, Cosh
- [x] InverseTrigonometric: Asin, ACos, ATan, ATan2
- [x] Logarithm: Log, Log2, Log10, Log1p
- [x] Exponents: Exp, Expm1
- [x] ErrorFunctions: Erf, Erfc, Erfinv
- [x] Pow: Pow
- [x] LGamma: LGamma
- [x] Quantization: quantize, dequantize, requantize_from_int
- [x] Quantization: widening_subtract, relu, relu6

Missing:

- [ ] Constructors, initializations
- [ ] Conversion, Cast
- [ ] Additional: imag, conj, angle (note: imag and conj are only checked for float complex)

#### Notes on tests and testing framework

- Some math functions are tested within a domain range.
- Mostly, the testing framework randomly tests against the std implementation within the domain, or within the implementation domain for some math functions.
- Some functions are tested against the local version. ~~For example, std::round and the vector version of round differ, so it was tested against the local version.~~
- round was tested against PyTorch's at::native::round_impl. ~~For the double type on **VSX, vec_round failed for (even)+0.5 values**.~~ It was solved by using vec_rint.
- ~~**Complex types are not tested.**~~ **After enabling complex testing, due to precision and domain, some of the complex functions failed for VSX and x86 AVX as well. I will either test them against the local implementation or check within the accepted domain.**
- ~~Quantizations are not tested.~~ Added tests for the quantize, dequantize, requantize_from_int, relu, relu6, and widening_subtract functions.
- The testing framework should be improved further.
- ~~For now, `-DBUILD_MOBILE_TEST=ON` will be used for Vec256Test too.~~ Vec256 test cases will be built for each CPU_CAPABILITY.

Pull Request resolved: #41541
Reviewed By: zhangguanheng66
Differential Revision: D23922049
Pulled By: VitalyFedyunin
fbshipit-source-id: bca25110afccecbb362cea57c705f3ce02f26098