
Vsx initial support issue27678 #41541

Closed
quickwritereader wants to merge 3 commits into pytorch:master from quickwritereader:vsx_initial_support_issue27678

Conversation

@quickwritereader
Contributor

@quickwritereader quickwritereader commented Jul 16, 2020

PyTorch Vec256 ppc64le support

implemented types:

  • double
  • float
  • int16
  • int32
  • int64
  • qint32
  • qint8
  • quint8
  • complex_float
  • complex_double

Notes:
All basic vector operations are implemented. There are a few problems:

  • minimum/maximum NaN propagation for ppc64le is missing and was not checked
  • complex multiplication, division, sqrt, and abs are implemented the same way as on PyTorch x86. They can overflow and have worse precision than the std ones; that is why they were either excluded or tested over a smaller domain range
  • precision of the implemented float math functions

Besides, I added a CPU_CAPABILITY for power, but because of quantization errors for DEFAULT I had to undef it and use vsx for DEFAULT too

Details

Supported math functions
  • a plus sign (+) means vectorized; a minus sign (-) means missing (implementation notes are added inside parentheses). Example: -(both) means it was also missing on the x86 side
  • f(func_name) means the vectorization is implemented via func_name
  • sleef means redirected to the Sleef library
  • unsupported means not available for that type
| function_name | float | double | complex float | complex double |
| -- | -- | -- | -- | -- |
| acos | sleef | sleef | f(asin) | f(asin) |
| asin | sleef | sleef | +(pytorch impl) | +(pytorch impl) |
| atan | sleef | sleef | f(log) | f(log) |
| atan2 | sleef | sleef | unsupported | unsupported |
| cos | +(ppc64le:avx_mathfun) | sleef | -(both) | -(both) |
| cosh | f(exp) | -(both) | -(both) | |
| erf | sleef | sleef | unsupported | unsupported |
| erfc | sleef | sleef | unsupported | unsupported |
| erfinv | -(both) | -(both) | unsupported | unsupported |
| exp | + | sleef | -(x86:f()) | -(x86:f()) |
| expm1 | f(exp) | sleef | unsupported | unsupported |
| lgamma | sleef | sleef | | |
| log | + | sleef | -(both) | -(both) |
| log10 | f(log) | sleef | f(log) | f(log) |
| log1p | f(log) | sleef | unsupported | unsupported |
| log2 | f(log) | sleef | f(log) | f(log) |
| pow | + f(exp) | sleef | -(both) | -(both) |
| sin | +(ppc64le:avx_mathfun) | sleef | -(both) | -(both) |
| sinh | f(exp) | sleef | -(both) | -(both) |
| tan | sleef | sleef | -(both) | -(both) |
| tanh | f(exp) | sleef | -(both) | -(both) |
| hypot | sleef | sleef | -(both) | -(both) |
| nextafter | sleef | sleef | -(both) | -(both) |
| fmod | sleef | sleef | -(both) | -(both) |

Vec256 Test cases PR #42685
Current list:

  • Blends
  • Memory: UnAlignedLoadStore
  • Arithmetics: Plus, Minus, Multiplication, Division
  • Bitwise: BitAnd, BitOr, BitXor
  • Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual
  • MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp
  • SignManipulation: Absolute, Negate
  • Interleave: Interleave, DeInterleave
  • Rounding: Round, Ceil, Floor, Trunc
  • Mask: ZeroMask
  • SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal
  • Trigonometric: Sin, Cos, Tan
  • Hyperbolic: Tanh, Sinh, Cosh
  • InverseTrigonometric: Asin, ACos, ATan, ATan2
  • Logarithm: Log, Log2, Log10, Log1p
  • Exponents: Exp, Expm1
  • ErrorFunctions: Erf, Erfc, Erfinv
  • Pow: Pow
  • LGamma: LGamma
  • Quantization: quantize, dequantize, requantize_from_int
  • Quantization: widening_subtract, relu, relu6
Missing:
  • Constructors, initializations
  • Conversion, Cast
  • Additional: imag, conj, angle (note: imag and conj were only checked for float complex)

Notes on tests and testing framework

  • some math functions are tested within a restricted domain range
  • the testing framework mostly tests randomly against the std implementation, within the general domain or within the implementation domain for some math functions
  • some functions are tested against a local version. For example, std::round and the vector version of round differ, so round was tested against the local version
  • round was tested against pytorch at::native::round_impl. For the double type on VSX, vec_round failed for (even)+0.5 values; this was solved by using vec_rint
  • complex types: after enabling complex testing, some of the complex functions failed for VSX and for x86 AVX as well, due to precision and domain issues. I will either test them against a local implementation or check within the accepted domain
  • quantization: added tests for quantize, dequantize, requantize_from_int, relu, relu6, and widening_subtract
  • the testing framework should be improved further
  • for now -DBUILD_MOBILE_TEST=ON will be used for Vec256Test too; Vec256 test cases will be built for each CPU_CAPABILITY

@quickwritereader
Contributor Author

#27678
#15676

@dr-ci

dr-ci Bot commented Jul 16, 2020

💊 CI failures summary and remediations

As of commit 97c8e17 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

Extra GitHub checks: 2 failed



@quickwritereader quickwritereader force-pushed the vsx_initial_support_issue27678 branch from 4925d8f to ccab406 Compare July 16, 2020 17:36
@smessmer smessmer added module: vectorization Related to SIMD vectorization, e.g., Vec256 triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Jul 16, 2020
@quickwritereader quickwritereader force-pushed the vsx_initial_support_issue27678 branch 6 times, most recently from 19959fe to 84f7d05 Compare July 17, 2020 02:30
@ezyang
Contributor

ezyang commented Jul 28, 2020

@smessmer in the future, please don't mark an OSS PR as triaged without also assigning a reviewer for it

@ezyang ezyang removed the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jul 28, 2020
@ezyang ezyang requested a review from VitalyFedyunin July 28, 2020 14:35
@ezyang
Contributor

ezyang commented Jul 28, 2020

Not too sure who should be the shepherd for this. @VitalyFedyunin do you have any ideas?

@mruberry mruberry added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jul 31, 2020
@quickwritereader
Contributor Author

New modifications on Vec256Tests:

  • Absolute is tested against local_abs. For CPU_CAPABILITY_DEFAULT both complex and real will be redirected to std::abs
  • LGamma will be checked with an error epsilon so that it can pass the tests on x86
  • for AVX2 and VSX: DeQuantize tests are checked against __builtin_fmaf if the compiler is GNU. For other compilers they are tested with a low-precision error epsilon of 1e-3

@quickwritereader
Contributor Author

For now, AVX2 and VSX both fail on these tests:
[ FAILED ] InverseTrigonometric/2.Asin, where TypeParam = at::vec256::(anonymous namespace)::Vec256<c10::complex >
[ FAILED ] InverseTrigonometric/2.ACos, where TypeParam = at::vec256::(anonymous namespace)::Vec256<c10::complex >
[ FAILED ] InverseTrigonometric/2.ATan, where TypeParam = at::vec256::(anonymous namespace)::Vec256<c10::complex >
[ FAILED ] InverseTrigonometric/3.Asin, where TypeParam = at::vec256::(anonymous namespace)::Vec256<c10::complex >
[ FAILED ] InverseTrigonometric/3.ACos, where TypeParam = at::vec256::(anonymous namespace)::Vec256<c10::complex >
[ FAILED ] InverseTrigonometric/3.ATan, where TypeParam = at::vec256::(anonymous namespace)::Vec256<c10::complex >

Additionally, VSX fails on Multiplication because of precision:
[ FAILED ] Arithmetics/2.Multiplication, where TypeParam = at::vec256::(anonymous namespace)::Vec256<c10::complex >

@quickwritereader
Contributor Author

quickwritereader commented Aug 3, 2020

The failures above are precision related, and the tests can be checked within a domain and with lower precision.
But ATan has a sign-related error for both VSX and AVX, since the VSX complex implementation is simply a translation of the AVX complex code:

[ RUN      ] InverseTrigonometric/2.ATan
Total Trial Count:131070
Domain:
{ -10, 10 }
Error epsilon: 1e-05
../../../aten/src/ATen/test/Vec256Test.h:491: Failure
Expected equality of these values:
  nearlyEqual(exp.real(), act.real(), absErr.real())
    Which is: false
  true
-1.5707963705062866 1.5707963705062866
atan: {
vec[(-5.22354,-4), (-4.52348,2.63532), (-5.75149,0.258563), (-0,-1.43798)]
vec_exp:vec[(-1.44969,-0.0913255), (-1.40576,0.0938578), (-1.39898,0.00757274), (-1.5708,-0.858373)]
vec_act:vec[(-1.44969,-0.0913255), (-1.40576,0.0938578), (-1.39898,0.00757276), (1.5708,-0.858373)]
}
[  FAILED  ] InverseTrigonometric/2.ATan, where TypeParam = at::vec256::(anonymous namespace)::Vec256<c10::complex<float> > (34 ms)
[----------] 1 test from InverseTrigonometric/2 (34 ms total)

[----------] 1 test from InverseTrigonometric/3, where TypeParam = at::vec256::(anonymous namespace)::Vec256<c10::complex<double> >
[ RUN      ] InverseTrigonometric/3.ATan
Total Trial Count:131070
Domain:
{ -10, 10 }
Error epsilon: 1e-05
../../../aten/src/ATen/test/Vec256Test.h:460: Failure
Expected equality of these values:
  nearlyEqual(exp.real(), act.real(), absErr.real())
    Which is: false
  true
1.5707963267948966 -1.5707963267948966
atan: {
vec[(-5,2.47906), (0,6.06085)]
vec_exp:vec[(-1.41065,0.0777398), (1.5708,0.166516)]
vec_act:vec[(-1.41065,0.0777398), (-1.5708,0.166516)]
}
[  FAILED  ] InverseTrigonometric/3.ATan, where TypeParam = at::vec256::(anonymous namespace)::Vec256<c10::complex<double> > (1 ms)

@quickwritereader quickwritereader changed the title Vsx initial support issue27678 Vsx initial support issue27678 and Vec256 tests Aug 3, 2020
@quickwritereader
Contributor Author

@VitalyFedyunin if possible, could we discuss my PR this week?
thanks

@ezyang
Contributor

ezyang commented Aug 4, 2020

Do we know anyone at IBM who could help review this?

@ezyang
Contributor

ezyang commented Aug 4, 2020

@xuhdev / @zasdfgbnm is this the sort of thing either of you would be interested in? :)

@zasdfgbnm
Collaborator

I know very little about CPU vectorization :)

@VitalyFedyunin
Contributor

@VitalyFedyunin if possible, could we discuss my PR this week?
thanks

hello, I have little PowerPC experience, but given time I can potentially review and land this proposal.

Several notes so far:

  1. It makes sense to separate the tests PR and the VSX support PR (I advise using ghstack)
  2. aten/src/ATen/cpu/vec256/vec256.h becomes messy with the mix of VSX and AVX; perhaps we need to suffix everything properly, as they are mutually exclusive.
  3. Do you have any benchmark numbers to prove that adding VSX instructions performs better than compiler optimization?

@quickwritereader
Contributor Author

quickwritereader commented Aug 4, 2020

@VitalyFedyunin if possible, could we discuss my PR this week?
thanks

hello, I have little PowerPC experience, but given time I can potentially review and land this proposal.

Several notes so far:

  1. It makes sense to separate the tests PR and the VSX support PR (I advise using ghstack)
  2. aten/src/ATen/cpu/vec256/vec256.h becomes messy with the mix of VSX and AVX; perhaps we need to suffix everything properly, as they are mutually exclusive.
  3. Do you have any benchmark numbers to prove that adding VSX instructions performs better than compiler optimization?
  1. I'll see if I can separate them properly with the suggested tool.

  2. There are very few additions and modifications in vec256.h, as well as in the other vector files.
     In vec256.h only one common header was included (I suffixed it with _vsx) and stream helpers were added for the quant types.

  3. No, I did not add any benchmark. I was asked by IBM to add initial support for VSX instructions in the same way as for x86.
     I only checked Godbolt's output to see whether I was getting decent code. I am emulating 256-bit vectors with two 128-bit vectors. It is still vector intrinsics, so it will mostly outperform scalar code and at least will not suffer from broken auto-vectorization in loops.

  4. Some additional notes on performance and math functions:
     I implemented very few math functions, as this is initial support and I could not decide whether to rely on Sleef or which implementations to use. Also, the Sleef code does not have a 256-bit emulation for VSX.
     There is a problem with the non-implemented std-mapped functions: with the current libm, almost all math functions are not vectorizable. I can make them auto-vectorizable on x86 GCC, assuming it will hold for the Power architecture in the future. For GCC it needs the -ffast-math flag and a little modification of the map function so that auto-vectorization will not be broken.

     If this project relied on auto-vectorization and its report tools there would be no need for this. We went that way for my team: we made a reporting tool for the project, we check where vectorization fails, and we help the compiler. But there can be places where it is almost impossible to make it easier for the compiler to auto-vectorize.
     But this project relies on a separate Vec256, so why not use explicit vector types or intrinsics? From my experience, compilers still do well with explicit code.

Do we know anyone at IBM who could help review this?

@edelsohn or @ceseo could ask some IBM-ers to help us

@quickwritereader
Contributor Author

#41541 (comment)
No, I did not mean CRTP static polymorphism for sin-like ops. I just meant to separate all such ops; we could implement some sin-like ops using those types without fully caring about the architecture. And for PPC, it would make things concise.
Well, everything has pros and cons. It is initial support, so there is always room for improvement.

@VitalyFedyunin
Contributor

Hello @Flamefire, we are looking to rewrite (and remove massive duplication) Vec code (eventually to support AVX512 and various vec sizes and architectures). So far we are happy with the code as is.

@VitalyFedyunin
Contributor

@quickwritereader please scroll up for CLA information, without it I cannot import your code to run internal tests.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Jun 8, 2021
Summary:
This fixes multiple bugs introduced by the VSX optimized code in #41541

- min/max/clamp now consistently return nan when any value is NaN as on other architectures
- The non-complex angle functions return PI for negative values now
- The complex angle functions have been corrected and optimized
- The float32-log function implementation returned a wrong result when inf was passed (and maybe other inputs), replaced by the sleef function just as for float64

Fixes #59248
Fixes #57537

Pull Request resolved: #59382

Reviewed By: jbschlosser

Differential Revision: D28944626

Pulled By: ezyang

fbshipit-source-id: 1ae2782b9e34e458a19cec90617037654279e0e0
deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
pytorchmergebot pushed a commit that referenced this pull request Aug 3, 2022
This fixes the remaining bug introduced by the VSX optimized code in #41541

Followup to #59382

### Description

The code currently returns wrong results on POWER9LE making e.g. the `test_binary_ufuncs` fail.

### Testing

Build and ran tests on PPC
Pull Request resolved: #82646
Approved by: https://github.com/ezyang
facebook-github-bot pushed a commit that referenced this pull request Aug 4, 2022
Summary:
This fixes the remaining bug introduced by the VSX optimized code in #41541

Followup to #59382

### Description

The code currently returns wrong results on POWER9LE making e.g. the `test_binary_ufuncs` fail.

### Testing

Build and ran tests on PPC

Pull Request resolved: #82646
Approved by: https://github.com/ezyang

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/39ffad392c49aafcfeba05e2704bb1b666247471

Reviewed By: kit1980

Differential Revision: D38395263

fbshipit-source-id: c4c56af2d8e3b528b6418a4a32c63de77037e5cf
pytorchmergebot pushed a commit to Flamefire/pytorch that referenced this pull request Oct 10, 2022
Replace the remaining hand-written code in vec256_float_vsx.h by calls
to Sleef functions similar to what was done in pytorch#59382 & pytorch#82646 after pytorch#41541
Also remove some whitespace wrongly added in the above PRs.

This fixes wrong results for e.g. `sin(1e20)`.
Fixes pytorch#85978
Flamefire added a commit to Flamefire/pytorch that referenced this pull request Nov 22, 2022
Replace the remaining hand-written code in vec256_float_vsx.h by calls
to Sleef functions similar to what was done in pytorch#59382 & pytorch#82646 after pytorch#41541
Also remove some whitespace wrongly added in the above PRs.

This fixes wrong results for e.g. `sin(1e20)`.
Fixes pytorch#85978
pytorchmergebot pushed a commit that referenced this pull request Nov 22, 2022
Replace the remaining hand-written code in vec256_float_vsx.h by calls to Sleef functions similar to what was done in #59382 & #82646 after #41541

This fixes wrong results for e.g. `sin(1e20)`.
Fixes #85978

To fix #85978 I only needed to do the sin/cos functions to make the test pass but to not encounter the same issue again and again (see the previous PRs and issues) I checked the whole file for similar functions where a Sleef function could be used and changed those too. In the diff I've noticed the faulty whitespace so to make this complete I fixed that too, so it should now be done.

Pull Request resolved: #86453
Approved by: https://github.com/malfet
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#41541

Reviewed By: zhangguanheng66

Differential Revision: D23922049

Pulled By: VitalyFedyunin

fbshipit-source-id: bca25110afccecbb362cea57c705f3ce02f26098
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026

Labels

cla signed module: vectorization Related to SIMD vectorization, e.g., Vec256 open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
