BUG, SIMD: Fix invalid value encountered in several ufuncs#22771
BUG, SIMD: Fix invalid value encountered in several ufuncs#22771seberg merged 7 commits intonumpy:mainfrom
Conversation
351397f to
65fa37e
Compare
981816d to
267e02e
Compare
0b7a4fd to
45b56f8
Compare
45b56f8 to
a540f50
Compare
a540f50 to
b5a4e2f
Compare
b5a4e2f to
5bf0e44
Compare
|
The failure on ppc64le looks legit. |
|
To save some clicks: the error is here and is when checking that using |
Providing non-signaling comparison intrinsics that guarantee no FP invalid exception in case of qNaN sounds great but it cost unacceptable extra intrinsics on ppc64le(VSX) and x86(SSE). Therefore, an integer definition #NPY_SIMD_CMPSIGNAL has been provided instead to differenate between SIMD extensions that support only supports signaling comparison.
6d22737 to
d02fc70
Compare
|
The Power/ISA guide wasn't clear to me, I thought the legacy AltiVec FP comparison instructions are not going to raise invalid FP exceptions for quite nans. However, I discarded the whole patch that related to implementing non-signaling fp comparison, it was a bad approach from my side, see the last commit message for more clarification. |
|
close/open retrigger Travis |
seberg
left a comment
There was a problem hiding this comment.
Logic looks fine, the new define also seems like a good choice to me. The rounding changes are hard to follow but the logic looks sound.
There are pretty extensive tests for floor, ceil, etc. in the SIMD tests, so I think this is good.
PS: I am not immediately sure how well weird corner cases are covered (especially for float32, like np.nextafter(2, 0)), but it seems unlikely to be an issue. We also seem to not have any integration tests, but that would be feature creep here.
| npyv_b@len@ nnan_mask = npyv_notnan_@sfx@(x); | ||
| npyv_@sfx@ x_exnan = npyv_select_@sfx@(nnan_mask, x, npyv_setall_@sfx@(@default_val@)); | ||
| npyv_@sfx@ out = __svml_@func@@func_suffix@(x_exnan); | ||
| out = npyv_select_@sfx@(nnan_mask, out, x); |
There was a problem hiding this comment.
Oh an SVML issue, interesting...
| // a if |a| >= 2^23 or a == NaN | ||
| npyv_u32 mask = vcleq_f32(abs_x, two_power_23); | ||
| mask = vandq_u32(mask, nnan_mask); | ||
| return vbslq_f32(mask, floor, a); |
There was a problem hiding this comment.
Logic looks right (didn't try to look make sure the corner cases are, the tests surely do). I guess this approach to floor is just a bit faster than then the casting approach (with the dance necessary to mask large values)?
There was a problem hiding this comment.
I guess this approach to floor is just a bit faster than then the casting
To avoid testing finite then yes.
necessary to mask large values
To avoid divergences caused by sub/add
|
|
||
| data_cmp = [func(a, b) for a, b in zip(data_a, data_b)] | ||
| cmp = to_bool(intrin(vdata_a, vdata_b)) | ||
| assert cmp == data_cmp |
| )) | ||
| def test_unary_spurious_fpexception(self, ufunc, dtype, data, escape): | ||
| if escape and ufunc in escape: | ||
| return |
There was a problem hiding this comment.
Should this be pytest.xfail() or pytest.skip()? That way there is a very small chance to notice it once day again.
There was a problem hiding this comment.
Should this be pytest.xfail() or pytest.skip()?
It's normal to raise this kind of domain/invalid fp exception but if you mean the second case that related to float16 then yes.
|
Hmmmpf, tried to cancel and start the Travis CI ppc64le job, but right now it doesn't seem to help to get it started... |
|
close/reopen |
|
OK, CI ran through now, lets just put it in, can follow-up, but I doubt there is anything (except removing SSE2 without SSE41 to simplify maintanence ;)). Thanks Sayed! |
|
@seiko2plus we should to follow up on this for 1.24.1 since this is backported. I am getting these on M1: Unless the fix is very simple, I suspect xfailing the test on apple is just as well, though. |
I need some time to think about the consequences of this move, SSE2 is still part of our default baseline, and dropping it will require dispatching each SIMD kernel.
I will look into it, it would be great if you could provide a build log or verbose test result at least. |
|
These are all the If you really want, here is the build log: Details |
|
@seberg, I wasn't able to reproduce it, maybe the clang commits an aggressive optimization leads somehow to optimizing out the new changes: numpy/numpy/core/src/umath/loops_trigonometric.dispatch.c.src Lines 128 to 131 in d02fc70 or maybe your build was cached as I can see from your build log: NFO: customize UnixCCompiler using new_build_clib
INFO: CCompilerOpt.__init__[813] : load cache from file -> /Users/sebastianb/forks/numpy/build/temp.macosx-11.0-arm64-3.10/ccompiler_opt_cache_clib.py
INFO: CCompilerOpt.__init__[824] : hit the file cache
running build_ext
INFO: customize UnixCCompiler
INFO: customize UnixCCompiler using new_build_ext
INFO: CCompilerOpt.__init__[813] : load cache from file -> /Users/sebastianb/forks/numpy/build/temp.macosx-11.0-arm64-3.10/ccompiler_opt_cache_ext.py
INFO: CCompilerOpt.__init__[824] : hit the file cache
INFO: customize UnixCCompiler
WARN: #### ['arm64-apple-darwin20.0.0-clang', '-Wno-unused-result', '-Wsign-compare', '-Wunreachable-code', '-DNDEBUG', '-fwrapv', '-O2', '-Wall', '-fPIC', '-O2', '-isystem', '/opt/homebrew/Caskroom/mambaforge/base/include', '-arch', 'arm64', '-fPIC', '-O2', '-isystem', '/opt/homebrew/Caskroom/mambaforge/base/include', '-arch', 'arm64', '-ftree-vectorize', '-fPIC', '-fPIE', '-fstack-protector-strong', '-O2', '-pipe', '-isystem', '/opt/homebrew/Caskroom/mambaforge/base/include', '-D_FORTIFY_SOURCE=2', '-isystem', '/opt/homebrew/Caskroom/mambaforge/base/include'] #######
INFO: customize UnixCCompiler using new_build_extCould you please provide a non-cached build log and version of clang? |
Hmmm, maybe the compiler got updated recently with Ventura? (I am seeing this on main branch, but I didn't think it would matter). build.log |
closes #22461, #22772, #22797
for more clarification check the linked issues above