BUG, SIMD: Fix memory overlap in ufunc comparison loops#22851
BUG, SIMD: Fix memory overlap in ufunc comparison loops#22851seberg merged 2 commits intonumpy:mainfrom
Conversation
7a887b6 to
2862a1c
Compare
|
Isn't the problem that the I guess this is fine as well, the more precise check probably doesn't matter in practice, but should we remove the macros if we don't expand the |
|
This is confusing with the additional |
|
Arg, sorry that was off, since the bool version deals with the output type being different. But, I do think something like this would also work: diff --git a/numpy/core/src/umath/fast_loop_macros.h b/numpy/core/src/umath/fast_loop_macros.h
index 5274957f7..973e47850 100644
--- a/numpy/core/src/umath/fast_loop_macros.h
+++ b/numpy/core/src/umath/fast_loop_macros.h
@@ -381,15 +381,25 @@ abs_ptrdiff(char *a, char *b)
#define IS_BLOCKABLE_BINARY_BOOL(esize, vsize) \
(steps[0] == (esize) && steps[0] == steps[1] && steps[2] == (1) && \
npy_is_aligned(args[1], (esize)) && \
- npy_is_aligned(args[0], (esize)))
+ npy_is_aligned(args[0], (esize)) && \
+ (abs_ptrdiff(args[2], args[0]) >= (vsize) || \
+ abs_ptrdiff(args[2], args[0]) == 0) && \
+ (abs_ptrdiff(args[2], args[1]) >= (vsize) || \
+ abs_ptrdiff(args[2], args[1]) >= 0))
#define IS_BLOCKABLE_BINARY_SCALAR1_BOOL(esize, vsize) \
(steps[0] == 0 && steps[1] == (esize) && steps[2] == (1) && \
- npy_is_aligned(args[1], (esize)))
+ npy_is_aligned(args[1], (esize)) && \
+ ((abs_ptrdiff(args[2], args[1]) >= (vsize)) || \
+ (abs_ptrdiff(args[2], args[1]) == 0)) && \
+ abs_ptrdiff(args[2], args[0]) >= (esize))
#define IS_BLOCKABLE_BINARY_SCALAR2_BOOL(esize, vsize) \
(steps[0] == (esize) && steps[1] == 0 && steps[2] == (1) && \
- npy_is_aligned(args[0], (esize)))
+ npy_is_aligned(args[0], (esize)) && \
+ ((abs_ptrdiff(args[2], args[0]) >= (vsize)) || \
+ (abs_ptrdiff(args[2], args[0]) == 0)) && \
+ abs_ptrdiff(args[2], args[1]) >= (esize))
/* align var to alignment */
#define LOOP_BLOCK_ALIGN_VAR(var, type, alignment)\In any case either seems fine to me if you have a preference for this version. But lets delete the faulty macro's if we go this route? |
|
I don't like to hide memory overlap tests behind macros, just one check at the beginning of the function and then using contiguous macros or going raw sounds safe and clean to me, we already follow this way for most of our dispatchable sources, therefore I deleted these macros since they are no longer used anywhere. |
|
Yeah, I can see a point in not hiding them in a macro that is used for other things especially. Lets put this in then, it is indeed a pattern used elsewhere. Thanks for the quick fix and sleuthing in what was wrong! |
| uf for uf in UFUNCS if uf.nin == 2 | ||
| ] | ||
| UFUNCS_BINARY_ACC = [ | ||
| uf for uf in UFUNCS_BINARY if hasattr(uf, "accumulate") and uf.nout == 1 |
There was a problem hiding this comment.
Btw. the hasattr shouldn't do anything right now. Checking here that ufunc.signature is None would make more sense to me.
But its mainly a nitpick and the rest is important for backporting.
There was a problem hiding this comment.
Thanks for clarification, in that case, the following skip can be avoided:
numpy/numpy/core/tests/test_umath.py
Lines 4356 to 4357 in 1a0ea79
closes #22841, relates #21483