ENH: Improve performance of numpy.nonzero for 1D/2D contiguous arrays#27519
ENH: Improve performance of numpy.nonzero for 1D/2D contiguous arrays#27519eendebakpt wants to merge 23 commits intonumpy:mainfrom
Conversation
seberg
left a comment
There was a problem hiding this comment.
A few comments. I'll ask for more style cleanup before we merge this.
| PyArrayObject* original_array = self; | ||
| if (PyArray_BASE(self) != NULL) { | ||
| original_array = (PyArrayObject*) PyArray_BASE(self); | ||
| } |
There was a problem hiding this comment.
This doesn't make any sense to me. What does the base have to do with anything here?
There was a problem hiding this comment.
I agree. Removed.
| if (PyArray_BASE(self) != NULL) { | ||
| original_array = (PyArrayObject*) PyArray_BASE(self); | ||
| } | ||
| int bytes_not_swapped = PyArray_ISNOTSWAPPED(self) && PyArray_ISNOTSWAPPED(original_array); |
There was a problem hiding this comment.
Checking for byte-swap is (just like in the other PR) important. Are there tests that fail without it? Because if not, there should be a new test for it (just like the one in the other PR which is currently failing).
| return nonzero_count; | ||
| } | ||
|
|
||
| static inline void nonzero_idxs_1D_bool(npy_intp count, npy_intp nonzero_count, char *data, npy_intp stride, npy_intp* multi_index) |
There was a problem hiding this comment.
I'll have formatting nits, but this is nice to move out!
| ++j; | ||
| } | ||
| } | ||
| nonzero_idxs_1D_bool(count, nonzero_count, data, stride, multi_index); |
There was a problem hiding this comment.
Doesn't this call belong to the family of fast functions in the other place? I.e. only the manual loop of nonzero should stay here?
There was a problem hiding this comment.
I wanted to keep the diff short, but it indeed fits a bit better in the other block. Changed.
| int M_type_num = dtype->type_num; | ||
|
|
||
| bool executed = nonzero_idxs_dispatcher((void*)data, multi_index, M_dim, | ||
| M_shape, M_strides, M_type_num, nonzero_count); |
There was a problem hiding this comment.
This branch is missing the thresholded GIL release (internals of it I guess).
There was a problem hiding this comment.
Releasing the GIL (or running the cpython free-threading build) could result in a buffer overflow if another thread changes the number of non-zero entries in the array. I added a guard for that, but we should rerun the benchmarks (I still expect a good speedup though).
Should a test be added for this? (probably here: https://github.com/numpy/numpy/blob/main/numpy/_core/tests/test_multithreading.py)
There was a problem hiding this comment.
Interesting, a case where we would actually want to lock down a whole array (but I think this is super hard with buffer exchange, so in practice...).
Would be very cool if you add a test. Larger use of free-threading is in the future after-all, and maybe not a too distant one.
|
@seberg Could you review this one? |
Improve performance of
np.non_zerofor 1D and 2D contiguous arrays. The main cases improved are 2D boolean arrays and 1D/2D int and float arrays. The PR is based on work and ideas from @touqir14 in #18368Benchmark on main:
Benchmark on PR:
Benchmark script
Notes: