keepdims should always preserve all dimensions in CUB-based reductions#2725
keepdims should always preserve all dimensions in CUB-based reductions#2725emcastillo merged 13 commits intocupy:masterfrom
Conversation
|
Benchmarks regarding the use of import numpy as np
from cupy.cuda.cub import _contig_axes
axes=(1, 2)
%timeit np.all(np.diff(axes) == 1)
# alternative Python-based solution considered, but not implemented in this PR
%timeit all((ax - ax_prev) == 1 for ax, ax_prev in zip(axes[1:], axes[:-1]))
%timeit _contig_axes(axes)
|
0b7b284 to
c8c06dc
Compare
|
In a follow-up PR, we can easily add a CUB-based path for cdef ndarray _ndarray_mean(ndarray self, axis, dtype, out, keepdims):
if cupy.cuda.cub_enabled and self.size != 0:
result = cub.cub_reduction(self, cub.CUPY_CUB_SUM, axis, out, dtype,
keepdims)
if result is not None:
result /= (self.size / result.size)
return result
return _mean(self, axis=axis, dtype=dtype, out=out, keepdims=keepdims) |
leofang
left a comment
There was a problem hiding this comment.
LGTM except for a few cleanups and nitpicks. The new refactoring makes it more readable and maintainable!
8af729b to
ce0f9b8
Compare
moved use of _get_axis to improve consistency and reduce repeated axis handling
ce0f9b8 to
f2060ce
Compare
|
Thanks for reviewing @leofang. I had forgotten to get back to this one. It has been rebased and should be ready for testing now. |
|
Jenkins, test this please |
|
Successfully created a job for commit 9497105: |
|
Jenkins CI test (for commit 9497105, target branch master) succeeded! |
|
Jenkins, test this please |
|
Successfully created a job for commit a4bba42: |
|
Jenkins CI test (for commit a4bba42, target branch master) succeeded! |
keepdims should always preserve all dimensions in CUB-based reductions
closes #2720
The bug is fixed by the lines computing
out_shapewithindevice_reduce.In order to avoid multiple calls to
_get_axis, I moved the_get_axiscall outside of_preprocess_arrayand call it directly in_routines_math.pyxand_routines_statistics.pyxinstead. The duplicate logic could potentially be refactored into a common function_get_axeswill have already sorted the axes in ascending order and checked their range (It also separates out the axes to be reduced from those needed in determining the shape for keepdims).This allows greatly simplifying the code in
_cub_device_segmented_reduce_axis_compatibleand removes the need for_cub_device_reduce_axis_compatiblealtogether.Although not strictly part of the bug fix, I implemented a performance improvement in the segmented reduction case by replacing the following slow call
with a call to a new
_contig_axisCython helper function