
Discussion for possible enhancements of the new CUB support #2519

@leofang


With the great effort by @anaruse in #2090, I've seen encouraging performance boosts. Below is a list of possible improvements I can think of, either to broaden the support or to enable further speedups. I'd like to know what I've missed or misunderstood.

If users set up a stream context manager like this:

with stream:
    arr.sum()
    # do other stuff

The non-default stream should be honored. All of the CUB functions introduced in #2090 support an optional stream argument. We just need to pick up the current stream pointer during setup and modify the wrappers.
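To make the idea concrete, here is a minimal pure-Python sketch of the pattern, not the actual CuPy code. The `Stream` class, `device_reduce_sum`, and `cub_sum` are hypothetical stand-ins invented for illustration; in CuPy itself the current stream would presumably come from `cupy.cuda.get_current_stream()` and its `ptr` attribute. The wrapper looks up the current stream at call time and forwards its pointer to the backend.

```python
import threading

_state = threading.local()  # per-thread "current stream" slot


class Stream:
    """Hypothetical stand-in for a CUDA stream; `ptr` mimics the raw handle."""

    def __init__(self, ptr):
        self.ptr = ptr

    def __enter__(self):
        # Remember the previous stream so it can be restored on exit.
        self._prev = get_current_stream()
        _state.current = self
        return self

    def __exit__(self, *exc):
        _state.current = self._prev
        return False


DEFAULT_STREAM = Stream(0)  # pointer 0 stands for the default stream


def get_current_stream():
    return getattr(_state, "current", DEFAULT_STREAM)


def device_reduce_sum(data, stream_ptr):
    """Hypothetical backend call; returns the stream it ran on for inspection."""
    return sum(data), stream_ptr


def cub_sum(data):
    # Pick up the current stream pointer at call time and pass it down.
    stream_ptr = get_current_stream().ptr
    return device_reduce_sum(data, stream_ptr)
```

With this in place, `cub_sum` called inside `with Stream(42):` forwards pointer 42, and outside any context it falls back to the default stream.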

Currently the CUB wrappers are all Python def functions. Converting them to cdef could be beneficial for performance. In particular, if we don't want to expose those wrappers to end users, cdef would be a nice choice.

Currently only a full reduction is supported. However, if a reduction over the trailing axes of a contiguous array of shape, say, (X, Y, Z) is needed, this seems possible with a naive loop over the remaining (leading) axes. In other words, in this case we can use CUB to do arr.sum(axis=2) or arr.sum(axis=(1,2)), assuming arr is C contiguous. This resembles the current treatment of PlanNd in the FFT module.
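The naive loop can be sketched on the host with NumPy, as an illustration only. Here `full_sum` is a hypothetical stand-in for a full-reduction CUB call, and `sum_trailing_axes` is a name made up for this example. Because the array is C contiguous, each slice over the trailing axes is one contiguous segment, so a full reduction can be applied to it directly.

```python
import math

import numpy as np


def full_sum(segment):
    """Hypothetical stand-in for a full reduction over a contiguous 1-D buffer."""
    return segment.sum()


def sum_trailing_axes(arr, n_axes):
    """Sum over the last `n_axes` axes by looping a full reduction over the rest."""
    if not arr.flags.c_contiguous:
        raise ValueError("arr must be C contiguous")
    if not 1 <= n_axes <= arr.ndim:
        raise ValueError("n_axes must be between 1 and arr.ndim")

    split = arr.ndim - n_axes
    outer_shape = arr.shape[:split]
    outer = math.prod(outer_shape)      # number of segments (1 if no leading axes)
    inner = math.prod(arr.shape[split:])  # length of each contiguous segment

    # For a C-contiguous array this reshape is a view, not a copy.
    flat = arr.reshape(outer, inner)

    out = np.empty(outer, dtype=arr.dtype)
    for i in range(outer):
        # One full reduction per contiguous segment.
        out[i] = full_sum(flat[i])
    return out.reshape(outer_shape)
```

For an array of shape (X, Y, Z), `sum_trailing_axes(arr, 1)` corresponds to `arr.sum(axis=2)` and `sum_trailing_axes(arr, 2)` to `arr.sum(axis=(1, 2))`. A real implementation would likely prefer a segmented reduction over a Python-level loop when the number of segments is large.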

Question: (from #2508 (comment)): is Jenkins configured to test CUB functionalities? UPDATE: No, see #2538 (comment).
