Add CUB support for argmax() and argmin() #2596
Merged: emcastillo merged 6 commits into cupy:master on Nov 11, 2019
Conversation
Member (Author):
As always, below is a performance test on a K40. Script:

```python
import cupy as cp

n_runs = 10
shape = (512, 256, 256)
axis_cases = [(0, 1, 2),]  # dummy

for dtype in (cp.int64, cp.float32, cp.float64, cp.complex64, cp.complex128):
    if dtype in (cp.float32, cp.float64):
        x = cp.random.random(shape, dtype=dtype)
    elif dtype in (cp.int32, cp.int64):
        x = cp.random.randint(0, 10, size=shape, dtype=dtype)
    else:
        x = (cp.random.random(shape).astype(dtype)
             + 1j * cp.random.random(shape).astype(dtype))
    x_np = cp.asnumpy(x)  # move to CPU

    for axis in axis_cases:
        for func in ('argmax', 'argmin'):
            keepdims = False
            print("testing", axis, "+", str(dtype), "+",
                  "keepdims={}".format(keepdims), "+", func, "...")
            start = cp.cuda.Event()
            end = cp.cuda.Event()

            # time with CUB disabled
            cp.cuda.cub_enabled = False
            w = None
            start.record()
            for i in range(n_runs):
                w = getattr(x, func)()
            end.record()
            end.synchronize()
            t_cp_disabled = cp.cuda.get_elapsed_time(start, end)

            # time with CUB enabled
            cp.cuda.cub_enabled = True
            y = None
            start.record()
            for i in range(n_runs):
                y = getattr(x, func)()
            end.record()
            end.synchronize()
            t_cp_enabled = cp.cuda.get_elapsed_time(start, end)

            # time NumPy on the CPU copy
            z = None
            start.record()
            for i in range(n_runs):
                z = getattr(x_np, func)()
            end.record()
            end.synchronize()
            t_np = cp.cuda.get_elapsed_time(start, end)

            print("CUB enabled: {}, CUB disabled: {}, numpy: {} (ms for {} runs)\n"
                  .format(t_cp_enabled, t_cp_disabled, t_np, n_runs))

            try:
                assert cp.allclose(w, y)
            except AssertionError:
                print("**************** RESULTS DO NOT MATCH: CUB & reduction ****************")
                print(w, y)
            try:
                assert cp.allclose(y, z)
            except AssertionError:
                print("**************** RESULTS DO NOT MATCH: CUB & NumPy ****************")
                print(y, z)
            print()
```

Result:
emcastillo reviewed on Nov 8, 2019
(force-pushed from 5e23f36 to a347333)
Member: Jenkins, test this please
Collaborator: Successfully created a job for commit 534a2ea
Member: Jenkins CI test (for commit 534a2ea, target branch master) succeeded!
This PR is made easy by the refactoring in #2562. Part of #2519.

Notes:
- The `axis` argument is not supported. In fact, I am a bit reluctant to add CUB support for it, for two reasons:
  1. `axis` can only be an integer, not a tuple (see #2595), meaning that only the special case `axis=-1` (searching over the last axis) can benefit from the `device_segmented_reduce` added in #2562 ("Refactor CUB to support an explicit `axis` argument; Fix alignments for Thrust's complex types"), which doesn't seem worth the time...
  2. The output of `device_segmented_reduce` would be an array of key-value pairs. I am not sure what the best way is to retrieve the keys (i.e. the wanted array indices). It seems I would need one extra kernel launch to loop over the data and copy the keys to another device array? Any comment or suggestion is welcome, as I think the core devs should have some experience with how to handle this. (You already did this in `_argmax` and `_argmin`, although I don't fully understand how it works there.)
- See also #2595 ("Signatures and behaviors of `argmax` and `argmin` are incompatible with NumPy").
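For illustration, here is a minimal NumPy sketch (hypothetical, not the actual CuPy/CUB API) of what a segmented arg-reduction over the last axis produces: one (key, value) pair per segment, where the key is the within-segment index of the extremum. Recovering the indices then amounts to copying out the key field of each pair, which on the GPU would cost one extra elementwise kernel launch.

```python
import numpy as np

def segmented_argmax(x):
    """Emulate a segmented arg-reduction over the last axis.

    Each row of the flattened view is one segment; the reduction yields a
    (key, value) pair per segment, analogous to cub::KeyValuePair results
    from a hypothetical device_segmented_reduce with an ArgMax operator.
    """
    flat = x.reshape(-1, x.shape[-1])              # one segment per row
    keys = flat.argmax(axis=1)                     # "key" of each pair
    values = flat[np.arange(flat.shape[0]), keys]  # "value" of each pair
    # On the device, extracting `keys` from the pair array would be the
    # extra copy kernel discussed above.
    return keys.reshape(x.shape[:-1]), values.reshape(x.shape[:-1])

x = np.arange(24).reshape(2, 3, 4)
keys, values = segmented_argmax(x)
assert np.array_equal(keys, x.argmax(axis=-1))
assert np.array_equal(values, x.max(axis=-1))
```

This also shows why only `axis=-1` maps naturally onto a segmented reduce: segments must be contiguous runs of memory, which for a C-contiguous array is exactly the last axis.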