
Stream in the context-manager form is not used in ElementwiseKernel or ReductionKernel #2524

@leofang


This is actually a bug reported back in #1695 that unfortunately went unnoticed.

In examples/stream/map_reduce.py, a list of streams is created to run cupy.matmul() in parallel; in this case, cupy.matmul() is backed by a ReductionKernel:

From cupy/core/core.pyx, lines 2513 to 2516 (commit 1af22f5):

```cython
cdef _tensordot_core_mul_sum = ReductionKernel(
    'S x, T y', 'U out',
    'static_cast<U>(x) * static_cast<U>(y)',
    'a + b', 'out = a', '0', '_tensordot_core_mul_sum')
```

However, inspecting the implementation, I found that ReductionKernel only honors an explicitly passed stream argument; it never picks up the current stream:

```python
stream = kwargs.pop('stream', None)
```
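To make the failure mode concrete, here is a minimal pure-Python model of that pattern. It is not CuPy code: `use_stream()`, `get_current_stream()`, `launch_kernel_buggy()` and the string stream names are hypothetical stand-ins, and the model assumes that a launch with `stream=None` goes to the null stream. Entering a stream context has no effect on where the "kernel" is launched:

```python
import contextlib
import threading

# Hypothetical model of a per-thread "current stream"; not CuPy's internals.
_state = threading.local()
NULL_STREAM = "null"  # stands in for the CUDA null (default) stream


def get_current_stream():
    """Return the stream set by the innermost use_stream(), else the null stream."""
    return getattr(_state, "stream", NULL_STREAM)


@contextlib.contextmanager
def use_stream(stream):
    """Mimic `with stream:` by temporarily making `stream` the current stream."""
    prev = get_current_stream()
    _state.stream = stream
    try:
        yield stream
    finally:
        _state.stream = prev


def launch_kernel_buggy(*args, **kwargs):
    """Model of the reported behavior: only an explicit stream= is honored."""
    stream = kwargs.pop('stream', None)
    # The current stream is never consulted, so None falls through to the null stream.
    return stream if stream is not None else NULL_STREAM


with use_stream("s1"):
    print(launch_kernel_buggy())             # null
    print(launch_kernel_buggy(stream="s2"))  # s2
```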

In other words, that example is misleading: those streams are never used, so all executions are serialized, as can be verified with nvprof + nvvp (see the red circle):
[Screenshot of the nvvp timeline, taken 2019-10-03 11:24:27 AM]
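Why the lost overlap matters can be seen with a toy timing model. It is an idealized assumption, not a measurement: work queued on one stream runs back to back, while work on different streams overlaps perfectly. On a real GPU, overlap also depends on free resources, and the legacy null stream additionally synchronizes with other blocking streams. `makespan()` is a hypothetical helper:

```python
from collections import defaultdict


def makespan(launches):
    """Toy model: `launches` is a list of (stream, duration) pairs.

    Assumes work on one stream runs back to back and work on different
    streams overlaps perfectly, so the total time is the busiest stream's sum.
    """
    busy = defaultdict(float)
    for stream, duration in launches:
        busy[stream] += duration
    return max(busy.values(), default=0.0)


# Four equal matmul-like tasks, intended to run on four separate streams...
intended = [(f"s{i}", 1.0) for i in range(4)]
# ...but every launch actually landed on the null stream.
actual = [("null", 1.0) for _ in range(4)]

print(makespan(intended))  # 1.0
print(makespan(actual))    # 4.0
```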

The same bug also appears in ElementwiseKernel:

```python
stream = kwargs.pop('stream', None)
```

In my opinion, unlike RawKernel, which is not used by any core CuPy functionality, ElementwiseKernel and ReductionKernel should honor the current stream: if no stream argument is given explicitly, they should fall back to the current stream pointer. This matters because many CuPy functions, such as cupy.matmul(), do not accept a stream argument. A similar approach is already adopted in the FFT module; see #2362.
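The proposed fallback can be sketched with the same kind of pure-Python model. Again, `use_stream()`, `get_current_stream()` and `launch_kernel_fixed()` are hypothetical stand-ins, not CuPy's internals: an explicit `stream=` still wins, and only when it is absent does the launch consult the current stream.

```python
import contextlib
import threading

# Hypothetical model of a per-thread "current stream"; not CuPy's internals.
_state = threading.local()
NULL_STREAM = "null"  # stands in for the CUDA null (default) stream


def get_current_stream():
    """Return the stream set by the innermost use_stream(), else the null stream."""
    return getattr(_state, "stream", NULL_STREAM)


@contextlib.contextmanager
def use_stream(stream):
    """Mimic `with stream:` by temporarily making `stream` the current stream."""
    prev = get_current_stream()
    _state.stream = stream
    try:
        yield stream
    finally:
        _state.stream = prev


def launch_kernel_fixed(*args, **kwargs):
    """Model of the proposed behavior: explicit stream= first, then the current stream."""
    stream = kwargs.pop('stream', None)
    if stream is None:
        # Fall back to whatever stream the caller has entered.
        stream = get_current_stream()
    return stream


with use_stream("s1"):
    print(launch_kernel_fixed())             # s1
    print(launch_kernel_fixed(stream="s2"))  # s2
print(launch_kernel_fixed())                 # null
```

Keeping the explicit argument as an override preserves the behavior for callers that already pass `stream=`, while functions such as cupy.matmul(), which cannot forward a stream, would pick up whatever stream the caller has entered.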
