
Fixes #2118: Setting CUDA function attributes #2120

Closed
andravin wants to merge 13 commits into cupy:master from andravin:function-attributes


@andravin
Contributor

Fixes #2118.

Builds on @grlee77's commits to implement CUDA function attributes using descriptors.

Enables getting all function attributes and setting the max_dynamic_shared_size_bytes and preferred_shared_memory_carveout attributes.
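For readers unfamiliar with the descriptor approach mentioned above, here is a minimal pure-Python sketch of the idea. It is not the code from this PR: the class names (`ReadOnlyAttribute`, `WritableAttribute`, `FunctionAttributesSketch`) are hypothetical, the "driver" is just a dict with made-up values, and nothing here calls CuPy or CUDA.

```python
# Illustrative only: a dict stands in for the CUDA driver's per-function state.
# All class names and values below are assumptions, not CuPy's implementation.


class ReadOnlyAttribute(object):
    """Data descriptor that reads one named attribute from the backend."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not an instance
        return instance._backend[self.name]

    def __set__(self, instance, value):
        raise AttributeError("%s is read-only" % self.name)


class WritableAttribute(ReadOnlyAttribute):
    """Descriptor that additionally forwards writes to the backend."""

    def __set__(self, instance, value):
        instance._backend[self.name] = int(value)


class FunctionAttributesSketch(object):
    # Every attribute is readable; only these two accept assignment.
    num_regs = ReadOnlyAttribute("num_regs")
    max_dynamic_shared_size_bytes = WritableAttribute(
        "max_dynamic_shared_size_bytes")
    preferred_shared_memory_carveout = WritableAttribute(
        "preferred_shared_memory_carveout")

    def __init__(self, backend):
        self._backend = backend


# Made-up values standing in for what a driver query might return.
fake_driver = {
    "num_regs": 16,
    "max_dynamic_shared_size_bytes": 49152,
    "preferred_shared_memory_carveout": -1,
}
attrs = FunctionAttributesSketch(fake_driver)
attrs.max_dynamic_shared_size_bytes = 65536  # write goes to the backend
```

Under these assumptions, the descriptor class decides which attributes are writable, so an attempt to assign `attrs.num_regs` raises `AttributeError` instead of silently shadowing the queried value.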

Added property RawKernel.kernel to make access to the underlying kernel function re-usable.
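As a rough illustration of why a `kernel` property helps with re-use, the sketch below builds a function handle on first access and shares it between launching and any other consumer. `LazyKernelSketch` and `fake_compile` are hypothetical names invented for this example; the real compilation step in CuPy is not shown or assumed here.

```python
class LazyKernelSketch(object):
    """Hypothetical wrapper; `compile_fn` stands in for real compilation."""

    def __init__(self, code, name, compile_fn):
        self.code = code
        self.name = name
        self._compile_fn = compile_fn
        self._kernel = None

    @property
    def kernel(self):
        # Build the handle on first access, then hand back the cached one.
        if self._kernel is None:
            self._kernel = self._compile_fn(self.code, self.name)
        return self._kernel

    def __call__(self, *args):
        # Launching and attribute queries can both go through self.kernel.
        return self.kernel(*args)


compile_calls = []


def fake_compile(code, name):
    """Record each compilation and return a callable stand-in handle."""
    compile_calls.append(name)
    return lambda *args: (name, args)


k = LazyKernelSketch("/* CUDA source would go here */", "my_add", fake_compile)
first = k.kernel
second = k.kernel
result = k(1, 2)
```

In this sketch the compile step runs once no matter how many callers touch `k.kernel`, which is the property the description above is after.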

grlee77 and others added 10 commits December 8, 2018 21:34
add a FuncAttributes class to store all kernel attributes

add an attributes property to the RawKernel class. This can be used to query the register
and memory usage of the kernel.
refactor attributes test as a separate test case

memoize the kernel attributes.
build the dictionary within raw.pyx instead
Moved RawKernel kernel generation into a property to facilitate re-use.
Fixed formatting.
Fixed python 2.x compatibility by making descriptors "new style" classes.
@leofang
Member

leofang commented Mar 31, 2019

It should be noted that this PR is based on #1874.

@leofang
Member

leofang commented Mar 31, 2019

Linking @andravin's rationale here: #2118 (comment), which makes sense I think.

Member

@grlee77 grlee77 left a comment


Thanks for this contribution, @andravin. I like the implementation and think a class is warranted, since it gives a convenient way to allow setting the writeable attributes.

@hvy
Member

hvy commented Apr 1, 2019

@kmaehashi could you take a look?

@leofang
Member

leofang commented Apr 2, 2019

@andravin It'd be nice to document the usage of this feature somewhere. The current docstring

cupy/cupy/core/raw.pyx

Lines 54 to 62 in f86c365

    @property
    def attributes(self):
        """Returns an object containing runtime kernel attributes.

        Returns:
            attributes (FunctionAttributes): A python class containing the
                kernel's attributes. For example, ``attributes.num_regs``
                corresponds to the number of registers used by the kernel.
        """

is not clear about what can be read or set, or how to set it. Another possible place is
Raw kernels
-----------

Raw kernels can be defined by the :class:`~cupy.RawKernel` class.
By using raw kernels, you can define kernels from raw CUDA source.

:class:`~cupy.RawKernel` object allows you to call the kernel with CUDA's ``cuLaunchKernel`` interface.
In other words, you have control over grid size, block size, shared memory size and stream.

.. doctest::

   >>> add_kernel = cp.RawKernel(r'''
   ... extern "C" __global__
   ... void my_add(const float* x1, const float* x2, float* y) {
   ...     int tid = blockDim.x * blockIdx.x + threadIdx.x;
   ...     y[tid] = x1[tid] + x2[tid];
   ... }
   ... ''', 'my_add')
   >>> x1 = cupy.arange(25, dtype=cupy.float32).reshape(5, 5)
   >>> x2 = cupy.arange(25, dtype=cupy.float32).reshape(5, 5)
   >>> y = cupy.zeros((5, 5), dtype=cupy.float32)
   >>> add_kernel((5,), (5,), (x1, x2, y))  # grid, block and arguments
   >>> y
   array([[ 0.,  2.,  4.,  6.,  8.],
          [10., 12., 14., 16., 18.],
          [20., 22., 24., 26., 28.],
          [30., 32., 34., 36., 38.],
          [40., 42., 44., 46., 48.]], dtype=float32)

.. note::
   The kernel does not have return values.
   You need to pass both input arrays and output arrays as arguments.

.. note::
   No validation will be performed by CuPy for arguments passed to the kernel, including types and number of arguments.
   Especially note that when passing :class:`~cupy.ndarray`, its ``dtype`` should match with the type of the argument declared in the method signature of the CUDA source code (unless you are casting arrays intentionally).
   For example, ``cupy.float32`` and ``cupy.uint64`` arrays must be passed to the argument typed as ``float*`` and ``unsigned long long*``.
   For Python primitive types, ``int``, ``float`` and ``bool`` map to ``long long``, ``double`` and ``bool``, respectively.

.. note::
   When using ``printf()`` in your CUDA kernel, you may need to synchronize the stream to see the output.
   You can use ``cupy.cuda.Stream.null.synchronize()`` if you are using the default stream.

A short description/example based on your #2118 (comment) should be enough.

Member

@leofang leofang left a comment


Just one quick question, which perhaps needs input from a core dev.

@grlee77
Member

grlee77 commented Jun 11, 2019

Just a quick ping to see if this PR can get a review. It would be nice to have this functionality.

@andravin
Contributor Author

Closing due to lack of interest.
