Allow copying in the format `cupy_array[:] = numpy_array` by pentschev · Pull Request #2079 · cupy/cupy

pentschev · 2019-03-03T21:20:07Z

The __setitem__ implementation of cupy.ndarray checks for an empty slice and, if the value being passed is an instance of numpy.ndarray, makes a copy of it. This is a very useful feature in circumstances where we want to create a copy of a numpy.ndarray without requiring mechanisms to identify the type of an array as a CuPy array.

This resolves #593.

niboshi · 2019-03-04T03:22:26Z

@okuta What do you think?

This change has a few benefits compared to the original proposal:

* Its format is compatible with NumPy's copy of empty sliced arrays, as formats must now match
* It avoids accidental overwriting of cupy_array when formats don't match
* The current CUDA stream may be used implicitly

This change requires cupy_array to be created beforehand. Eventually, we can benefit from numpy/numpy#13046 and this change combined.

pentschev · 2019-03-04T09:34:34Z

After thinking about this PR overnight, I realized there are a few improvements that could be made to make this change safer and better. Please take a look at the latest commits as well.

pentschev · 2019-03-04T14:13:30Z

It may also be helpful to see the use case we envision for such a change. Please refer to dask/dask-glm@e39aafd#diff-6fa4900cefa9ee15a64539f9fcbf1639R182.

Assume step is a NumPy array and X is a CuPy array. We then create a CuPy array step_like with np.empty_like(X, shape=step.shape). Finally, we copy the contents via the empty slice assignment step_like[:] = step. This gives us a library-agnostic implementation that converts step, a NumPy array produced by an algorithm that has no CuPy implementation, into a CuPy array.

Note that for this particular case, we need np.empty_like() to support passing an arbitrary shape. This is waiting on numpy/numpy#13046 to be merged; once it is, I already have a PR prepared to add the shape argument to CuPy as well.
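The use case described above can be sketched as follows. This is a minimal illustration that uses NumPy arrays only, so that it runs without a GPU. The helper name copy_like is hypothetical, the explicit dtype argument is an addition made here because the PR requires matching dtypes, and the shape argument of np.empty_like assumes NumPy 1.17 or later. With X being a CuPy array, the same code would rely on __array_function__ dispatch and on the empty-slice copy from this PR being available.

```python
import numpy as np


def copy_like(X, step):
    """Return a copy of `step` stored in an array of the same type as `X`.

    `copy_like` is a hypothetical helper name used only for illustration.
    """
    # Allocate an uninitialized array of X's type with step's shape and dtype.
    # The `shape` argument of empty_like needs NumPy >= 1.17.
    step_like = np.empty_like(X, shape=step.shape, dtype=step.dtype)
    # Empty-slice assignment copies the contents of `step` into `step_like`.
    step_like[:] = step
    return step_like


X = np.zeros((4, 3))
step = np.arange(3, dtype=np.float64)
step_like = copy_like(X, step)
```

The caller never names the array library: the type of step_like follows the type of X.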

pentschev · 2019-04-24T10:29:37Z

After a conversation with @anaruse, he said one of the concerns here was due to broadcasting. From our understanding, the issue arises when the NumPy array has different strides, for example when it's a view. The latest commits address that matter.

@niboshi could you confirm we understood your concerns correctly and if there are any others that we should address?
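To illustrate the stride concern, here is a small NumPy-only sketch. A view obtained by slicing shares memory with its base array and generally has different strides, so its raw buffer cannot be copied byte-for-byte into a contiguous destination. The helper name stage_for_copy is hypothetical and this is not the code in the PR; it only mirrors the idea in the commits of using a temporary buffer when strides differ and skipping it when they match.

```python
import numpy as np


def stage_for_copy(src, dst_strides):
    """Return a host array that is safe to copy buffer-to-buffer.

    Hypothetical helper that assumes a C-contiguous destination: if the
    strides already match, `src` is returned unchanged (no temporary
    buffer); otherwise a C-contiguous temporary copy is made.
    """
    if src.strides == dst_strides:
        return src
    return np.ascontiguousarray(src)


base = np.arange(12, dtype=np.float64).reshape(3, 4)
view = base[:, ::2]                      # a view with non-contiguous strides
dst = np.empty(view.shape, view.dtype)   # stands in for the destination array

staged = stage_for_copy(view, dst.strides)
dst[:] = staged
```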

niboshi · 2019-05-07T06:35:13Z

Currently we intentionally disallow implicit host-to-device synchronization like this.
@okuta @asi1024 Do you have any comments?

pentschev · 2019-05-07T06:46:29Z

Just extending a bit on the original reason why we want to have this special case: with the __array_function__ protocol, it's important to have a way to permit such copies implicitly when there are different types of arrays (e.g., a NumPy and a CuPy array) coming from different sources, allowing us to target the correct implementation. Also note that this change does not allow implicit device-to-host copies, to prevent an accidental fallback to CPU; it only allows host-to-device copies.

I understand the implications this change has, so I'd be fine if we could have this in 6.0.0 as opt-in-only, through a mechanism such as an environment variable, just as NumPy currently does for __array_function__ itself.
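An opt-in gate of the kind suggested here could look like the following sketch. The variable name EXAMPLE_ALLOW_SLICE_COPY, both helper names, and the choice of exception are hypothetical; the thread does not name the variable, and this is not taken from CuPy or NumPy.

```python
import os


def slice_copy_enabled(environ=os.environ):
    """Return True only if the user explicitly opted in.

    EXAMPLE_ALLOW_SLICE_COPY is a hypothetical variable name.
    """
    return environ.get('EXAMPLE_ALLOW_SLICE_COPY', '0') == '1'


def check_opt_in(environ=os.environ):
    # Refuse the implicit host-to-device copy unless it was opted into.
    if not slice_copy_enabled(environ):
        raise ValueError(
            'implicit host-to-device copy is disabled; '
            'set EXAMPLE_ALLOW_SLICE_COPY=1 to enable it')
```

Keeping the default off preserves the existing behavior for users who never set the variable.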

niboshi · 2019-05-08T03:13:19Z

Discussed with @asi1024.

In this particular case (dask/dask-glm@e39aafd#diff-6fa4900cefa9ee15a64539f9fcbf1639R182), it seems that after #2165 is merged, the code should work because cupy.linalg.lstsq will return a cupy.ndarray as step. Even without #2165, we're considering adding a "NumPy-fallback" feature that should make the code work in this case. With this feature, even non-implemented functions (e.g., cupy.linalg.lstsq) should work by accepting and returning cupy.ndarrays and falling back to NumPy's corresponding functions.
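The fallback idea described here can be sketched generically. The decorator below is hypothetical and is not CuPy's implementation: to_host and from_host are assumed conversion callables (with CuPy they could be cupy.asnumpy and cupy.asarray), and NumPy stand-ins are used in the demonstration so the example runs without a GPU.

```python
import functools

import numpy as np


def with_host_fallback(host_func, to_host, from_host):
    """Wrap a CPU-only function so it accepts and returns device arrays.

    Hypothetical sketch: positional arguments are converted with `to_host`,
    `host_func` runs on the CPU, and the result is converted back with
    `from_host`.
    """
    @functools.wraps(host_func)
    def wrapper(*args, **kwargs):
        host_args = [to_host(a) for a in args]
        return from_host(host_func(*host_args, **kwargs))
    return wrapper


def cpu_only_norm(x):
    # Stands in for a function that only has a CPU implementation.
    return np.sqrt(np.sum(x * x))


# Stand-in conversions that record when they are called.
calls = []


def fake_to_host(a):
    calls.append('to_host')
    return np.asarray(a)


def fake_from_host(a):
    calls.append('from_host')
    return np.asarray(a)


norm = with_host_fallback(cpu_only_norm, fake_to_host, fake_from_host)
result = norm(np.array([3.0, 4.0]))
```

Each call pays for one transfer in each direction, which is the cost the reviewers weigh against convenience.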

As for the mixed-array-type case you mentioned above, could you elaborate on why you need such cases supported?

pentschev · 2019-05-08T07:54:09Z

Thanks @niboshi and @asi1024 for checking on this, and for letting me know about #2165. This would certainly solve that particular case from dask-glm.

For the mixed arrays, consider a slightly different example in the same dask-glm PR, dask/dask-glm@e39aafd#diff-6fa4900cefa9ee15a64539f9fcbf1639R240. This function receives a beta which is the output of some Fortran code in SciPy, which isn't easy to port to either pure NumPy or CuPy, so we're bound to get this output as a NumPy array. However, X could be, depending on the user input, either a NumPy array or a CuPy array. It's this last case that we're interested in: we want to have beta copied to a CuPy array, so the remaining computation can be done on the GPU.

I hope this clarifies a bit the sort of use case we have for this.

As for the "NumPy-fallback" feature you mentioned, would it work to wrap third-party functions (like SciPy) as well? For example, could we pass CuPy arrays to a pure-CPU implementation in SciPy and ensure we get back a CuPy array? I can't think of a way to do that, but if we could, that would be great and probably solve this as well.

niboshi · 2019-05-08T09:30:34Z

> For the mixed arrays, consider a slightly different example in the same dask-glm PR, dask/dask-glm@e39aafd#diff-6fa4900cefa9ee15a64539f9fcbf1639R240. This function receives a beta which is the output of some Fortran code in SciPy, which isn't easy to port to either pure NumPy or CuPy, so we're bound to get this output as a NumPy array. However, X could be, depending on the user input, either a NumPy array or a CuPy array. It's this last case that we're interested in: we want to have beta copied to a CuPy array, so the remaining computation can be done on the GPU.

In this case, as you know which argument may be of different types, I think you can write np.asarray(beta) before passing it to func (I assume np may be numpy or cupy). Do you think that works?

> As for the "NumPy-fallback" feature you mentioned, would it work to wrap third-party functions (like SciPy) as well? For example, could we pass CuPy arrays to a pure-CPU implementation in SciPy and ensure we get back a CuPy array? I can't think of a way to do that, but if we could, that would be great and probably solve this as well.

I don't think such cases will be supported either (@asi1024 What do you think?).
Even so, I think the np.asarray method above can be used in any case.

pentschev · 2019-05-08T09:36:56Z

> In this case, as you know which argument may be of different types, I think you can write np.asarray(beta) before passing it to func (I assume np may be numpy or cupy). Do you think that works?

I don't think that works: we would have to assume that we know beta needs to be a CuPy array, but that isn't always true, as it depends on the type of the X array.

pentschev · 2019-05-08T09:39:48Z

Maybe my last comment wasn't very clear: when I say beta needs to be a CuPy array, I mean that beta needs to be converted to a CuPy array, depending on whether X is also a CuPy array.

niboshi · 2019-05-08T10:01:03Z

My assumption was np may be either numpy or cupy that matches the type of X.

If this assumption is incorrect and np is always numpy, you can use xp = cupy.get_array_module(X) and xp.asarray.
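The suggestion above can be sketched as follows. cupy.get_array_module returns the cupy module if any of its arguments is a CuPy array and numpy otherwise. The helper name as_same_type and the ImportError fallback are additions for illustration, so that the snippet also runs on machines without CuPy.

```python
import numpy as np

try:
    import cupy
except ImportError:  # CuPy not installed: only NumPy arrays are possible.
    cupy = None


def as_same_type(beta, X):
    """Convert `beta` to the array type of `X` (hypothetical helper)."""
    xp = cupy.get_array_module(X) if cupy is not None else np
    return xp.asarray(beta)


X = np.ones((5, 2))
beta = [0.5, 1.5]
beta_like = as_same_type(beta, X)
```

Note that this approach requires the calling code to import cupy (or attempt to), which is the coupling discussed in the replies that follow.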

pentschev · 2019-05-08T10:12:49Z

> My assumption was np may be either numpy or cupy that matches the type of X.
>
> If this assumption is incorrect and np is always numpy, you can use xp = cupy.get_array_module(X) and xp.asarray.

Sure, this would work. But the point of having __array_function__ in place is to avoid the need to explicitly handle other libraries (or even to let internals know about their existence), and to rely solely on NumPy to do the whole work, no matter what the array type.

We understand the potential harm of allowing implicit copies. Having this special copy, cupy_array[:] = numpy_array, would allow libraries such as dask-glm to be agnostic of CuPy while preventing unnecessary copies everywhere except the few places where one is absolutely necessary, since during implementation we know of this restriction when using SciPy or any other CPU-only library.
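For readers unfamiliar with the protocol being discussed, the following is a minimal sketch of __array_function__ dispatch (NEP 18), which is enabled by default from NumPy 1.17. The DeviceArray class is a hypothetical stand-in for an array library such as CuPy: calling a NumPy function on it is routed to the class's own implementation, which is why downstream code can stay unaware of the array library.

```python
import numpy as np


class DeviceArray:
    """Hypothetical duck array that overrides NumPy functions."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Only np.sum is supported in this sketch; returning NotImplemented
        # makes NumPy raise TypeError for any other function.
        if func is not np.sum:
            return NotImplemented
        arrays = [a.data if isinstance(a, DeviceArray) else a for a in args]
        return DeviceArray(func(*arrays, **kwargs))


x = DeviceArray([1.0, 2.0, 3.0])
total = np.sum(x)  # dispatched to DeviceArray.__array_function__
```

The call site only mentions np, yet the result is a DeviceArray. A plain NumPy array arriving from a CPU-only library gets no such dispatch, which is the gap the empty-slice copy is meant to cover.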

niboshi · 2019-05-08T11:15:48Z

Ah, OK, I missed the existence of other libraries.

pentschev · 2019-05-08T11:35:36Z

No worries, there are indeed many details and different use cases involved, it's not too difficult to get confused. :)

Please let me know what you guys discuss further, once again, we're open to different solutions as well, provided that we can keep the main benefits of __array_function__.

niboshi · 2019-05-08T12:18:39Z

I'm sorry, maybe it's not clear yet.

> > My assumption was np may be either numpy or cupy that matches the type of X.
> >
> > If this assumption is incorrect and np is always numpy, you can use xp = cupy.get_array_module(X) and xp.asarray.
>
> Sure, this would work. But the point of having __array_function__ in place is to avoid the need to handle explicitly other libraries (and even to let internals know about their existence), and rely solely on NumPy to do the whole work, no matter what the array type.

So, np is actually numpy or cupy (or a similar module in other libraries). Is that correct? Otherwise np.empty_like wouldn't create a CuPy ndarray.
Then, what's the problem with writing np.asarray(beta)? Because you already have np, I think writing that does not require additional knowledge about external libraries like CuPy or others.

niboshi

Thank you very much.
LGTM except this comment.

niboshi · 2019-05-23T02:22:44Z

cupy/core/core.pyx

+            else:
+                raise ValueError(
+                    "copying a numpy.ndarray to a cupy.ndarray by empty slice "
+                    "assignment must ensure arrays exact same shape and dtype")


Please use single quotes for consistency.

Thanks for noticing that, thought that flake8 would catch this, but I realized it doesn't support that, at least without third-party plugins. Just pushed a fix for this.
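The check quoted in the review above can be sketched as a standalone function. This is not the code in cupy/core/core.pyx: the function name is hypothetical, the TypeError branch is an addition, and a NumPy array stands in for the destination so the snippet runs without a GPU. It only illustrates the rule stated in the error message, that shape and dtype must match exactly.

```python
import numpy as np


def check_empty_slice_copy(dst, src):
    """Raise ValueError unless `src` can be copied into `dst` as-is.

    Hypothetical helper: `dst` stands in for a cupy.ndarray.
    """
    if not isinstance(src, np.ndarray):
        raise TypeError('src must be a numpy.ndarray')
    if dst.shape != src.shape or dst.dtype != src.dtype:
        raise ValueError(
            'copying a numpy.ndarray to a cupy.ndarray by empty slice '
            'assignment must ensure arrays exact same shape and dtype')


dst = np.empty((2, 3), dtype=np.float32)
check_empty_slice_copy(dst, np.zeros((2, 3), dtype=np.float32))  # passes
```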

niboshi · 2019-05-23T07:39:27Z

Thank you for the fix.
Jenkins, test this please.

pfn-ci-bot · 2019-05-23T07:39:30Z

Successfully created a job for commit 0e329c7:

Dashboard for commit 0e329c7

chainer-ci · 2019-05-23T08:02:34Z

Jenkins CI test (for commit 0e329c7, target branch master) failed with status FAILURE.

pentschev · 2019-05-23T08:06:58Z

Looks like the failure is in chainer-doc, probably not related to this PR.

niboshi · 2019-05-23T09:03:04Z

Looks so. Retrying.
Jenkins, test this please

pfn-ci-bot · 2019-05-23T09:03:08Z

Successfully created a job for commit 0e329c7:

Dashboard for commit 0e329c7

chainer-ci · 2019-05-23T09:49:13Z

Jenkins CI test (for commit 0e329c7, target branch master) succeeded!

niboshi · 2019-05-23T09:52:37Z

Thanks. LGTM!

pentschev · 2019-05-23T09:55:34Z

Awesome! Thanks @niboshi for the review and merging!

pentschev · 2019-05-23T09:58:09Z

By the way, since this now requires explicit opt-in, do you think it could be backported to the next CuPy 6 release?

niboshi · 2019-05-27T14:37:10Z

Yes, sure. Thank you for pointing that out!

pentschev added 2 commits March 3, 2019 22:14

Fix flake8 error

63eb25f

toslunar assigned niboshi Mar 4, 2019

pentschev added 2 commits March 4, 2019 10:25

Added tests for cupy_array[:] = numpy_array style copy

e947ada

pentschev added a commit to pentschev/dask-glm that referenced this pull request Mar 4, 2019

Adjust all algorithms to work with CuPy

e39aafd

Requirements: * numpy/numpy#13046 * cupy/cupy#2079

pentschev mentioned this pull request Mar 4, 2019

[WIP] Adjust all algorithms to work with CuPy dask/dask-glm#75

Open

Merge remote-tracking branch 'upstream/master' into ndarray-setitem-c…

0e19d04

…opy-numpy-instance

niboshi added the st:needs-discussion label Apr 2, 2019

pentschev added 2 commits April 24, 2019 12:24

Permit cupy_array[:] = numpy_array copies with different strides

f32e91f

Add test for cupy_array[:] = numpy_array when the latter is a view

1b7dbab

Fix autopep8 error

dc80293

pentschev mentioned this pull request Apr 24, 2019

NEP-18 Issue Tracking dask/dask#4731

Closed

19 tasks

Avoid temp buffer if strides match in cupy_array[:] = numpy_array copy

fa8ecdc

Fix flake8 error

95d79d9

niboshi requested changes May 23, 2019

Fix empty slice copy comment and style

0e329c7

niboshi approved these changes May 23, 2019

niboshi added cat:feature New features/APIs and removed st:needs-discussion labels May 23, 2019

niboshi added this to the v7.0.0b1 milestone May 23, 2019

niboshi changed the title ~~Allow copying in the format cupy_array[:] = numpy_array~~ Allow copying in the format cupy_array[:] = numpy_array May 23, 2019

niboshi merged commit 24b8445 into cupy:master May 23, 2019

niboshi added the to-be-backported Pull-requests to be backported to stable branch label May 27, 2019

niboshi added a commit to niboshi/cupy that referenced this pull request May 27, 2019

Merge pull request cupy#2079 from pentschev/ndarray-setitem-copy-nump…

346cb26

…y-instance Allow copying in the format `cupy_array[:] = numpy_array`

niboshi mentioned this pull request May 27, 2019

[backport] Allow copying in the format cupy_array[:] = numpy_array #2219

Merged

This was referenced Jul 16, 2019

Use CUB to speed up sum/min/max #2090

Merged

Supporting duck array coercion numpy/numpy#13831

Open

pentschev deleted the ndarray-setitem-copy-numpy-instance branch August 5, 2019 12:43

kalvdans mentioned this pull request Sep 3, 2020

Handle transfer to cupy view #3928

Merged

leofang mentioned this pull request Mar 19, 2021

Add APIs for creating NumPy arrays backed by pinned memory #4870

Merged

4 tasks

leofang mentioned this pull request Mar 21, 2024

cupy copyto disallows numpy arrays #8165

Open

kmaehashi mentioned this pull request Apr 7, 2025

Can't use cupy.ndarray's __setitem__ when the value is an array-like object that implements __cuda_array_interface__ or __cupy_get_ndarray__ but also implements __eq__. #9089

Closed
