Allow copying in the format cupy_array[:] = numpy_array#2079
Allow copying in the format cupy_array[:] = numpy_array#2079niboshi merged 14 commits intocupy:masterfrom
cupy_array[:] = numpy_array#2079Conversation
The __setitem__ implementation from cupy.ndarray checks for an empty slice and if the value being passed is an instance of numpy.ndarray to make a copy of it. That can is a very useful feature in circumstances where we want to create a copy of a numpy.ndarray without requiring mechanisms to identify the type of an array as a CuPy array.
|
@okuta How do you think? |
This change has a few benefits compared to the original proposal: * Its format is compatible with NumPy's copy of empty sliced arrays, as formats must now match * Avoid accidental overwriting of cupy_array when formats don't match * The current CUDA stream may be used implicitly This change implies on cupy_array being created beforehand. Eventually, we can benefit from numpy/numpy#13046 and this change combined.
|
After thinking about this PR overnight, I realized there's a few improvements that could be made to make this change safer and better. Please take a look at the latest commits as well. |
Requirements: * numpy/numpy#13046 * cupy/cupy#2079
|
It may be helpful also to see what we envision a use case for such a change is. Please refer to dask/dask-glm@e39aafd#diff-6fa4900cefa9ee15a64539f9fcbf1639R182. Assume Note that for this particular case, we need |
…opy-numpy-instance
|
After a conversation with @anaruse, he said one of the concerns here was due to broadcasting. From our understanding, the issue arises when the NumPy array has a different stride, when it's a view, for example. The latest commits address that matter. @niboshi could you confirm we understood your concerns correctly and if there are any others that we should address? |
|
Just extending a bit on the original reason why we want to have this special case: with the I understand the implications this change has, so I'd be fine if we could have this in 6.0.0 as opt-in-only, through a mechanism such as an environment variable, just as NumPy does current for |
|
Disucssed with @asi1024 . In this particular case (dask/dask-glm@e39aafd#diff-6fa4900cefa9ee15a64539f9fcbf1639R182), it seems that after #2165 is merged, the code should work because As for mixed-array-type case you mentioned above, could you elaborate why you need such cases supported? |
|
Thanks @niboshi and @asi1024 for checking on this, and for letting me know about #2165. This would certainly solve that particular case from dask-glm. For the mixed arrays, consider a slightly different example in the same dask-glm PR, dask/dask-glm@e39aafd#diff-6fa4900cefa9ee15a64539f9fcbf1639R240. This function receives a I hope this clarifies a bit the sort of use case we have for this. As for the "NumPy-fallback" feature you mentioned, would it work to wrap a third-party functions (like SciPy) as well? For example, could we pass CuPy arrays to a pure-CPU implementation in SciPy and ensure we get back a CuPy array? I can't think of a way to do that, but if we could, that would be great and probably solve this as well. |
In this case, as you know which argument may be of differerent types, I think you can write
I don't think such cases will be supported either (@asi1024 How do you think?). |
I don't think that works, we would have to assume that we know |
|
Maybe my last comment wasn't very clear, when I say |
|
My assumption was If this assumption is incorrect and |
Sure, this would work. But the point of having We understand the potential harm for allowing implicit copies, so having this special copy |
|
Ah, OK, I missed the existence of other libraries. |
|
No worries, there are indeed many details and different use cases involved, it's not too difficult to get confused. :) Please let me know what you guys discuss further, once again, we're open to different solutions as well, provided that we can keep the main benefits of |
|
I'm sorry, maybe it's not clear yet.
So, |
niboshi
left a comment
There was a problem hiding this comment.
Thank you very much.
LGTM except this comment.
cupy/core/core.pyx
Outdated
| else: | ||
| raise ValueError( | ||
| "copying a numpy.ndarray to a cupy.ndarray by empty slice " | ||
| "assignment must ensure arrays exact same shape and dtype") |
There was a problem hiding this comment.
Please use single quotes for consistency.
There was a problem hiding this comment.
Thanks for noticing that, thought that flake8 would catch this, but I realized it doesn't support that, at least without third-party plugins. Just pushed a fix for this.
|
Thank you for the fix. |
|
Successfully created a job for commit 0e329c7: |
|
Jenkins CI test (for commit 0e329c7, target branch master) failed with status FAILURE. |
|
Looks like the failure is in chainer-doc, probably not related to this PR. |
|
Looks so. Retrying. |
|
Successfully created a job for commit 0e329c7: |
|
Jenkins CI test (for commit 0e329c7, target branch master) succeeded! |
|
Thanks. LGTM! |
cupy_array[:] = numpy_array
|
Awesome! Thanks @niboshi for the review and merging! |
|
By the way, since this now requires explicit opt-in, do you think it could be backported to the next CuPy 6 release? |
…y-instance Allow copying in the format `cupy_array[:] = numpy_array`
|
Yes, sure. Thank you for pointing out! |
The setitem implementation from cupy.ndarray checks for an empty slice and
if the value being passed is an instance of numpy.ndarray to make a copy of it.
That can is a very useful feature in circumstances where we want to create a
copy of a numpy.ndarray without requiring mechanisms to identify the type of an
array as a CuPy array.
This resolves #593.