WIP: Make shape a pointer and add ndim + strides#121
WIP: Make shape a pointer and add ndim + strides#121jakirkham wants to merge 3 commits intorapidsai:develfrom
Conversation
|
Right, I get this error Traceback (most recent call last):
File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
yielded = self.gen.throw(*exc_info) # type: ignore
File "/home/nfs/mrocklin/distributed/distributed/nanny.py", line 354, in _on_exit
yield self.scheduler.unregister(address=self.worker_address)
File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
value = future.result()
File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
yielded = self.gen.throw(*exc_info) # type: ignore
File "/home/nfs/mrocklin/distributed/distributed/core.py", line 741, in send_recv_from_rpc
result = yield send_recv(comm=comm, op=key, **kwargs)
File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
value = future.result()
File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
yielded = self.gen.throw(*exc_info) # type: ignore
File "/home/nfs/mrocklin/distributed/distributed/core.py", line 533, in send_recv
yield comm.write(msg, serializers=serializers, on_error="raise")
File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
value = future.result()
File "/home/nfs/mrocklin/distributed/distributed/comm/ucx.py", line 113, in write
await self.ep.send_obj(meta)
File "ucp/_libs/ucp_py.pyx", line 370, in ucp_py.Endpoint.send_obj
File "ucp/_libs/ucp_py.pyx", line 348, in ucp_py.Endpoint._send_obj_host
File "ucp/_libs/ucp_py_buffer_helper.pyx", line 204, in ucp_py.BufferRegion.populate_ptr
File "ucp/_libs/ucp_py_buffer_helper.pyx", line 207, in ucp_py.BufferRegion._populate_ptr
AttributeError: attribute 'shape' of 'ucp_py.BufferRegion' objects is not writable
ERROR asyncio:base_events.py:1608 Future exception was never retrieved
future: <Future finished exception=AttributeError("attribute 'shape' of 'ucp_py.BufferRegion' objects is not writable")>when I run this test (on latest distributed master) |
|
Yep, that's expected currently. One of the things I'm trying to wrap my head around ATM is what kinds of buffers are we trying to handle here? For instance in Also are we expecting buffers to be contiguous or may they be non-contiguous? Currently the code to create the buffer suggests it should be contiguous, but that is not enforced here. |
|
Perhaps these are questions for @TomAugspurger or @Akshay-Venkatesh? |
|
I thought that everything was 1D by this point, but that may have been
mistaken?
…On Fri, Jun 7, 2019 at 3:53 PM jakirkham ***@***.***> wrote:
Perhaps these are questions for @TomAugspurger
<https://github.com/TomAugspurger> or @Akshay-Venkatesh
<https://github.com/Akshay-Venkatesh>?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#121?email_source=notifications&email_token=AAKAOIXP6BKUJXSJPG3NAYDPZLDFZA5CNFSM4HV2DAKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXG65OQ#issuecomment-500035258>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIVYX4I7NFYUPD6SJKTPZLDFZANCNFSM4HV2DAKA>
.
|
|
The fact that my PR fixes a failure implies that that might not be true. |
|
Makes sense. I'm happy to keep pushing on this then. Just want to make sure I'm not overcomplicating things. If anyone has thoughts on whether data is required to be contiguous or not, that would be good to know. |
|
I don't know what the implementation does. I recall conversations where we said "lets not bother with non-contiguous data. If it's non-contiguous, let's just make a copy". I can't authoritatively say that that's what happened though. |
|
That's my recollection as well.
…On Fri, Jun 7, 2019 at 4:35 PM Matthew Rocklin ***@***.***> wrote:
I don't know what the implementation does. I recall conversations where we
said "lets not bother with non-contiguous data. If it's non-contiguous,
let's just make a copy". I can't authoritatively say that that's what
happened though.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#121?email_source=notifications&email_token=AAKAOISNYQ5JV5CYXSPY3M3PZLIDRA5CNFSM4HV2DAKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXHBY5A#issuecomment-500046964>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIQUZLLXOUULVUPZ7RLPZLIDRANCNFSM4HV2DAKA>
.
|
|
Is there a way to try sending a NumPy or CuPy array through UCX and seeing what you get out? I think this would be really informative as we could quickly see how these different cases are handled and get several test cases to include as a bonus. |
|
@jakirkham This test written by @TomAugspurger is probably the easiest way to check what is being received. |
|
When I try this with the ucx stress test I get the following error: File "/home/nfs/mrocklin/distributed/distributed/worker.py", line 742, in _register_with_scheduler
serializers=["msgpack"],
File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
value = future.result()
File "/home/nfs/mrocklin/distributed/distributed/comm/ucx.py", line 113, in write
await self.ep.send_obj(meta)
File "ucp/_libs/ucp_py.pyx", line 370, in ucp_py.Endpoint.send_obj
File "ucp/_libs/ucp_py.pyx", line 348, in ucp_py.Endpoint._send_obj_host
File "ucp/_libs/ucp_py_buffer_helper.pyx", line 211, in ucp_py.BufferRegion.populate_ptr
File "ucp/_libs/ucp_py_buffer_helper.pyx", line 214, in ucp_py.BufferRegion._populate_ptr
AttributeError: attribute 'shape' of 'ucp_py.BufferRegion' objects is not writableI think that the proper thing to do here is to try this with the following test: |
|
@mrocklin, I haven't been doing anything on this since we last discussed it. The commit pushed above included some changes locally that needed to be moved out of my local copy before doing other work. I doubt they (or this PR more generally) are complete as the commit message notes. Happy to take a look at this again if you would like. 🙂 |
|
Independently we don't seem to be handling buffer protocol requests correctly. Have raised issue ( #127 ) to discuss that. |
madsbk
left a comment
There was a problem hiding this comment.
I think it looks good. I have added some suggestions regarding support of multi-dimension memoryviews
| self._shape[i] = 0 | ||
| self._strides = <Py_ssize_t*>malloc(sizeof(Py_ssize_t) * self.ndim) | ||
| for i in range(self.ndim): | ||
| self._strides[i] = 1 |
There was a problem hiding this comment.
To support multi-dimension memoryviews, you could do something like:
s = 1
for i in reversed(range(self.ndim)):
self._strides[i] = s
s *= self._strides[i]There was a problem hiding this comment.
Are BufferRegion.strides() supposed to be in elements or bytes?
| return 0 | ||
| else: | ||
| return self.shape[0] | ||
| return self._shape[0] |
There was a problem hiding this comment.
Ditto
size = 1
for i in range(self.ndim):
size *= self._shape[i]| strides[0] = <Py_ssize_t>self.itemsize | ||
| assert len(self.shape) | ||
| if self.shape[0] == 0: | ||
| if self._shape[0] == 0: |
| buffer.itemsize = self.itemsize | ||
| buffer.len = self.shape[0] * self.itemsize | ||
| buffer.ndim = 1 | ||
| buffer.len = self._shape[0] * self.itemsize |
There was a problem hiding this comment.
Ditto
buffer.len = self.nbytes()|
Feel free to push changes directly to this PR @madsbk if you are comfortable. 🙂 |
|
OK but I just realized that |
|
Please go ahead, @madsbk. 🙂 |
Adds
_shapeand_stridesas private pointers that contain the shape and strides of the data. Also addsndimto aid tracking the dimensionality as a publicly visible value. Includes properties to view theshapeandstrides.Note: Still need to handle some code that is assigning to
self.shapeas that is no longer valid with this approach.