Skip to content
This repository was archived by the owner on Sep 18, 2025. It is now read-only.

WIP: Make shape a pointer and add ndim + strides#121

Closed
jakirkham wants to merge 3 commits intorapidsai:develfrom
jakirkham:hdl_shape_buf_protocol
Closed

WIP: Make shape a pointer and add ndim + strides#121
jakirkham wants to merge 3 commits intorapidsai:develfrom
jakirkham:hdl_shape_buf_protocol

Conversation

@jakirkham
Copy link
Copy Markdown
Member

Adds _shape and _strides as private pointers that contain the shape and strides of the data. Also adds ndim to aid tracking the dimensionality as a publicly visible value. Includes properties to view the shape and strides.

Note: Still need to handle some code that is assigning to self.shape as that is no longer valid with this approach.

@mrocklin
Copy link
Copy Markdown
Collaborator

mrocklin commented Jun 7, 2019

Right, I get this error

Traceback (most recent call last):
  File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/nfs/mrocklin/distributed/distributed/nanny.py", line 354, in _on_exit
    yield self.scheduler.unregister(address=self.worker_address)
  File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/nfs/mrocklin/distributed/distributed/core.py", line 741, in send_recv_from_rpc
    result = yield send_recv(comm=comm, op=key, **kwargs)
  File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/nfs/mrocklin/distributed/distributed/core.py", line 533, in send_recv
    yield comm.write(msg, serializers=serializers, on_error="raise")
  File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/nfs/mrocklin/distributed/distributed/comm/ucx.py", line 113, in write
    await self.ep.send_obj(meta)
  File "ucp/_libs/ucp_py.pyx", line 370, in ucp_py.Endpoint.send_obj
  File "ucp/_libs/ucp_py.pyx", line 348, in ucp_py.Endpoint._send_obj_host
  File "ucp/_libs/ucp_py_buffer_helper.pyx", line 204, in ucp_py.BufferRegion.populate_ptr
  File "ucp/_libs/ucp_py_buffer_helper.pyx", line 207, in ucp_py.BufferRegion._populate_ptr
AttributeError: attribute 'shape' of 'ucp_py.BufferRegion' objects is not writable
ERROR    asyncio:base_events.py:1608 Future exception was never retrieved
future: <Future finished exception=AttributeError("attribute 'shape' of 'ucp_py.BufferRegion' objects is not writable")>

when I run this test (on latest distributed master)

py.test distributed/comm/tests/test_ucx.py::test_stress -s --runslow

@jakirkham
Copy link
Copy Markdown
Member Author

jakirkham commented Jun 7, 2019

Yep, that's expected currently.

One of the things I'm trying to wrap my head around ATM is what kinds of buffers are we trying to handle here?

For instance in master ndim = 1. Is the general idea to only support 1-D buffers in this class? If so, the code can probably be simplified a lot and this assignment issue dealt with much more simply.

Also are we expecting buffers to be contiguous or may they be non-contiguous? Currently the code to create the buffer suggests it should be contiguous, but that is not enforced here.

@jakirkham
Copy link
Copy Markdown
Member Author

Perhaps these are questions for @TomAugspurger or @Akshay-Venkatesh?

@TomAugspurger
Copy link
Copy Markdown
Contributor

TomAugspurger commented Jun 7, 2019 via email

@mrocklin
Copy link
Copy Markdown
Collaborator

mrocklin commented Jun 7, 2019

The fact that my PR fixes a failure implies that that might not be true.

@jakirkham
Copy link
Copy Markdown
Member Author

Makes sense. I'm happy to keep pushing on this then. Just want to make sure I'm not overcomplicating things.

If anyone has thoughts on whether data is required to be contiguous or not, that would be good to know.

@mrocklin
Copy link
Copy Markdown
Collaborator

mrocklin commented Jun 7, 2019

I don't know what the implementation does. I recall conversations where we said "lets not bother with non-contiguous data. If it's non-contiguous, let's just make a copy". I can't authoritatively say that that's what happened though.

@TomAugspurger
Copy link
Copy Markdown
Contributor

TomAugspurger commented Jun 7, 2019 via email

@jakirkham
Copy link
Copy Markdown
Member Author

Is there a way to try sending a NumPy or CuPy array through UCX and seeing what you get out? I think this would be really informative as we could quickly see how these different cases are handled and get several test cases to include as a bonus.

@Akshay-Venkatesh
Copy link
Copy Markdown
Contributor

@jakirkham This test written by @TomAugspurger is probably the easiest way to check what is being received.

@mrocklin
Copy link
Copy Markdown
Collaborator

When I try this with the ucx stress test I get the following error:

  File "/home/nfs/mrocklin/distributed/distributed/worker.py", line 742, in _register_with_scheduler
    serializers=["msgpack"],
  File "/home/nfs/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/nfs/mrocklin/distributed/distributed/comm/ucx.py", line 113, in write
    await self.ep.send_obj(meta)
  File "ucp/_libs/ucp_py.pyx", line 370, in ucp_py.Endpoint.send_obj
  File "ucp/_libs/ucp_py.pyx", line 348, in ucp_py.Endpoint._send_obj_host
  File "ucp/_libs/ucp_py_buffer_helper.pyx", line 211, in ucp_py.BufferRegion.populate_ptr
  File "ucp/_libs/ucp_py_buffer_helper.pyx", line 214, in ucp_py.BufferRegion._populate_ptr
AttributeError: attribute 'shape' of 'ucp_py.BufferRegion' objects is not writable

I think that the proper thing to do here is to try this with the following test:

py.test distributed/comm/tests/test_ucx.py::test_stress --runslow

@jakirkham
Copy link
Copy Markdown
Member Author

@mrocklin, I haven't been doing anything on this since we last discussed it. The commit pushed above included some changes locally that needed to be moved out of my local copy before doing other work. I doubt they (or this PR more generally) are complete as the commit message notes. Happy to take a look at this again if you would like. 🙂

@jakirkham
Copy link
Copy Markdown
Member Author

Independently we don't seem to be handling buffer protocol requests correctly. Have raised issue ( #127 ) to discuss that.

@mrocklin
Copy link
Copy Markdown
Collaborator

@madsbk could I ask you to take a look at this PR and #120 ?

Copy link
Copy Markdown
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good. I have added some suggestions regarding support of multi-dimension memoryviews

self._shape[i] = 0
self._strides = <Py_ssize_t*>malloc(sizeof(Py_ssize_t) * self.ndim)
for i in range(self.ndim):
self._strides[i] = 1
Copy link
Copy Markdown
Member

@madsbk madsbk Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To support multi-dimension memoryviews, you could do something like:

    s = 1
    for i in reversed(range(self.ndim)):
        self._strides[i] = s
        s *= self._strides[i]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are BufferRegion.strides() supposed to be in elements or bytes?

return 0
else:
return self.shape[0]
return self._shape[0]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

        size = 1
        for i in range(self.ndim):
            size *= self._shape[i]

strides[0] = <Py_ssize_t>self.itemsize
assert len(self.shape)
if self.shape[0] == 0:
if self._shape[0] == 0:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

if len(self) == 0:

buffer.itemsize = self.itemsize
buffer.len = self.shape[0] * self.itemsize
buffer.ndim = 1
buffer.len = self._shape[0] * self.itemsize
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

buffer.len = self.nbytes()

@jakirkham
Copy link
Copy Markdown
Member Author

Feel free to push changes directly to this PR @madsbk if you are comfortable. 🙂

@madsbk
Copy link
Copy Markdown
Member

madsbk commented Aug 15, 2019

OK but I just realized that populate_ptr() and populate_cuda_ptr() also needs to support ndim changes.
@jakirkham, is it alright with you if I create a new PR that rewrites BufferRegion into a more clean implementation that incorporate your ideas?

@jakirkham
Copy link
Copy Markdown
Member Author

Please go ahead, @madsbk. 🙂

@jakirkham
Copy link
Copy Markdown
Member Author

I think this was basically addressed by @madsbk's PR ( #144 ). So closing it out. Though feel free to pull things from this if it winds up still being useful for some reason.

@jakirkham jakirkham closed this Aug 19, 2019
@jakirkham jakirkham deleted the hdl_shape_buf_protocol branch September 10, 2019 18:49
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants