
[BUG] 0 length string columns fail to deserialize #3481

@quasiben


When using an rmm.DeviceBuffer (as dask does):

https://github.com/dask/distributed/blob/master/distributed/comm/ucx.py#L50

deserialization fails when the buffer belongs to a string column of length 0:

Bug Report
distributed.utils - ERROR - 'rmm._lib.device_buffer.DeviceBuffer' object has no attribute 'device_ctypes_pointer'
Traceback (most recent call last):
  File "/datasets/bzaitlen/GitRepos/distributed/distributed/utils.py", line 663, in log_errors
    yield
  File "/datasets/bzaitlen/GitRepos/cudf/python/cudf/cudf/comm/serialize.py", line 38, in deserialize_cudf_dataframe
    cudf_obj = cudf_typ.deserialize(header, frames)
  File "/datasets/bzaitlen/GitRepos/cudf/python/cudf/cudf/core/dataframe.py", line 274, in deserialize
    columns = column.deserialize_columns(header["columns"], column_frames)
  File "/datasets/bzaitlen/GitRepos/cudf/python/cudf/cudf/core/column/column.py", line 1553, in deserialize_columns
    colobj = col_typ.deserialize(meta, frames[:col_frame_count])
  File "/datasets/bzaitlen/GitRepos/cudf/python/cudf/cudf/core/column/string.py", line 661, in deserialize
    arrays.append(libcudf.cudf.get_ctype_ptr(frame))
  File "cudf/_lib/cudf.pyx", line 104, in cudf._lib.cudf.get_ctype_ptr
  File "cudf/_lib/cudf.pyx", line 105, in cudf._lib.cudf.get_ctype_ptr
AttributeError: 'rmm._lib.device_buffer.DeviceBuffer' object has no attribute 'device_ctypes_pointer'

Probably all that is needed is a length check in the deserialization code:

def deserialize(cls, header, frames):
    # Deserialize the mask, value, and offset frames
    arrays = []
    for i, frame in enumerate(frames):
        if isinstance(frame, memoryview):
            sheader = header["subheaders"][i]
            dtype = sheader["dtype"]
            frame = np.frombuffer(frame, dtype=dtype)
            frame = cudautils.to_device(frame)
        arrays.append(libcudf.cudf.get_ctype_ptr(frame))
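
As a rough illustration of the length check suggested above, here is a minimal, self-contained sketch. It does not use cudf or rmm: StubDeviceBuffer, frame_nbytes, stub_get_ptr, and collect_pointers are hypothetical names made up for this example, and using None as the placeholder for an empty frame is an assumption, not what cudf does. The only point is that the pointer lookup is skipped when a frame holds zero bytes.

```python
from dataclasses import dataclass


@dataclass
class StubDeviceBuffer:
    # Hypothetical stand-in for a device buffer. It models only a byte size
    # and deliberately lacks `device_ctypes_pointer`, as in the traceback.
    size: int


def frame_nbytes(frame):
    """Return the size in bytes of a host (memoryview) or stub device frame."""
    if isinstance(frame, memoryview):
        return frame.nbytes
    return frame.size


def stub_get_ptr(frame):
    # Hypothetical pointer lookup; raises AttributeError for the stub buffer.
    return frame.device_ctypes_pointer


def collect_pointers(frames, get_ptr=stub_get_ptr):
    """Collect one entry per frame, skipping the lookup for empty frames."""
    arrays = []
    for frame in frames:
        if frame_nbytes(frame) == 0:
            # Assumption: None marks an empty frame for later handling.
            arrays.append(None)
        else:
            arrays.append(get_ptr(frame))
    return arrays
```

With this guard, a zero-length frame never reaches the pointer lookup, so the AttributeError above would not be triggered for empty string columns. How the rest of deserialize should treat the placeholder is left open here.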


Labels: Needs Triage, bug
