
[Python][C++] C Data Interface incorrect validate failures #14875

@zeroshade

Description


Describe the bug, including details regarding any error messages, version, and platform.

Spinning off from #14814:

While testing round trips of empty arrays between Python and Go through the C Data Interface, I found an issue with binary and string arrays.

The data types pa.binary(), pa.large_binary(), pa.string(), and pa.large_string() all raise an error when validate(full=True) is called on an array returned by _import_from_c whose value data buffer is null:

Traceback (most recent call last):
  File "/home/zeroshade/Projects/GitHub/arrow/go/arrow/cdata/test/test_export_to_cgo.py", line 218, in test
    b.validate(full=True)
  File "pyarrow/array.pxi", line 1501, in pyarrow.lib.Array.validate
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Value data buffer is null

This follows up on #14805, which clarified that buffers can be null in a 0-length array. My guess is that the problem is not the offsets buffer but the value data buffer that follows it, which would hold the actual binary/UTF-8 bytes if the array had a length > 0. The error message ("Value data buffer is null") is consistent with that, but it's just a theory; I haven't confirmed it.
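To make the theory concrete, here is a minimal pure-Python model of the rule being tripped. `check_binary_buffers` is a hypothetical helper, not pyarrow or Arrow C++ code. It assumes the standard variable-size binary/string layout (an offsets buffer with `length + 1` entries followed by a value data buffer), ignores the validity bitmap and array offset, and treats null buffers as acceptable whenever the array length is 0, per the clarification in #14805. The traceback above suggests that Arrow C++'s full validation instead rejects the null value data buffer even in the zero-length case.

```python
from typing import Optional, Sequence


def check_binary_buffers(length: int,
                         offsets: Optional[Sequence[int]],
                         data: Optional[bytes]) -> None:
    """Hypothetical model of binary/string buffer validation.

    Raises ValueError when the buffers cannot describe `length` values.
    """
    if length == 0:
        # Zero-length array: no values are ever read, so null buffers
        # are accepted (the behavior clarified in #14805).
        return
    if offsets is None:
        raise ValueError("Offsets buffer is null")
    if len(offsets) < length + 1:
        raise ValueError("Offsets buffer too small")
    if data is None:
        # This is the case the traceback reports, but here it is only
        # reached for arrays with at least one value.
        raise ValueError("Value data buffer is null")
    if offsets[length] > len(data):
        raise ValueError("Offsets point past end of value data buffer")


# An empty array with null offsets and data buffers passes.
check_binary_buffers(0, None, None)
# A non-empty array with a real data buffer passes.
check_binary_buffers(2, [0, 1, 3], b"abc")
```

Under this model, only a non-empty array with a null value data buffer would fail, which is the behavior I would expect from validate(full=True) on the imported empty arrays.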

Component(s)

C++, Python
