Description
Triggered by https://stackoverflow.com/questions/71035754/pyarrow-drop-a-column-in-a-nested-structure. I thought there was already an issue about this, but I can't find one right away.
Assume you have a struct array with some fields:
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> arr = pa.StructArray.from_arrays([[1, 2, 3]]*3, names=['a', 'b', 'c'])
>>> arr.type
StructType(struct<a: int64, b: int64, c: int64>)
We have a kernel to select a single child field:
>>> pc.struct_field(arr, [0])
<pyarrow.lib.Int64Array object at 0x7ffa9e229940>
[
1,
2,
3
]
But if you want to subset the StructArray to some of its fields, resulting in a new StructArray, that's not possible with struct_field, and doing this manually is a bit cumbersome:
>>> fields = ['a', 'c']
>>> arrays = [arr.field(n) for n in fields]
>>> arr_subset = pa.StructArray.from_arrays(arrays, names=fields)
>>> arr_subset.type
StructType(struct<a: int64, c: int64>)
(This is still OK, but if you had a ChunkedArray, it certainly gets annoying.)
One option could be to expand the existing struct_field to allow selecting multiple fields, although that probably gets ambiguous/confusing given how you currently select a recursively nested field: [0, 1] currently means "first child, second subchild" and not "first and second child".
Another option would be a new kernel, named "struct_subset" or similar.
This might also overlap with general projection functionality? (cc @westonpace)
Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Dhruv Vats / @dhruv9vats
Related issues:
- [C++] Implement casts from one struct type to another (with same field names and number of fields) (relates to)
- [C++] Improve MakeArrayOfNull to support creation of multiple arrays (relates to)
- [C++] Add basic support for nested field refs in scanning (relates to)
- [C++] Allow reordering fields of a StructArray via casting (is related to)
Note: This issue was originally created as ARROW-15643. Please see the migration documentation for further details.