Closed
Labels
bug: The problem described is something that must be fixed
Description
Version of Awkward Array
2.6.8
Description and code to reproduce
It seems RecordArray allows duplicated field names, e.g. when constructing one via the layout API:
>>> array = ak.Array(ak.contents.RecordArray([ak.contents.NumpyArray([1, 2, 3]), ak.contents.NumpyArray([1, 2, 3])], ["a", "a"]))
>>> array
<Array [{a: 1, a: 1}, {...}, {a: 3, a: 3}] type='3 * {a: int64, a: int64}'>
Another way this can happen is by repeating a field name (as I did, accidentally) when selecting multiple record fields:
>>> array = ak.zip({"a": [1, 2, 3]})[["a", "a"]]
>>> array
<Array [{a: 1, a: 1}, {...}, {a: 3, a: 3}] type='3 * {a: int64, a: int64}'>
Such arrays cause problems in several places:
- exploding via to_buffers:
>>> ak.to_buffers(array)
(RecordForm([NumpyForm('int64', form_key='node1'), NumpyForm('int64', form_key='node2')], ['a', 'a'], form_key='node0'), 3, {'node1-data': array([1, 2, 3])})
Note that node2-data is missing from the returned container, even though the form references node2.
- which consequently breaks unpickling:
>>> pickle.loads(pickle.dumps(array))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/nikolai/.local/lib/python3.12/site-packages/awkward/_pickle.py", line 107, in unpickle_array_schema_1
return _impl(
^^^^^^
File "/home/nikolai/.local/lib/python3.12/site-packages/awkward/operations/ak_from_buffers.py", line 150, in _impl
out = _reconstitute(form, length, container, getkey, backend, byteorder, simplify)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nikolai/.local/lib/python3.12/site-packages/awkward/operations/ak_from_buffers.py", line 405, in _reconstitute
_reconstitute(
File "/home/nikolai/.local/lib/python3.12/site-packages/awkward/operations/ak_from_buffers.py", line 196, in _reconstitute
raw_array = container[getkey(form, "data")]
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'node2-data'
- a round trip to and from Arrow also fails:
>>> ak.from_arrow(ak.to_arrow(array))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/nikolai/.local/lib/python3.12/site-packages/awkward/_dispatch.py", line 39, in dispatch
gen_or_result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/nikolai/.local/lib/python3.12/site-packages/awkward/operations/ak_from_arrow.py", line 45, in from_arrow
return _impl(array, generate_bitmasks, highlevel, behavior, attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nikolai/.local/lib/python3.12/site-packages/awkward/operations/ak_from_arrow.py", line 55, in _impl
out = awkward._connect.pyarrow.handle_arrow(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nikolai/.local/lib/python3.12/site-packages/awkward/_connect/pyarrow/conversions.py", line 757, in handle_arrow
out = popbuffers(
^^^^^^^^^^^
File "/home/nikolai/.local/lib/python3.12/site-packages/awkward/_connect/pyarrow/conversions.py", line 370, in popbuffers
paarray.field(field_name), a, b, buffers, generate_bitmasks
^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/array.pxi", line 3913, in pyarrow.lib.StructArray.field
KeyError: 'a'
This error occurred while calling
ak.from_arrow(
AwkwardArrowArray-instance
)
Probably one could simply disallow arrays with duplicated field names. I'm not sure there is any useful application for them. When I discovered this in my code, it was not something I intended to do (I had just accidentally repeated a field name), so writing this minimal reproducer was already worth it :)
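Until the library rejects such arrays itself, a user-side guard can catch the mistake before the field list is used. The following is a minimal sketch in plain Python. The helpers duplicated_fields, unique_fields and check_fields are hypothetical names made up for this illustration and are not part of the Awkward Array API.

```python
from collections import Counter


def duplicated_fields(fields):
    """Return the field names that occur more than once, in first-seen order."""
    counts = Counter(fields)  # Counter keeps insertion order (Python 3.7+)
    return [name for name in counts if counts[name] > 1]


def unique_fields(fields):
    """Drop repeated names while keeping the original order."""
    return list(dict.fromkeys(fields))


def check_fields(fields):
    """Raise ValueError if any field name is repeated, else return a list copy."""
    dupes = duplicated_fields(fields)
    if dupes:
        raise ValueError(f"duplicated record field names: {dupes}")
    return list(fields)


selection = ["a", "a"]  # the accidental repetition from the reproducer
print(duplicated_fields(selection))
print(unique_fields(selection))
```

Passing unique_fields(selection) instead of selection when indexing would select "a" only once, while check_fields is the stricter variant that fails loudly, which is closer to the "just don't allow it" behaviour suggested above.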