Open
Description
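Reproducer below. `take` on a `list` array whose result needs offsets beyond the int32 range returns an array with wrapped (negative) offsets instead of raising: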
import numpy as np
import pyarrow as pa
arr = pa.array([np.arange(x).astype(np.int8) for x in range(6)])
nb_repeat = 2**32 // arr.offsets.to_numpy()[-1]
indices = pa.array(np.repeat(np.arange(len(arr)), nb_repeat))
big_arr = arr.take(indices)
print(big_arr.offsets[-5:])
big_arr.validate() # hopefully this can catch it
[
-21,
-16,
-11,
-6,
-1
]
---------------------------------------------------------------------------
ArrowInvalid Traceback (most recent call last)
<ipython-input-1-09503f9cbb04> in <module>
6 big_arr = arr.take(indices)
7 print(big_arr.offsets[-5:])
----> 8 big_arr.validate()
/opt/conda/envs/model/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.Array.validate()
/opt/conda/envs/model/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: Negative offsets in list array

It works fine with `large_list` (as expected):
import numpy as np
import pyarrow as pa
arr = pa.array([np.arange(x).astype(np.int8) for x in range(6)], type=pa.large_list(pa.int8()))
nb_repeat = 2**32 // arr.offsets.to_numpy()[-1]
indices = pa.array(np.repeat(np.arange(len(arr)), nb_repeat))
big_arr = arr.take(indices)
print(big_arr.offsets[-5:])
big_arr.validate()
[
4294967275,
4294967280,
4294967285,
4294967290,
4294967295
]

Reporter: Artem KOZHEVNIKOV / @artemru
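The two outputs above are consistent with a plain 32-bit wraparound: the `large_list` offsets, reinterpreted as signed 32-bit integers, give exactly the negative values printed for the `list` case. A minimal pure-Python sketch of that arithmetic follows. The helper names `wrap_int32` and `take_fits_int32` are hypothetical and are not part of pyarrow; the second one only illustrates a pre-check a caller could run before `take`, under the assumption that the offsets are available as a plain sequence of integers.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


# Last five offsets printed for the large_list result in the report.
large_offsets = [4294967275, 4294967280, 4294967285, 4294967290, 4294967295]
print([wrap_int32(v) for v in large_offsets])  # [-21, -16, -11, -6, -1]


def take_fits_int32(offsets, indices) -> bool:
    """Hypothetical pre-check: True if the list array produced by taking
    `indices` would keep its final offset within the int32 range."""
    total = sum(offsets[i + 1] - offsets[i] for i in indices)
    return total <= INT32_MAX


# Offsets of the 6-row input array from the reproducer (row lengths 0..5).
offsets = [0, 0, 1, 3, 6, 10, 15]
nb_repeat = 2**32 // offsets[-1]

# Every row is repeated nb_repeat times, so the result's child length is:
reproducer_total = nb_repeat * offsets[-1]
print(reproducer_total, reproducer_total > INT32_MAX)  # 4294967295 True
```

For the reproducer, the total child length is 4294967295, which does not fit in int32 offsets, so only the `large_list` variant can represent the result.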
Related issues:
- [Python] pyarrow.concat_arrays segfaults if a resulting StringArray's capacity overflows (is related to)
Note: This issue was originally created as ARROW-10494. Please see the migration documentation for further details.