Describe the bug
When trying to write an rrd or blueprint to disk, it is possible to end up in a state where we encounter an error such as:
Err value: Overflow', arrow2-0.17.1/src/array/growable/binary.rs:73
See: #3010
This can happen with anything that uses offset arrays, i.e. either ListArrays or BinaryArrays, though BinaryArrays are the most likely culprit since the overflow can be triggered with as little as 1GB of data.
The root cause of the problem is that when we save from the viewer, we concatenate all of the data-cell slices for a given component. The slice overflow check (https://github.com/jorgecarleitao/arrow2/blob/main/src/offset.rs#L288) considers the worst-case concatenation: it accounts for the full length of each source array, even when the sliced region is significantly smaller and would not result in an actual overflow.
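To make the pessimism concrete, here is a toy sketch (not arrow2's actual code; the function names and byte values are made up for illustration) contrasting a check based on the full source array's length with one based on the sliced region's length:

```rust
const OFFSET_LIMIT: i64 = i32::MAX as i64; // i32 offsets cap out at ~2.1B

// Worst-case check: assumes the *entire* source array gets appended,
// regardless of how small the actual slice is.
fn pessimistic_check(source_len_bytes: i64, current_end: i64) -> bool {
    current_end + source_len_bytes > OFFSET_LIMIT
}

// What actually matters: only the sliced region's bytes get appended.
fn actual_check(slice_len_bytes: i64, current_end: i64) -> bool {
    current_end + slice_len_bytes > OFFSET_LIMIT
}

fn main() {
    let source_len = 1_500_000_000; // 1.5B-byte backing array...
    let slice_len = 10_000;         // ...of which we sliced only 10KB
    let current_end = 1_000_000_000; // 1B bytes already accumulated

    // The pessimistic check reports an overflow...
    assert!(pessimistic_check(source_len, current_end));
    // ...even though the actual concatenation fits comfortably.
    assert!(!actual_check(slice_len, current_end));
    println!("pessimistic check fires spuriously; actual data fits");
}
```

This is why saving can fail even though the total data being written is nowhere near the 2GB offset limit.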
Additionally, because Arrow uses i32 for offsets, we overflow at ~2B bytes, so any two slices from source arrays with 1B elements each will overflow. We probably want to consider using LargeBinaryArrays (i64 offsets) to future-proof against this. Alternatively, we could guard against this by (1) only compacting down to 2B-element chunks and (2) never re-compacting from large slices.
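Guard (1) could look roughly like the following sketch, which greedily groups slices into compaction batches so that each batch's total stays under the i32 offset limit. This is a hypothetical planner, not existing code; `plan_chunks` and the greedy strategy are assumptions for illustration:

```rust
const OFFSET_LIMIT: i64 = i32::MAX as i64; // ~2.1B, the i32 offset ceiling

// Given the byte sizes of the slices to compact, return groups of slice
// indices such that each group's total size fits within i32 offsets.
// A single slice larger than the limit still gets its own group (it
// cannot be split here and must be handled separately).
fn plan_chunks(slice_sizes: &[i64]) -> Vec<Vec<usize>> {
    let mut chunks: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut total: i64 = 0;
    for (i, &size) in slice_sizes.iter().enumerate() {
        // Close the current chunk if adding this slice would overflow.
        if total + size > OFFSET_LIMIT && !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
            total = 0;
        }
        current.push(i);
        total += size;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    // Three 1B-byte slices: the first two fit together (2B < ~2.1B),
    // the third starts a new chunk.
    let plan = plan_chunks(&[1_000_000_000, 1_000_000_000, 1_000_000_000]);
    assert_eq!(plan, vec![vec![0, 1], vec![2]]);
    println!("{plan:?}");
}
```

Note that this only addresses the real 2GB ceiling; it does not help with the pessimistic check described above, which can reject concatenations that would in fact fit.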