
Saving DataStore leads to crash when component columns exceed 1B elements #3097

@jleibs

Description


Describe the bug
When trying to write an rrd or blueprint to disk, it is possible to end up in a state where we encounter an error such as:

Err value: Overflow', arrow2-0.17.1/src/array/growable/binary.rs:73

See: #3010

This can happen with anything that uses offset arrays, so either ListArray or BinaryArray, though BinaryArray is the most likely culprit, since the overflow can be hit with only 1GB of data.

The root cause of the problem is that when we save from the viewer we concatenate all of the data-cell slices for a given component. The slice overflow check (https://github.com/jorgecarleitao/arrow2/blob/main/src/offset.rs#L288) considers the worst-case concatenation, i.e. the full source arrays, even when the sliced regions are significantly smaller and would not result in an actual overflow.
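The mismatch between the worst-case check and the actual byte count can be sketched in plain Rust. This is a hypothetical model, not arrow2's actual code: the `SlicedBinary` type and both functions are made up for illustration.

```rust
// Hypothetical model of a sliced binary array: offsets index into a shared
// value buffer, and the slice only covers offsets[start..=end].
struct SlicedBinary {
    offsets: Vec<i32>,
    start: usize,
    end: usize,
}

impl SlicedBinary {
    // Bytes actually covered by the slice.
    fn sliced_bytes(&self) -> i64 {
        (self.offsets[self.end] - self.offsets[self.start]) as i64
    }
    // Bytes in the whole source buffer -- what a worst-case check uses.
    fn total_bytes(&self) -> i64 {
        (*self.offsets.last().unwrap()) as i64
    }
}

// Worst-case check: sums full source buffers, so it can reject
// concatenations whose sliced regions would actually fit in i32 offsets.
fn worst_case_overflows(slices: &[SlicedBinary]) -> bool {
    slices.iter().map(|s| s.total_bytes()).sum::<i64>() > i32::MAX as i64
}

// Exact check: sums only the sliced regions.
fn actual_overflows(slices: &[SlicedBinary]) -> bool {
    slices.iter().map(|s| s.sliced_bytes()).sum::<i64>() > i32::MAX as i64
}
```

Two tiny slices taken from 2GB source buffers trip the worst-case check even though their combined sliced data is only a few bytes.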

Additionally, because Arrow uses i32 for offsets, we overflow at 2^31 (~2.1B), and so any 2 slices from source arrays with 1B elements will overflow the worst-case check. We probably want to consider using LargeBinaryArray (i64 offsets) to future-proof against this. Alternatively, we could guard against this by (1) only compacting down to 2B-element chunks and (2) never re-compacting from large slices.
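The arithmetic behind the 2B ceiling can be checked directly. A minimal sketch (the function name is hypothetical): with i32 offsets, the concatenated value buffer must fit under i32::MAX (2,147,483,647) bytes, so two 1GiB (2^30-byte) buffers already exceed it.

```rust
// With i32 offsets, the final offset of the concatenated array must fit in
// i32; i64 offsets (LargeBinaryArray) would raise the ceiling to ~9.2e18.
fn concat_fits_i32(buffer_lens: &[i64]) -> bool {
    buffer_lens.iter().sum::<i64>() <= i32::MAX as i64
}
```

One 1GiB buffer fits, but concatenating two overflows by a single byte (2 * 2^30 = 2^31 > 2^31 - 1), which is why the failure only shows up once enough data has accumulated.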


Labels

⛃ re_datastore (affects the datastore itself)
🪳 bug (Something isn't working)
