flight_data_from_arrow_batch sends too much data #208

@alamb

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-12265

Arrow arrays can share the same backing store, even if the array is just a "view" of a slice of another array.

Yet, when flight_data_from_arrow_batch encodes the arrays into a FlightData, it blindly copies the entire buffer ready to be sent over the wire.

Thus, for example, when DataFusion uses the arrow::compute::limit operator to return a few elements of an array, we still end up with the full (potentially large) array being sent over the wire.

Since encoding the array in a FlightData involves copying the data anyway, perhaps it would be beneficial to take the Array length into consideration and copy only the parts of the buffer that contain actual data.
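
To make the difference concrete, here is a minimal sketch using only the standard library. It is a simplified model, not the arrow or arrow-flight API: `ArrayView`, `encode_whole_buffer`, and `encode_trimmed` are hypothetical names, and the view is measured in bytes rather than in typed elements.

```rust
use std::sync::Arc;

/// Simplified stand-in for an Arrow array: a window over a shared byte buffer.
/// (Hypothetical type for illustration; not the arrow crate's API.)
struct ArrayView {
    buffer: Arc<Vec<u8>>, // backing store, shared between views
    offset: usize,        // first byte that belongs to this view
    len: usize,           // number of bytes that belong to this view
}

impl ArrayView {
    /// Wrap freshly allocated data; the view covers the whole buffer.
    fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        ArrayView { buffer: Arc::new(data), offset: 0, len }
    }

    /// Zero-copy slice, analogous to what a limit operator produces:
    /// only the offset and length change, the buffer is shared.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        ArrayView {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }
}

/// The behavior described in this issue: copy the entire backing buffer,
/// ignoring the view's offset and length.
fn encode_whole_buffer(view: &ArrayView) -> Vec<u8> {
    view.buffer.to_vec()
}

/// The proposed behavior: copy only the bytes the view actually covers.
fn encode_trimmed(view: &ArrayView) -> Vec<u8> {
    view.buffer[view.offset..view.offset + view.len].to_vec()
}
```

With a 100-byte buffer and `slice(10, 5)`, `encode_whole_buffer` produces 100 bytes while `encode_trimmed` produces 5. A real fix would presumably also have to handle bit-packed validity bitmaps and offsets buffers, which this sketch ignores.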

Labels: arrow-flight (Changes to the arrow-flight crate)