GH-32276: [C++][FlightRPC] Add option to align RecordBatch buffers given to IPC reader#44279
Conversation
|
@pitrou do you think this fix is viable? |
lidavidm
left a comment
There was a problem hiding this comment.
Thanks for picking this up. This seems reasonable to me at a quick glance.
cpp/src/arrow/array/data.h
Outdated
There was a problem hiding this comment.
nit: put this include with the rest of the Arrow includes (and use quotes to be consistent)
cpp/src/arrow/array/data.h
Outdated
There was a problem hiding this comment.
nit: you could iterate with for (auto& child : child_data) and avoid the explicit index?
python/pyarrow/tests/test_ipc.py
Outdated
77cc70a to
a5d9e2d
Compare
|
While attempting to write some unit tests I found there is arrow/cpp/src/arrow/util/align_util.cc Lines 169 to 205 in e62fbaa I will try to reuse that method rather than re-implementing it. There is also test infrastructure for misaligned array data. |
lidavidm
left a comment
There was a problem hiding this comment.
Can you rebase? Tests appear to be failing
cpp/src/arrow/ipc/reader.cc
Outdated
There was a problem hiding this comment.
Can we use the memory pool in context.options.memory_pool?
There was a problem hiding this comment.
Sure! Ideally we should use the buffer's memory manager rather than the default CPU manager:
arrow/cpp/src/arrow/memory_pool.cc
Lines 907 to 916 in 5ad0b3e
960cb21 to
9909f13
Compare
|
Test arrow/cpp/src/arrow/util/align_util.cc Lines 44 to 52 in bcb4653 https://github.com/apache/arrow/actions/runs/11462607112/job/31894398411?pr=44279#step:13:1548 That test complains a lot about arrow/cpp/src/arrow/util/align_util.cc Lines 56 to 76 in bcb4653 Looks like |
f2dae5b to
d1219d2
Compare
|
@westonpace @sanjibansg |
Co-authored-by: David Li <li.davidm96@gmail.com>
This reverts commit 069c30f7fb87f16d2d8d236ca2e3da1de19dec30.
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
63d7251 to
bf1a7e9
Compare
|
Thanks @EnricoMi . This looks good to me except for the nit above. |
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
|
Thanks @EnricoMi! |
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit f45594d. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 15 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…ers given to IPC reader (apache#44279) ### Rationale for this change Data retrieved via IPC is expected to provide memory-aligned arrays, but data retrieved via C++ Flight client is mis-aligned. Datafusion (Rust), which requires data type-specific alignment, cannot handle such data: apache#43552. https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding ### What changes are included in this PR? This adds option `arrow::ipc::IpcReadOptions.ensure_alignment` of type `arrow::ipc::Alignment` to configure how RecordBatch array buffers decoded by IPC are realigned. It supports no realignment (default), data type-specific alignment and 64-byte alignment. Implementation mirrors that of [`align_buffers` in arrow-rs](https://github.com/apache/arrow-rs/blob/3293a8c2f9062fca93bee2210d540a1d25155bf5/arrow-data/src/data.rs#L698-L711) (apache/arrow-rs#4681). ### Are these changes tested? Configuration flag tested in unit test. Integration test with Flight server. Manually end-to-end tested that memory alignment fixes issue with reproduction code provided in apache#43552. ### Are there any user-facing changes? Adds option `IpcReadOptions.ensure_alignment` and enum type `Alignment`. * GitHub Issue: apache#32276 Lead-authored-by: Enrico Minack <github@enrico.minack.dev> Co-authored-by: David Li <li.davidm96@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: David Li <li.davidm96@gmail.com>
…ers given to IPC reader (apache#44279) (#6) ### Rationale for this change Data retrieved via IPC is expected to provide memory-aligned arrays, but data retrieved via C++ Flight client is mis-aligned. Datafusion (Rust), which requires data type-specific alignment, cannot handle such data: apache#43552. https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding ### What changes are included in this PR? This adds option `arrow::ipc::IpcReadOptions.ensure_alignment` of type `arrow::ipc::Alignment` to configure how RecordBatch array buffers decoded by IPC are realigned. It supports no realignment (default), data type-specific alignment and 64-byte alignment. Implementation mirrors that of [`align_buffers` in arrow-rs](https://github.com/apache/arrow-rs/blob/3293a8c2f9062fca93bee2210d540a1d25155bf5/arrow-data/src/data.rs#L698-L711) (apache/arrow-rs#4681). ### Are these changes tested? Configuration flag tested in unit test. Integration test with Flight server. Manually end-to-end tested that memory alignment fixes issue with reproduction code provided in apache#43552. ### Are there any user-facing changes? Adds option `IpcReadOptions.ensure_alignment` and enum type `Alignment`. * GitHub Issue: apache#32276 Lead-authored-by: Enrico Minack <github@enrico.minack.dev> Signed-off-by: David Li <li.davidm96@gmail.com> Co-authored-by: David Li <li.davidm96@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Rationale for this change
Data retrieved via IPC is expected to provide memory-aligned arrays, but data retrieved via C++ Flight client is mis-aligned. Datafusion (Rust), which requires data type-specific alignment, cannot handle such data: #43552.
https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding
What changes are included in this PR?
This adds option
arrow::ipc::IpcReadOptions.ensure_alignmentof typearrow::ipc::Alignmentto configure how RecordBatch array buffers decoded by IPC are realigned. It supports no realignment (default), data type-specific alignment and 64-byte alignment.Implementation mirrors that of
align_buffersin arrow-rs (apache/arrow-rs#4681).Are these changes tested?
Configuration flag tested in unit test. Integration test with Flight server.
Manually end-to-end tested that memory alignment fixes issue with reproduction code provided in #43552.
Are there any user-facing changes?
Adds option
IpcReadOptions.ensure_alignmentand enum typeAlignment.