Skip to content

Unaligned memory access in SparkUnsafeRow #1849

@Kontinuation

Description

@Kontinuation

This was found when working on #1845. A newly added Rust test that calls into SparkUnsafeRow.is_null_at(idx) was identified to have undefined behavior by Miri:

test execution::planner::tests::test_unpack_dictionary_primitive ... ok
test execution::planner::tests::test_unpack_dictionary_string ... ok
test execution::planner::tests::to_datafusion_filter ... ok
test execution::shuffle::row::test::test_append_null_row_to_struct_builder ... ok
warning: integer-to-pointer cast
    --> core/src/execution/shuffle/row.rs:260:31
     |
260  |             let word_offset = (self.row_addr + (((index >> 6) as i64) << 3)) as *const i64;
     |                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ integer-to-pointer cast
     |
     = help: this program is using integer-to-pointer casts or (equivalently) `ptr::with_exposed_provenance`, which means that Miri might miss pointer bugs in this program
     = help: see https://doc.rust-lang.org/nightly/std/ptr/fn.with_exposed_provenance.html for more details on that operation
     = help: to ensure that Miri does not miss bugs in your program, use Strict Provenance APIs (https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance, https://crates.io/crates/sptr) instead
     = help: you can then set `MIRIFLAGS=-Zmiri-strict-provenance` to ensure you are not relying on `with_exposed_provenance` semantics
     = help: alternatively, `MIRIFLAGS=-Zmiri-permissive-provenance` disables this warning
     = note: BACKTRACE on thread `execution::shuf`:
     = note: inside `execution::shuffle::row::SparkUnsafeRow::is_null_at` at core/src/execution/shuffle/row.rs:260:31: 260:91
note: inside `execution::shuffle::row::append_field`
    --> core/src/execution/shuffle/row.rs:447:54
     |
447  |             let nested_row = if row.is_null_row() || row.is_null_at(idx) {
     |                                                      ^^^^^^^^^^^^^^^^^^^
note: inside `execution::shuffle::row::test::test_append_null_struct_field_to_struct_builder`
    --> core/src/execution/shuffle/row.rs:3332:9
     |
3332 |         append_field(&data_type, &mut struct_builder, &row, 0).expect("append field");
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure
    --> core/src/execution/shuffle/row.rs:3322:57
     |
3321 |     #[test]
     |     ------- in this procedural macro expansion
3322 |     fn test_append_null_struct_field_to_struct_builder() {
     |                                                         ^

error: Undefined Behavior: accessing memory based on pointer with alignment 1, but alignment 8 is required
    --> core/src/execution/shuffle/row.rs:261:29
     |
261  |             let word: i64 = *word_offset;
     |                             ^^^^^^^^^^^^ accessing memory based on pointer with alignment 1, but alignment 8 is required
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
     = note: BACKTRACE on thread `execution::shuf`:
     = note: inside `execution::shuffle::row::SparkUnsafeRow::is_null_at` at core/src/execution/shuffle/row.rs:261:29: 261:41
note: inside `execution::shuffle::row::append_field`
    --> core/src/execution/shuffle/row.rs:447:54
     |
447  |             let nested_row = if row.is_null_row() || row.is_null_at(idx) {
     |                                                      ^^^^^^^^^^^^^^^^^^^
note: inside `execution::shuffle::row::test::test_append_null_struct_field_to_struct_builder`
    --> core/src/execution/shuffle/row.rs:3332:9
     |
3332 |         append_field(&data_type, &mut struct_builder, &row, 0).expect("append field");
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure
    --> core/src/execution/shuffle/row.rs:3322:57
     |
3321 |     #[test]
     |     ------- in this procedural macro expansion
3322 |     fn test_append_null_struct_field_to_struct_builder() {
     |                                                         ^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error; 1 warning emitted

error: test failed, to rerun pass `-p datafusion-comet --lib`

Caused by:
  process didn't exit successfully: `/home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner /home/runner/work/datafusion-comet/datafusion-comet/native/target/miri/x86_64-unknown-linux-gnu/debug/deps/comet-92e75f0cb556895b` (exit status: 1)
note: test exited abnormally; to see the full output pass --nocapture to the harness.
test execution::shuffle::row::test::test_append_null_struct_field_to_struct_builder ... 

We made Miri to ignore the test in that PR to pass CI, but we still need to address this undefined behavior anyway. https://github.com/apache/datafusion-comet/pull/1845/files#r2126985383

Although unaligned memory access are allowed by mostly used architectures including amd64 and aarch64, and they do not exhibit performance penalties in recent CPU models, there are still some architectures do not support unaligned memory access. We need to eliminate this undefined behavior in our native code. When the native library is compiled to target platforms that supports unaligned memory access, the compiler should emit unaligned memory instructions so there should be no performance penalty.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions