[C++] Regression in PlainBooleanDecoder::DecodeArrow #41032

@pitrou

Describe the bug, including details regarding any error messages, version, and platform.

While looking through another PR, I noticed that we recently introduced a bug in PlainBooleanDecoder::DecodeArrow.

Apparently the tests are not thorough enough to detect the issue.

A possible fix is the following patch:

diff --git a/cpp/src/parquet/encoding.cc b/cpp/src/parquet/encoding.cc
index 6e93b49339..a6e60aa012 100644
--- a/cpp/src/parquet/encoding.cc
+++ b/cpp/src/parquet/encoding.cc
@@ -1208,7 +1208,7 @@ int PlainBooleanDecoder::DecodeArrow(
     BitBlockCounter bit_counter(valid_bits, valid_bits_offset, num_values);
     int64_t value_position = 0;
     int64_t valid_bits_offset_position = valid_bits_offset;
-    int64_t previous_value_offset = 0;
+    int64_t previous_value_offset = total_num_values_ - num_values_;
     while (value_position < num_values) {
       auto block = bit_counter.NextWord();
       if (block.AllSet()) {
@@ -1224,8 +1224,7 @@ int PlainBooleanDecoder::DecodeArrow(
       } else {
         for (int64_t i = 0; i < block.length; ++i) {
           if (bit_util::GetBit(valid_bits, valid_bits_offset_position + i)) {
-            bool value = bit_util::GetBit(
-                data_, total_num_values_ - num_values_ + previous_value_offset);
+            bool value = bit_util::GetBit(data_, previous_value_offset);
             builder->UnsafeAppend(value);
             previous_value_offset += 1;
           } else {
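
To illustrate the intent of the patch: it turns `previous_value_offset` into an absolute bit position in `data_`, starting at `total_num_values_ - num_values_` (the number of values already consumed by earlier calls). The following is only a minimal sketch of that idea, not the actual Parquet code. `ToyBooleanDecoder`, its members, and the local `GetBit` helper are hypothetical stand-ins, and the validity bitmap is simplified to a `std::vector<bool>`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for bit_util::GetBit: reads bit i, LSB-first.
inline bool GetBit(const uint8_t* bits, int64_t i) {
  return (bits[i / 8] >> (i % 8)) & 1;
}

// Toy decoder over bit-packed booleans (an assumption for illustration,
// not the real PlainBooleanDecoder).
struct ToyBooleanDecoder {
  const uint8_t* data;       // bit-packed values for the whole page
  int64_t total_num_values;  // number of values stored in `data`
  int64_t num_values;        // number of values not yet decoded

  // Decodes one slot per entry of `valid`; null slots are appended as false.
  // Returns the number of non-null values consumed from `data`.
  int64_t Decode(const std::vector<bool>& valid, std::vector<bool>* out) {
    // Start at the absolute position of the first undecoded value,
    // so that a second call does not re-read bits from the start.
    int64_t value_offset = total_num_values - num_values;
    int64_t values_read = 0;
    for (bool is_valid : valid) {
      if (is_valid) {
        assert(value_offset < total_num_values);
        out->push_back(GetBit(data, value_offset));
        ++value_offset;
        ++values_read;
      } else {
        out->push_back(false);  // placeholder for a null slot
      }
    }
    num_values -= values_read;
    return values_read;
  }
};
```

Calling `Decode` twice on the same decoder exercises the case the issue is about: the second call has to continue from where the first one stopped, which only works if the starting offset accounts for the values already consumed.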

Component(s)

C++, Parquet
