Skip to content

Conversation

@mapleFU
Copy link
Member

@mapleFU mapleFU commented Jan 7, 2023

Problem is mentioned here: #14923

This patch fixes that issue. And the code is a bit complex.

  • Rename variables, since original name is confusing for me.
    • block_initialized_ -> first_block_initialized_, because it's a mask that indicates that if the first block in page is initialized.
    • total_value_count_ -> total_values_remaining_. Because it's not total values within a page, it means remaing values to be decoded within a page
    • values_count_current_mini_block_ -> values_remaining_current_mini_block_, ditto
  • Add variables
    • total_value_count_: the total value numbers within a page.
  • Change Syntax
    • Change InitBlock() to InitBlock() and InitMiniBlock
  • Implemention, most logic is in InitBlock() and InitMiniBlock
  • Testing. Thanks @rok. I use a page within 65 values with bitwidth 32 32 165 165.

And personally, I use the code here for testing:

   for (uint32_t i = num_miniblocks; i < mini_blocks_per_block_; i++) {
-    bit_width_data[i] = 0;
+    //    bit_width_data[i] = 0;
+    bit_width_data[i] = static_cast<uint8_t>(random());
   }

The code works well in both debug and release mode.

@github-actions
Copy link

github-actions bot commented Jan 7, 2023

@github-actions
Copy link

github-actions bot commented Jan 7, 2023

⚠️ GitHub issue #14923 has been automatically assigned in GitHub to PR creator.

@github-actions
Copy link

github-actions bot commented Jan 7, 2023

⚠️ GitHub issue #14923 has no components, please add labels for components.

@mapleFU
Copy link
Member Author

mapleFU commented Jan 7, 2023

@pitrou @rok PTAL (^_^)

@mapleFU
Copy link
Member Author

mapleFU commented Jan 7, 2023

Wait, it's still some bug, sorry. Let me fix it

@mapleFU
Copy link
Member Author

mapleFU commented Jan 7, 2023

I change the code to

  for (uint32_t i = num_miniblocks; i < mini_blocks_per_block_; i++) {
    bit_width_data[i] = random() % 128;
  }

And run multiple times with ASAN, test still passed. Though I don't know how to mock in release, seens we cannot use gmock when DeltaBitPackEncoder is always in .cc file.

@mapleFU
Copy link
Member Author

mapleFU commented Jan 7, 2023

@pitrou @rok PTAL (^_^) I think I fix it, and leaving some comments.

@mapleFU
Copy link
Member Author

mapleFU commented Jan 7, 2023

Oh, when running release, still bug occur in ASAN (https://github.com/apache/arrow/actions/runs/3862492798/jobs/6584098451), seems a ub or SIMD related bug. I'll work on it

@mapleFU
Copy link
Member Author

mapleFU commented Jan 7, 2023

Well, it's a bit weird.

TEST_F(DeltaBitPackEncoding, MalfordMiniblockBitWidth) {
  std::shared_ptr<ColumnDescriptor> descr_ = ExampleDescr<Int32Type>();
  auto decoder = MakeTypedDecoder<Int32Type>(Encoding::DELTA_BINARY_PACKED, descr_.get());
  using c_type = parquet::Int32Type::c_type;

  unsigned char good_data[] = "\200\001\004A\237\224\316\362\r\242\220\203-  ";
  int encode_buffer_size = 273;
  int num_values = 65;
  std::vector<uint8_t> output_bytes = std::vector<uint8_t>(num_values * sizeof(c_type));
  auto decode_buf = reinterpret_cast<c_type*>(output_bytes.data());
  decoder->SetData(num_values, &good_data[0], encode_buffer_size);

@rok How do you build the good_data? Seems encode_buffer_size = 273 is too large for good_data, and in Release mode, memcpy will reach the stack guard, and raise an asan exception.

@mapleFU
Copy link
Member Author

mapleFU commented Jan 7, 2023

Well, seems that both buffer is 273B. And I need to padding the good_data. @rok would you mind add the generating method of these two datas? Seems leaving a bit-string here is confusing.

@mapleFU
Copy link
Member Author

mapleFU commented Jan 7, 2023

This time it failed because unpack32_avx512 meets an ASAN issue. Maybe I need to know how input data is generated and do some adjustions. Go to sleep now.

@mapleFU
Copy link
Member Author

mapleFU commented Jan 8, 2023

(By the way, my PC using AMD's CPU, which I can't reproduce unpack32_avx512 problem. So it's important for me to know how test data generated...)

@rok
Copy link
Member

rok commented Jan 8, 2023

Hey @mapleFU, sorry for not replying earlier, I am preoccupied with the Jira->Github migration :).
I generated the data by applying this diff and stepping through GDB in CLion probably up to here and copied the data. I am on AMD Ryzen 7 5700G, Ubuntu 22.02.
I can try getting the data string in some other way as GDB could be interfering somehow?

@mapleFU
Copy link
Member Author

mapleFU commented Jan 9, 2023

Sorry for being crazy and at you for multiple times. By using a self-written std::cout PageInspector, I finally make clear the meaning of them, and put the meaning in encoding_test.cc:

  // Both good_data and bad_data is a DELTA_BINARY_PACKED INT32 Page with 65 values.
  // For needed miniblocks, there bit-widths are all 32.
  // There bit-widths for unneeded miniblocks are different:
  // * good_data's unneeded bit-width is 0.
  // * bad_data's unneeded bit-width is 165.

I guess the problem is from the data is truncated from first \0, so they are all shorter than 273B, and the data are malformed. And will raise the corner case. Let me run CI again. @rok

And by the way, min-delta in miniblock might be truncated, so the data is "incomplete". I will generate a new one here instead.

@mapleFU
Copy link
Member Author

mapleFU commented Jan 9, 2023

Test regenerated, and representing using Hex.

@mapleFU
Copy link
Member Author

mapleFU commented Jan 10, 2023

@pitrou @wjones127 Mind take a look? Since I made lots of naming changes( described #15241 (comment) ), and update a Hex testing format. Don't know whether it's ok.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious to me why you're making so many changes to the decoder. Was the suggested fix in #14923 (comment) not sufficient?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this different from the other data above? IIUC, the trailing bytes should be different, but they seem identical.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other data is equal. (In fact, they're all generated by this->Execute(65, 1) )

@mapleFU
Copy link
Member Author

mapleFU commented Jan 10, 2023

In most case, #14923 (comment) is right. But we may have the case above:

if (total_value_count_ != 1) {
  InitBlock();
}

Here handling it would becoming trickey. And update so much code because some name really makes me feel confusing. But maybe I can make it simplier

@pitrou
Copy link
Member

pitrou commented Jan 10, 2023

I don't understand, what would be tricky?

@mapleFU
Copy link
Member Author

mapleFU commented Jan 10, 2023

Let's go back to the original code:

      if (ARROW_PREDICT_FALSE(values_current_mini_block_ == 0)) {
        if (ARROW_PREDICT_FALSE(!block_initialized_)) {
          buffer[i++] = last_value_;
          DCHECK_EQ(i, 1);  // we're at the beginning of the page
          if (ARROW_PREDICT_FALSE(i == max_values)) {
            // When block is uninitialized and i reaches max_values we have two
            // different possibilities:
            // 1. total_value_count_ == 1, which means that the page may have only
            // one value (encoded in the header), and we should not initialize
            // any block.
            // 2. total_value_count_ != 1, which means we should initialize the
            // incoming block for subsequent reads.
            if (total_value_count_ != 1) {
              InitBlock();
            }
            break;
          }
          InitBlock();
        }

And assume the page has 33 values and 32 values per-mini-block:

  1. total_value_count_ == 1, call InitBlock
  2. current_num_values = 0, read block one, change to 32.
  3. current_num_values = 32, read block one, change to 64.

But here, it should be 1 firstly.

@mapleFU
Copy link
Member Author

mapleFU commented Jan 10, 2023

Or maybe I can just add a is_init_page flag:

  void InitBlock(bool is_init_page) {
     int current_miniblock_values = is_init_page ? 1 : 0;
     ...
  }

@pitrou
Copy link
Member

pitrou commented Jan 10, 2023

Why not, but can you add a test that would reproduce the issue?

@mapleFU
Copy link
Member Author

mapleFU commented Jan 10, 2023

Why not, but can you add a test that would reproduce the issue?

Yes, but adding test case is trickey... I cannot use gmock to mock writing non-zero-values...

@pitrou
Copy link
Member

pitrou commented Jan 10, 2023

Perhaps you can use the encoder and manually change the last bitwidths in the output?

@mapleFU
Copy link
Member Author

mapleFU commented Jan 10, 2023

Perhaps you can use the encoder and manually change the last bitwidths in the output?

Sure, it's easy to write that in my personal develop enviroment. But I cannot submit these "mock" code to the codebase, which make writing unit test a bit more trickey.

In my local machine, I can just use

   for (uint32_t i = num_miniblocks; i < mini_blocks_per_block_; i++) {
-    bit_width_data[i] = 0;
+    //    bit_width_data[i] = 0;
+    bit_width_data[i] = static_cast<uint8_t>(random());
   }

and run roundtrip tests. But in unit test, I should submit a test file or a string. And I cannot test too much cases, because I need to paste the whole string here.

By the way, do you think using Hex to represent binary is ok? And if not ok, is there some good example?

@pitrou
Copy link
Member

pitrou commented Jan 10, 2023

Sure, it's easy to write that in my personal develop enviroment. But I cannot submit these "mock" code to the codebase, which make writing unit test a bit more trickey.

You can write the code that mutates the encoder output directly in the unit test. Schematically:

std::vector<uint8_t> encoded = ...;
for (auto i = encoded.length() - 33; i < encoded.length() - 2; ++i) {
  encoded[i] = 0xFF;
}

By the way, do you think using Hex to represent binary is ok?

That works, yes.

@mapleFU
Copy link
Member Author

mapleFU commented Jan 10, 2023

Well, I found that rok's testing data already covers that case:

And assume the page has 65 values and 32 values per-mini-block:

total_value_count_ == 1, call InitBlock
current_num_values = 0, read block one, change to 32.
current_num_values = 32, read block one, change to 64.
current_num_values = 64, read block one, change to 96.

When changing to InitBlock(is_init_page), I meet another trickey problem:

while (i < max_values) {
  ... decode values from miniblocks
}
total_values_remaining_ -= max_values;

So, InitBlock() should have a counter, or it should decrease total_values_remaining_ per loop.

I think there are some fixing:

  1. current my implemention is ok here, because it maintains a current miniblock counter
  2. The change like below:
while (i < max_values) {
  ... decode values from miniblocks, when i increase, decrease remaining
  ... or passing i into InitBlock
}
CHECK_EQ(i, max_values);

@pitrou what's your optionion here? Maintaining a current value counter or passing i into InitBlock, like InitBlock(int prefix_counter), or decrease remaining counter every time we inc?

(Personally I'd like to call InitBlock(int prefix_counter), but I'd like to know what's your opinion)

@pitrou
Copy link
Member

pitrou commented Jan 16, 2023

Ok, I was not satisfied with the test and implementation, so I ended up rewriting them.

@pitrou pitrou requested a review from rok January 16, 2023 17:48
@pitrou
Copy link
Member

pitrou commented Jan 16, 2023

@shanhuuang Do you want to take a look?

@mapleFU
Copy link
Member Author

mapleFU commented Jan 16, 2023

Great job, InitMiniBlock making the code much more clear. Sorry for cannot write a good code like that...

The code look good to me now.

@mapleFU
Copy link
Member Author

mapleFU commented Jan 16, 2023

I update the description here: #15241 (comment)

If you like, you can edit it for your new changes

Copy link
Member

@rok rok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I didn't realize ULEB128 values were byte aligned. Good to know!
Thanks for doing this @mapleFU & @pitrou and sorry I was not able to contribute more.

@pitrou
Copy link
Member

pitrou commented Jan 16, 2023

Hmm, interestingly, there are test failures...

@mapleFU
Copy link
Member Author

mapleFU commented Jan 16, 2023

On my MacOs:

[DELTA_BINARY_PACKED] values_per_block: 128, mini_blocks_per_block: 4, total_value_count: 2, last_value: -49
Header consume 5 bytes
Block mini delta is: 83, consume bytes: 2
miniblock bitwidth: 255
miniblock bitwidth: 255
miniblock bitwidth: 255
miniblock bitwidth: 0
miniblock bitwidth consume bytes: 4

The problem is that, for zigzag uleb 128, it may consume 2Bytes if lower bound is -63. Because the min-delta would be less than -63, and consuming 2B. I changed this to 31.

@mapleFU mapleFU requested a review from wjones127 as a code owner January 16, 2023 20:08
@pitrou
Copy link
Member

pitrou commented Jan 16, 2023

Thanks @mapleFU !

@mapleFU
Copy link
Member Author

mapleFU commented Jan 16, 2023

@wjones127 Mind take a look :) ?

@mapleFU
Copy link
Member Author

mapleFU commented Jan 17, 2023

@pitrou would you mind merge this patch? Or I should wait @wjones127 to take a look?

@pitrou
Copy link
Member

pitrou commented Jan 17, 2023

I'll wait a bit for @wjones127 to chime in. Otherwise I'll probably merge tomorrow.

@pitrou pitrou merged commit e837f73 into apache:master Jan 18, 2023
@mapleFU mapleFU deleted the fix-14923 branch January 18, 2023 15:46
@ursabot
Copy link

ursabot commented Jan 19, 2023

Benchmark runs are scheduled for baseline = 4e439f6 and contender = e837f73. e837f73 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️1.14% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.77% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.65% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] e837f73b ec2-t3-xlarge-us-east-2
[Finished] e837f73b test-mac-arm
[Finished] e837f73b ursa-i9-9960x
[Finished] e837f73b ursa-thinkcentre-m75q
[Finished] 4e439f6a ec2-t3-xlarge-us-east-2
[Finished] 4e439f6a test-mac-arm
[Finished] 4e439f6a ursa-i9-9960x
[Finished] 4e439f6a ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++][Parquet] DeltaBitPackDecoder expects all miniblock bitwidths to be present for the last block

4 participants