Skip to content

Conversation

@hoffermei
Copy link
Contributor

@hoffermei hoffermei commented Oct 20, 2023

What changes were proposed in this pull request?

This PR aims to fix #1640 by resetting BooleanRleEncoderImpl::current and BooleanRleEncoderImpl::bitsRemained when suppress

Why are the changes needed?

As #1640 suppress no null present stream leaves dirty data of BooleanRleEncoderImpl::current and BooleanRleEncoderImpl::bitsRemained, which will be flush to next stripe's present stream if it has some null values.

How was this patch tested?

I hava add a test testSuppressPresentStreamInPreStripe, which will construct a orc file with two stripe, the first stripe has no null value and seconds stripe has some null values. The constructed orc file writer have some dirty data in BooleanRleEncoderImpl for present stream. In the test I have add check for read ok and read result is same as write.

Closes #1640 .

@github-actions github-actions bot added the CPP label Oct 20, 2023
@hoffermei hoffermei force-pushed the present_supress_bugfix branch from be57b97 to be284c8 Compare October 23, 2023 02:07
Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix and adding the test! Could you create a JIRA issue and add it to the PR title?

@wgtmac
Copy link
Member

wgtmac commented Oct 30, 2023

Gentle ping @hoffermei

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fix this, @hoffermei ? Currently, CIs complain on that.

/Users/runner/work/orc/orc/c++/test/TestWriter.cc:2150:47: error: use of old-style cast [-Werror,-Wold-style-cast]
          size_t rowIndex = rowsWrite + row + (size_t) 1;

@dongjoon-hyun dongjoon-hyun changed the title Fix bad read in RleDecoderV2::readByte ORC-1525: Fix bad read in RleDecoderV2::readByte Oct 31, 2023
@dongjoon-hyun dongjoon-hyun added this to the 1.8.6 milestone Oct 31, 2023
@dongjoon-hyun
Copy link
Member

I created a JIRA, ORC-1525, for this.

@dongjoon-hyun
Copy link
Member

If this is originated from ORC-1130, we need to backport this to branch-1.9 and branch-1.8.

dongjoon-hyun and others added 4 commits October 30, 2023 19:07
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
@dongjoon-hyun dongjoon-hyun changed the title ORC-1525: Fix bad read in RleDecoderV2::readByte ORC-1525: Fix bad read in RleDecoderV2::readByte Oct 31, 2023
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM from my side.

Could you review this once more, @wgtmac ?

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dongjoon-hyun! LGTM. I agree with you that we need to backport this.

@dongjoon-hyun
Copy link
Member

Thank you, @wgtmac !

dongjoon-hyun pushed a commit that referenced this pull request Nov 1, 2023
This PR aims to fix #1640 by resetting `BooleanRleEncoderImpl::current` and `BooleanRleEncoderImpl::bitsRemained` when suppress

As #1640 suppress no null present stream leaves dirty data of BooleanRleEncoderImpl::current and BooleanRleEncoderImpl::bitsRemained, which will be flush to next stripe's present stream if it has some null values.

I hava add a test testSuppressPresentStreamInPreStripe, which will construct a orc file with two stripe, the first stripe has no null value and seconds stripe has some null values. The constructed orc file writer have some dirty data in BooleanRleEncoderImpl for present stream. In the test I have add check for read ok and read result is same as write.

Closes #1640 .

Closes #1641 from hoffermei/present_supress_bugfix.

Lead-authored-by: hoffermei <meihaifeng.hust@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 24beffb)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun dongjoon-hyun modified the milestones: 1.8.6, 1.9.2 Nov 1, 2023
@dongjoon-hyun
Copy link
Member

Unfortunately, it turns out that we need additional PR for branch-1.8 because std::make_unique is not available in branch-1.8.

This PR is merged to main and branch-1.9.

@dongjoon-hyun
Copy link
Member

To @hoffermei , I'd like to assign ORC-1525 to you. Do you have Apache JIRA ID?

@wgtmac
Copy link
Member

wgtmac commented Nov 1, 2023

Unfortunately, it turns out that we need additional PR for branch-1.8 because std::make_unique is not available in branch-1.8.

This PR is merged to main and branch-1.9.

I will take a look

wgtmac pushed a commit to wgtmac/orc that referenced this pull request Nov 2, 2023
This PR aims to fix apache#1640 by resetting `BooleanRleEncoderImpl::current` and `BooleanRleEncoderImpl::bitsRemained` when suppress

As apache#1640 suppress no null present stream leaves dirty data of BooleanRleEncoderImpl::current and BooleanRleEncoderImpl::bitsRemained, which will be flush to next stripe's present stream if it has some null values.

I hava add a test testSuppressPresentStreamInPreStripe, which will construct a orc file with two stripe, the first stripe has no null value and seconds stripe has some null values. The constructed orc file writer have some dirty data in BooleanRleEncoderImpl for present stream. In the test I have add check for read ok and read result is same as write.

Closes apache#1640 .

Closes apache#1641 from hoffermei/present_supress_bugfix.

Lead-authored-by: hoffermei <meihaifeng.hust@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
cxzl25 pushed a commit to cxzl25/orc that referenced this pull request Jan 11, 2024
This PR aims to fix apache#1640 by resetting `BooleanRleEncoderImpl::current` and `BooleanRleEncoderImpl::bitsRemained` when suppress

As apache#1640 suppress no null present stream leaves dirty data of BooleanRleEncoderImpl::current and BooleanRleEncoderImpl::bitsRemained, which will be flush to next stripe's present stream if it has some null values.

I hava add a test testSuppressPresentStreamInPreStripe, which will construct a orc file with two stripe, the first stripe has no null value and seconds stripe has some null values. The constructed orc file writer have some dirty data in BooleanRleEncoderImpl for present stream. In the test I have add check for read ok and read result is same as write.

Closes apache#1640 .

Closes apache#1641 from hoffermei/present_supress_bugfix.

Lead-authored-by: hoffermei <meihaifeng.hust@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ORC-1525: [C++] suppress present stream may cause read orc failed with bad read in RleDecoderV2::readByte

3 participants