Skip to content

Conversation

@wgtmac
Copy link
Member

@wgtmac wgtmac commented Nov 2, 2023

What changes were proposed in this pull request?

This PR aims to fix #1640 by resetting BooleanRleEncoderImpl::current and BooleanRleEncoderImpl::bitsRemained when suppress

Why are the changes needed?

As #1640 suppress no null present stream leaves dirty data of BooleanRleEncoderImpl::current and BooleanRleEncoderImpl::bitsRemained, which will be flush to next stripe's present stream if it has some null values.

How was this patch tested?

I hava add a test testSuppressPresentStreamInPreStripe, which will construct a orc file with two stripe, the first stripe has no null value and seconds stripe has some null values. The constructed orc file writer have some dirty data in BooleanRleEncoderImpl for present stream. In the test I have add check for read ok and read result is same as write.

Closes #1640 .

This PR aims to fix apache#1640 by resetting `BooleanRleEncoderImpl::current` and `BooleanRleEncoderImpl::bitsRemained` when suppress

As apache#1640 suppress no null present stream leaves dirty data of BooleanRleEncoderImpl::current and BooleanRleEncoderImpl::bitsRemained, which will be flush to next stripe's present stream if it has some null values.

I hava add a test testSuppressPresentStreamInPreStripe, which will construct a orc file with two stripe, the first stripe has no null value and seconds stripe has some null values. The constructed orc file writer have some dirty data in BooleanRleEncoderImpl for present stream. In the test I have add check for read ok and read result is same as write.

Closes apache#1640 .

Closes apache#1641 from hoffermei/present_supress_bugfix.

Lead-authored-by: hoffermei <meihaifeng.hust@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@github-actions github-actions bot added the CPP label Nov 2, 2023
@wgtmac
Copy link
Member Author

wgtmac commented Nov 2, 2023

@hoffermei @dongjoon-hyun PTAL

Copy link
Member

@guiyanakuang guiyanakuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you so much!

@dongjoon-hyun dongjoon-hyun modified the milestones: 1.9.2, 1.8.6 Nov 2, 2023
dongjoon-hyun pushed a commit that referenced this pull request Nov 2, 2023
### What changes were proposed in this pull request?

This PR aims to fix #1640 by resetting `BooleanRleEncoderImpl::current` and `BooleanRleEncoderImpl::bitsRemained` when suppress

### Why are the changes needed?

As #1640 suppress no null present stream leaves dirty data of BooleanRleEncoderImpl::current and BooleanRleEncoderImpl::bitsRemained, which will be flush to next stripe's present stream if it has some null values.

### How was this patch tested?

I hava add a test testSuppressPresentStreamInPreStripe, which will construct a orc file with two stripe, the first stripe has no null value and seconds stripe has some null values. The constructed orc file writer have some dirty data in BooleanRleEncoderImpl for present stream. In the test I have add check for read ok and read result is same as write.

Closes #1640 .

Closes #1645 from wgtmac/branch-1.8.

Authored-by: hoffermei <meihaifeng.hust@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Copy link
Member

Merged to branch-1.8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants