-
Notifications
You must be signed in to change notification settings - Fork 506
ORC-1525: Fix bad read in RleDecoderV2::readByte
#1641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
be57b97 to
be284c8
Compare
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix and adding the test! Could you create a JIRA issue and add it to the PR title?
|
Gentle ping @hoffermei |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you fix this, @hoffermei ? Currently, CIs complain on that.
/Users/runner/work/orc/orc/c++/test/TestWriter.cc:2150:47: error: use of old-style cast [-Werror,-Wold-style-cast]
size_t rowIndex = rowsWrite + row + (size_t) 1;
|
I created a JIRA, ORC-1525, for this. |
|
If this is originated from ORC-1130, we need to backport this to branch-1.9 and branch-1.8. |
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
RleDecoderV2::readByte
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM from my side.
Could you review this once more, @wgtmac ?
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dongjoon-hyun! LGTM. I agree with you that we need to backport this.
|
Thank you, @wgtmac ! |
This PR aims to fix #1640 by resetting `BooleanRleEncoderImpl::current` and `BooleanRleEncoderImpl::bitsRemained` when suppress As #1640 suppress no null present stream leaves dirty data of BooleanRleEncoderImpl::current and BooleanRleEncoderImpl::bitsRemained, which will be flush to next stripe's present stream if it has some null values. I hava add a test testSuppressPresentStreamInPreStripe, which will construct a orc file with two stripe, the first stripe has no null value and seconds stripe has some null values. The constructed orc file writer have some dirty data in BooleanRleEncoderImpl for present stream. In the test I have add check for read ok and read result is same as write. Closes #1640 . Closes #1641 from hoffermei/present_supress_bugfix. Lead-authored-by: hoffermei <meihaifeng.hust@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 24beffb) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
|
Unfortunately, it turns out that we need additional PR for branch-1.8 because This PR is merged to |
|
To @hoffermei , I'd like to assign ORC-1525 to you. Do you have Apache JIRA ID? |
I will take a look |
This PR aims to fix apache#1640 by resetting `BooleanRleEncoderImpl::current` and `BooleanRleEncoderImpl::bitsRemained` when suppress As apache#1640 suppress no null present stream leaves dirty data of BooleanRleEncoderImpl::current and BooleanRleEncoderImpl::bitsRemained, which will be flush to next stripe's present stream if it has some null values. I hava add a test testSuppressPresentStreamInPreStripe, which will construct a orc file with two stripe, the first stripe has no null value and seconds stripe has some null values. The constructed orc file writer have some dirty data in BooleanRleEncoderImpl for present stream. In the test I have add check for read ok and read result is same as write. Closes apache#1640 . Closes apache#1641 from hoffermei/present_supress_bugfix. Lead-authored-by: hoffermei <meihaifeng.hust@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
This PR aims to fix apache#1640 by resetting `BooleanRleEncoderImpl::current` and `BooleanRleEncoderImpl::bitsRemained` when suppress As apache#1640 suppress no null present stream leaves dirty data of BooleanRleEncoderImpl::current and BooleanRleEncoderImpl::bitsRemained, which will be flush to next stripe's present stream if it has some null values. I hava add a test testSuppressPresentStreamInPreStripe, which will construct a orc file with two stripe, the first stripe has no null value and seconds stripe has some null values. The constructed orc file writer have some dirty data in BooleanRleEncoderImpl for present stream. In the test I have add check for read ok and read result is same as write. Closes apache#1640 . Closes apache#1641 from hoffermei/present_supress_bugfix. Lead-authored-by: hoffermei <meihaifeng.hust@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This PR aims to fix #1640 by resetting
BooleanRleEncoderImpl::currentandBooleanRleEncoderImpl::bitsRemainedwhen suppressWhy are the changes needed?
As #1640 suppress no null present stream leaves dirty data of BooleanRleEncoderImpl::current and BooleanRleEncoderImpl::bitsRemained, which will be flush to next stripe's present stream if it has some null values.
How was this patch tested?
I hava add a test testSuppressPresentStreamInPreStripe, which will construct a orc file with two stripe, the first stripe has no null value and seconds stripe has some null values. The constructed orc file writer have some dirty data in BooleanRleEncoderImpl for present stream. In the test I have add check for read ok and read result is same as write.
Closes #1640 .