-
Notifications
You must be signed in to change notification settings - Fork 506
ORC-1130:[C++] Suppress the present stream when the data stream has no null value #1067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| dataBuffer->resize(0); | ||
| return dataSize; | ||
| } | ||
| void BufferedOutputStream::suppress() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add an empty line before this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
fix the code format
c++/src/ColumnWriter.cc
Outdated
| notNullEncoder->add(batch.notNull.data() + offset, numValues, incomingMask); | ||
| const char* notNull = batch.notNull.data() + offset; | ||
| notNullEncoder->add(notNull, numValues, incomingMask); | ||
| for (uint64_t i = 0; i < numValues; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be skipped if hasNullValue is already true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
c++/test/TestReader.cc
Outdated
| EXPECT_EQ(1, nestedUnionBatch.offsets.data()[1]); | ||
| } | ||
|
|
||
| TEST(TestReadIntent, testSuppressPresentStream) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not part of the ReadIntent tests. Maybe renaming TestReadIntent to TestReader is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this feature is mainly a modification on the writer side. So I move the test into TestWriter.cc
c++/test/TestReader.cc
Outdated
| reader->createRowReader(RowReaderOptions()); | ||
| auto batch = rowReader->createRowBatch(2048); | ||
| EXPECT_TRUE(rowReader->next(*batch)); | ||
| EXPECT_EQ(rowCount, batch->numElements); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also verify the content of the read batch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, based on your suggestion, I added more coverage to this test.
c++/test/TestReader.cc
Outdated
| EXPECT_EQ(stream.column(), 1UL); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add the test coverage for seek()? It can make sure the rowIndex posistions are still correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| notNullEncoder->add(notNull, numValues, incomingMask); | ||
| for (uint64_t i = 0; i < numValues; ++i) { | ||
| if (!notNull[i]) { | ||
| hasNullValue = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest checking hasNullValue in line 145 or 146 because hasNullValue may already be set by previous nulls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, we should check batch.hasNulls as a shortcut.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic has been changed to take the or operation with batch.hasNulls.
| // written data can be just ignored because they are only flushed in memory | ||
| outputStream->suppress(); | ||
|
|
||
| numLiterals = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not nullify buffer? I'd add a reset() function to clear the state and use it here and in the ByteRleEncoderImpl constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find the reset function, so I add this function by the way.
… the test coverage for seek
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1.
stiga-huang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. +1
|
I have merged it. Thanks @coderex2522 @dongjoon-hyun @stiga-huang ! |
|
Thank you, @coderex2522 , @wgtmac and @stiga-huang . |
…o null value This closes apache#1067
Why are the changes needed?
if the data stream has no null value, the present stream of this column is redundant.
How was this patch tested?
Through the UT TestReadIntent.testSuppressPresentStream.