Skip to content

Conversation

@coderex2522
Copy link
Contributor

Why are the changes needed?

if the data stream has no null value, the present stream of this column is redundant.

How was this patch tested?

Through the UT TestReadIntent.testSuppressPresentStream.

@github-actions github-actions bot added the CPP label Mar 17, 2022
dataBuffer->resize(0);
return dataSize;
}
void BufferedOutputStream::suppress() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an empty line before this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

fix the code format
notNullEncoder->add(batch.notNull.data() + offset, numValues, incomingMask);
const char* notNull = batch.notNull.data() + offset;
notNullEncoder->add(notNull, numValues, incomingMask);
for (uint64_t i = 0; i < numValues; ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be skipped if hasNullValue is already true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

EXPECT_EQ(1, nestedUnionBatch.offsets.data()[1]);
}

TEST(TestReadIntent, testSuppressPresentStream) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not part of the ReadIntent tests. Maybe renaming TestReadIntent to TestReader is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this feature is mainly a modification on the writer side. So I move the test into TestWriter.cc

reader->createRowReader(RowReaderOptions());
auto batch = rowReader->createRowBatch(2048);
EXPECT_TRUE(rowReader->next(*batch));
EXPECT_EQ(rowCount, batch->numElements);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also verify the content of the read batch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, based on your suggestion, I added more coverage to this test.

EXPECT_EQ(stream.column(), 1UL);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the test coverage for seek()? It can make sure the rowIndex posistions are still correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

notNullEncoder->add(notNull, numValues, incomingMask);
for (uint64_t i = 0; i < numValues; ++i) {
if (!notNull[i]) {
hasNullValue = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest checking hasNullValue in line 145 or 146 because hasNullValue may already be set by previous nulls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, we should check batch.hasNulls as a shortcut.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic has been changed to take the or operation with batch.hasNulls.

// written data can be just ignored because they are only flushed in memory
outputStream->suppress();

numLiterals = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not nullify buffer? I'd add a reset() function to clear the state and use it here and in the ByteRleEncoderImpl constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find the reset function, so I add this function by the way.

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1.

Copy link
Contributor

@stiga-huang stiga-huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. +1

@wgtmac wgtmac merged commit 42aed71 into apache:main Mar 22, 2022
@wgtmac
Copy link
Member

wgtmac commented Mar 22, 2022

I have merged it. Thanks @coderex2522 @dongjoon-hyun @stiga-huang !

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Mar 22, 2022

Thank you, @coderex2522 , @wgtmac and @stiga-huang .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants