*: support only deserialize necessary rows#9678
*: support only deserialize necessary rows#9678Lloyd-Pottiger wants to merge 7 commits intopingcap:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
| size_t limit, | ||
| double /*avg_value_size_hint*/) const | ||
| double /*avg_value_size_hint*/, | ||
| const IColumn::Filter * filter) const |
There was a problem hiding this comment.
Add unit test about deserializeBinaryBulk(..., filter) to ensure the correctness for DataTypeDecimal/DataTypeEnum/DataTypeNumberBase and DataTypeString
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
228880d to
46b4b61
Compare
| if (block) | ||
| { | ||
| block.setStartOffset(read_rows); | ||
| read_rows += filter.size(); |
There was a problem hiding this comment.
Use read_rows += block.rows() is more reasonable?
There was a problem hiding this comment.
No, block.rows = passed_count < filter.size()
| DMFileReaderPool::instance().set(*this, cd.id, start_pack_id, pack_count, column); | ||
| // Delete column from local cache since it is not used anymore. | ||
| data_sharing_col_data_cache->delColumn(cd.id, next_pack_id); | ||
| return column; |
There was a problem hiding this comment.
Do we need to apply column->filter(filter) here?
| Block block = read(&block_filter); | ||
| size_t passed_count = countBytesInFilter(block_filter); | ||
| for (size_t i = 0; i < block.columns(); ++i) | ||
| { | ||
| std::vector<size_t> positions; | ||
| positions.reserve(passed_count); | ||
| for (size_t p = offset; p < offset + rows; ++p) | ||
| { | ||
| if (filter[p]) | ||
| positions.push_back(p - offset); | ||
| } | ||
| for (size_t i = 0; i < block.columns(); ++i) | ||
| { | ||
| columns[i]->insertDisjunctFrom(*block.getByPosition(i).column, positions); | ||
| } | ||
| auto col = block.getByPosition(i).column; | ||
| // Some columns may only deserialize the passed rows. | ||
| if (col->size() != passed_count) | ||
| col = col->filter(block_filter, passed_count); |
There was a problem hiding this comment.
We'd better ensure all the columns return by read(IColumn::Filter * filter) has the same number of rows. But not handle it in this for-loop.
There was a problem hiding this comment.
I have addressed in #9687, since we will rewrite this soon, so just keep it in this PR.
What problem does this PR solve?
Issue Number: ref #9699
Problem Summary:
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note