Skip to content

Conversation

@eldenmoon
Copy link
Member

@eldenmoon eldenmoon commented Dec 19, 2023

Proposed changes

  1. [Fix](memtable) fix shrink_memtable_by_agg without duplicated keys #28660
  2. [Fix](memtable) fix shrink_memtable_by_agg should also update `_row… #28536
    Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

…_in_blocks` (apache#28536)

Otherwise using the stale `_row_in_blocks` will result in heap-buffer-overflow

```
==2695213==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62900122e210 at pc 0x56524744aecf bp 0x7f62c595ef7
0 sp 0x7f62c595ef68
READ of size 8 at 0x62900122e210 thread T1627 (MemTableFlushTh)
    #0 0x56524744aece in doris::vectorized::ColumnVector<long>::insert_indices_from(doris::vectorized::IColumn const&, unsigned int const*, unsigned int const*) /mnt/disk2/lihangyu/doris/be/src/vec/columns/column_vector.cpp:378:33
    #1 0x5652472a7538 in doris::vectorized::ColumnNullable::insert_indices_from(doris::vectorized::IColumn const&, unsigned int const*, unsigned int const*) /mnt/disk2/lihangyu/doris/be/src/vec/columns/column_nullable.cpp:310:25
    apache#2 0x56524782a62a in doris::vectorized::MutableBlock::add_rows(doris::vectorized::Block const*, unsigned int const*, unsigned int const*) /mnt/disk2/lihangyu/doris/be/src/vec/core/block.cpp:961:14
    apache#3 0x565233f187ae in doris::MemTable::_put_into_output(doris::vectorized::Block&) /mnt/disk2/lihangyu/doris/be/src/olap/memtable.cpp:248:27
    apache#4 0x565233f1db66 in doris::MemTable::to_block() /mnt/disk2/lihangyu/doris/be/src/olap/memtable.cpp:496:13
    apache#5 0x565233efae60 in doris::FlushToken::_do_flush_memtable(doris::MemTable*, int, long*) /mnt/disk2/lihangyu/doris/be/src/olap/memtable_flush_executor.cpp:121:62
    apache#6 0x565233efc8d6 in doris::FlushToken::_flush_memtable(doris::MemTable*, int, long) /mnt/disk2/lihangyu/doris/be/src/olap/memtable_flush_executor.cpp:150:16
    apache#7 0x565233f0c5eb in doris::MemtableFlushTask::run() /mnt/disk2/lihangyu/doris/be/src/olap/memtable_flush_executor.cpp:58:23
```
@eldenmoon
Copy link
Member Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 19, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 49.41 seconds
stream load tsv: 577 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.8 seconds inserted 10000000 Rows, about 335K ops/s
storage size: 17162372294 Bytes

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 37.82% (7995/21139)
Line Coverage: 29.49% (64913/220083)
Region Coverage: 28.98% (33408/115290)
Branch Coverage: 24.84% (17140/68988)
Coverage Report: http://coverage.selectdb-in.cc/coverage/95b1a90181d85e35952742fac57f0553d07d90cb_95b1a90181d85e35952742fac57f0553d07d90cb/report/index.html

…pache#28660)

remove duplicated logic:
```
vectorized::Block in_block = _input_mutable_block.to_block();
_put_into_output(in_block);
```
`_input_mutable_block.to_block()` will move `_input_mutable_block`, and lead to `flush` with empty block
@eldenmoon
Copy link
Member Author

run buildall

@eldenmoon eldenmoon changed the title [Fix](memtable) fix shrink_memtable_by_agg should also update _row_in_blocks [Pick](branch-2.0) pick two PRs to fix memtable shrink Dec 20, 2023
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 37.82% (7995/21139)
Line Coverage: 29.50% (64929/220084)
Region Coverage: 28.98% (33414/115290)
Branch Coverage: 24.85% (17146/68988)
Coverage Report: http://coverage.selectdb-in.cc/coverage/1a8cd1b094c7642e627b6d3a6ff7a9d3dde3f592_1a8cd1b094c7642e627b6d3a6ff7a9d3dde3f592/report/index.html

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 50.85 seconds
stream load tsv: 575 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.5 seconds inserted 10000000 Rows, about 338K ops/s
storage size: 17161956019 Bytes

@xiaokang xiaokang merged commit caf6efd into apache:branch-2.0 Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. kind/test reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants