
Conversation

@bobhan1
Contributor

@bobhan1 bobhan1 commented Oct 9, 2023

Proposed changes


Contributor

@github-actions github-actions bot left a comment


clang-tidy made some suggestions

// For UT
DeleteBitmapPtr get_delete_bitmap() { return _delete_bitmap; }

std::shared_ptr<PartialUpdateInfo> get_partial_update_info() const {

warning: function 'get_partial_update_info' should be marked [[nodiscard]] [modernize-use-nodiscard]

Suggested change
std::shared_ptr<PartialUpdateInfo> get_partial_update_info() const {
[[nodiscard]] std::shared_ptr<PartialUpdateInfo> get_partial_update_info() const {
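
For context on the suggestion above, here is a minimal sketch of what `[[nodiscard]]` does on a getter like this one. `RowsetWriterContextSketch` and the one-field `PartialUpdateInfo` are hypothetical stand-ins, not the real Doris types; only the getter name and return type come from the diff.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for the real PartialUpdateInfo; only the shape matters.
struct PartialUpdateInfo {
    bool is_partial_update = false;
};

// Hypothetical owner class, used only to host the getter from the diff.
class RowsetWriterContextSketch {
public:
    explicit RowsetWriterContextSketch(std::shared_ptr<PartialUpdateInfo> info)
            : _partial_update_info(std::move(info)) {}

    // [[nodiscard]] asks the compiler to warn when a caller ignores the result,
    // which for a side-effect-free getter is almost always a mistake.
    [[nodiscard]] std::shared_ptr<PartialUpdateInfo> get_partial_update_info() const {
        return _partial_update_info;
    }

private:
    std::shared_ptr<PartialUpdateInfo> _partial_update_info;
};
```

With the attribute in place, a bare statement such as `ctx.get_partial_update_info();` typically produces a compiler warning, while assigning the result to a variable compiles silently.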

@bobhan1 bobhan1 force-pushed the split-out-partial-update-infos-from-tablet-schema branch 5 times, most recently from 6a2aad7 to dac6ff6 October 9, 2023 04:19
@bobhan1
Contributor Author

bobhan1 commented Oct 9, 2023

run buildall

@bobhan1 bobhan1 force-pushed the split-out-partial-update-infos-from-tablet-schema branch 7 times, most recently from 1cc37d2 to ef6af23 October 9, 2023 08:57
@bobhan1 bobhan1 marked this pull request as ready for review October 9, 2023 08:59
@bobhan1 bobhan1 force-pushed the split-out-partial-update-infos-from-tablet-schema branch from ef6af23 to a2298a6 October 9, 2023 09:05
@bobhan1
Contributor Author

bobhan1 commented Oct 9, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.81 seconds
stream load tsv: 559 seconds loaded 74807831229 Bytes, about 127 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162259983 Bytes

@bobhan1 bobhan1 changed the title from "[enhancement] split partial update infos from tablet schema" to "[enhancement](merge-on-write) Split partial update infos from tablet schema" Oct 9, 2023
@bobhan1 bobhan1 force-pushed the split-out-partial-update-infos-from-tablet-schema branch 3 times, most recently from 60503b6 to dd5a2b8 October 10, 2023 03:20
@bobhan1
Contributor Author

bobhan1 commented Oct 10, 2023

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.30% (8148/22448)
Line Coverage: 28.44% (65181/229166)
Region Coverage: 27.37% (33770/123390)
Branch Coverage: 24.04% (17205/71560)
Coverage Report: http://coverage.selectdb-in.cc/coverage/dd5a2b817f88873434b1efb9b6520f6b3d581192_dd5a2b817f88873434b1efb9b6520f6b3d581192/report/index.html

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.5 seconds
stream load tsv: 574 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162466608 Bytes

@bobhan1
Contributor Author

bobhan1 commented Oct 10, 2023

run buildall

@bobhan1
Contributor Author

bobhan1 commented Oct 10, 2023

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.28% (8148/22456)
Line Coverage: 28.43% (65179/229257)
Region Coverage: 27.36% (33776/123459)
Branch Coverage: 24.03% (17205/71612)
Coverage Report: http://coverage.selectdb-in.cc/coverage/302be55f489671f0e5e60184d5ab0f480ea6af6a_302be55f489671f0e5e60184d5ab0f480ea6af6a/report/index.html

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.57 seconds
stream load tsv: 554 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162128127 Bytes

@bobhan1 bobhan1 force-pushed the split-out-partial-update-infos-from-tablet-schema branch from 302be55 to b6a40f2 October 10, 2023 08:24
@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Oct 17, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@liaoxin01 liaoxin01 left a comment


LGTM

@zhannngchen zhannngchen merged commit 1514f78 into apache:master Oct 17, 2023
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 23, 2023
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 23, 2023
zhannngchen pushed a commit that referenced this pull request Oct 23, 2023
1. Remove the deprecated comments on fields that were wrongly added in #25147. These fields are used when the coordinator BE sends infos to executor BEs. They are only used during RPC and are not persisted.
2. Eliminate some unnecessary copies.
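
The commit message does not say which copies were eliminated, so the following is only a generic sketch of two common ways to avoid them in C++: moving a by-value parameter into a member, and returning a const reference from a getter. `UpdateColumnsSketch` and its member are hypothetical names, not taken from the Doris code.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical holder; the names here are illustrative, not taken from Doris.
class UpdateColumnsSketch {
public:
    // Take by value and move: a caller passing an rvalue pays no deep copy.
    explicit UpdateColumnsSketch(std::vector<std::string> columns)
            : _columns(std::move(columns)) {}

    // Return a const reference instead of building a fresh vector per call.
    [[nodiscard]] const std::vector<std::string>& columns() const { return _columns; }

private:
    std::vector<std::string> _columns;
};
```

A caller that no longer needs its own vector can write `UpdateColumnsSketch sketch(std::move(cols));`, which hands the existing heap buffer over rather than duplicating every string.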
xiaokang pushed a commit that referenced this pull request Oct 23, 2023
xiaokang pushed a commit that referenced this pull request Oct 24, 2023
dutyu pushed a commit to dutyu/doris that referenced this pull request Oct 28, 2023
dutyu pushed a commit to dutyu/doris that referenced this pull request Oct 28, 2023
zhannngchen pushed a commit that referenced this pull request Nov 8, 2023
…tmaps of the committed transactions are calculated by the compaction (#26556)

a fix for #25147
zhannngchen pushed a commit to zhannngchen/incubator-doris that referenced this pull request Nov 9, 2023
…tmaps of the committed transactions are calculated by the compaction (apache#26556)

a fix for apache#25147
seawinde pushed a commit to seawinde/doris that referenced this pull request Nov 13, 2023
…tmaps of the committed transactions are calculated by the compaction (apache#26556)

a fix for apache#25147
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…tmaps of the committed transactions are calculated by the compaction (apache#26556)

a fix for apache#25147
dataroaring pushed a commit that referenced this pull request Aug 7, 2024
…of BE restart after a partial update has committed (#38331)

## Proposed changes
If a partial update conflicts with another load during the publish phase,
it should combine the two loads' data into one to get the correct
result. This procedure needs the partial update info. But if the BE crashes
after the partial update load has committed, the partial update info
will be missing because it is not persisted and will not be restored in
`DataDir::load()`. This PR persists the partial update info in RocksDB
before the txn is committed and removes it after the publish phase.
Before #25147, partial update info
was persisted with tablet_schema in RocksDB.
#25147 split partial update info
from tablet schema but forgot to handle the persistence logic.
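
The lifecycle described in this commit message (persist before commit, restore after a restart, remove after publish) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from #38331: a `std::map` stands in for RocksDB, and `TxnManagerSketch`, `make_key`, the key layout, and the one-field `PartialUpdateInfo` are all hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a persistent KV store such as RocksDB.
using KvStore = std::map<std::string, std::string>;

// Hypothetical, heavily simplified partial update info.
struct PartialUpdateInfo {
    std::string update_columns;  // e.g. "k1,v2"
};

// Hypothetical key layout; the real key format is not given in this thread.
std::string make_key(int64_t txn_id, int64_t tablet_id) {
    return "pui_" + std::to_string(txn_id) + "_" + std::to_string(tablet_id);
}

class TxnManagerSketch {
public:
    explicit TxnManagerSketch(KvStore* store) : _store(store) {}

    // Persist first, then record the txn as committed in memory.
    void commit(int64_t txn_id, int64_t tablet_id, const PartialUpdateInfo& info) {
        (*_store)[make_key(txn_id, tablet_id)] = info.update_columns;
        _in_memory[Key{txn_id, tablet_id}] = info;
    }

    // Prefer in-memory state; after a restart it is empty, so read the store.
    std::optional<PartialUpdateInfo> lookup(int64_t txn_id, int64_t tablet_id) const {
        auto mem = _in_memory.find(Key{txn_id, tablet_id});
        if (mem != _in_memory.end()) return mem->second;
        auto it = _store->find(make_key(txn_id, tablet_id));
        if (it == _store->end()) return std::nullopt;
        return PartialUpdateInfo{it->second};
    }

    // After publish the info is no longer needed, so drop both copies.
    void publish(int64_t txn_id, int64_t tablet_id) {
        _in_memory.erase(Key{txn_id, tablet_id});
        _store->erase(make_key(txn_id, tablet_id));
    }

private:
    using Key = std::pair<int64_t, int64_t>;
    KvStore* _store;
    std::map<Key, PartialUpdateInfo> _in_memory;
};
```

A restart can be simulated by destroying one `TxnManagerSketch` and constructing another over the same `KvStore`: the second instance still finds the committed info, and after `publish` the entry is gone from both places.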
dataroaring pushed a commit that referenced this pull request Aug 11, 2024
…of BE restart after a partial update has committed (#38331)

wyxxxcat pushed a commit to wyxxxcat/doris that referenced this pull request Aug 14, 2024
…of BE restart after a partial update has committed (apache#38331)

dataroaring pushed a commit that referenced this pull request Aug 16, 2024
…of BE restart after a partial update has committed (#38331)


Labels

approved (Indicates a PR has been approved by one committer.), dev/2.0.3-merged, merge_conflict, reviewed


5 participants