@zhannngchen
Contributor

@zhannngchen zhannngchen commented Sep 29, 2024

Proposed changes

Issue Number: close #xxx

During data loading, merge-on-write (MoW) looks up every key in the primary key index. When a key hits the index, MoW then checks whether that key has already been marked for deletion. This check is usually cheap.
However, in some scenarios users run high-frequency real-time updates against a large table, and most of the writes update existing rows. The table's version count then grows very quickly, and the delete bitmap becomes dense because duplicate keys are written continuously.
In this scenario the check becomes very expensive:

  1. Checking whether a key has been marked for deletion means calling the roaring bitmap's contains method for almost every version of the rowset that the imported key hits.
  2. Because imports are so frequent, there are typically thousands of versions that have not yet been merged by base compaction.
  3. Because the duplication rate is high, almost every key hits the index.
  4. As a result, for almost every imported key, the check loops up to thousands of times.
  5. This overhead becomes extreme when a table is loading 100,000+ rows per second.
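
The cost described in the list above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the Doris implementation: `ToyDeleteBitmap`, `contains_agg`, and `probes_for_live_row` are hypothetical names, the key layout (rowset, segment, version) is assumed for illustration, and `std::set` stands in for a roaring bitmap.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <tuple>

// Hypothetical, simplified model; not the Doris implementation.
// Key: (rowset id, segment id, version of the load that added the marks).
using BitmapKey = std::tuple<int, int, int64_t>;
// std::set stands in for a roaring bitmap of deleted row ids.
using RowIdSet = std::set<uint32_t>;

struct ToyDeleteBitmap {
    std::map<BitmapKey, RowIdSet> marks;
    mutable uint64_t contains_calls = 0;  // number of per-version membership probes

    void add(int rowset, int segment, int64_t version, uint32_t row_id) {
        marks[BitmapKey(rowset, segment, version)].insert(row_id);
    }

    // True if row_id was marked deleted by any version in [0, max_version].
    bool contains_agg(int rowset, int segment, int64_t max_version, uint32_t row_id) const {
        assert(max_version >= 0);
        auto it = marks.lower_bound(BitmapKey(rowset, segment, int64_t{0}));
        auto end = marks.upper_bound(BitmapKey(rowset, segment, max_version));
        for (; it != end; ++it) {
            ++contains_calls;  // one membership probe per unmerged version
            if (it->second.count(row_id) != 0) return true;
        }
        return false;
    }
};

// Each of `versions` loads marks one distinct row (ids 1..versions) of the
// same segment as deleted; row 0 stays live. Returns the number of probes
// needed to confirm that row 0 is not deleted.
uint64_t probes_for_live_row(int64_t versions) {
    ToyDeleteBitmap bm;
    for (int64_t v = 1; v <= versions; ++v) {
        bm.add(0, 0, v, static_cast<uint32_t>(v));
    }
    bool deleted = bm.contains_agg(0, 0, versions, 0);
    assert(!deleted);
    return bm.contains_calls;
}
```

In this model, confirming that a single row is still live costs one membership probe per unmerged version, so a segment with thousands of unmerged versions costs thousands of probes for each imported key that hits it.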

Here is a flame graph for this scenario:
[image: flame graph]

For tables that do not use sequence columns, and for imports that are not partial-column updates, this check can be skipped. Even if a key has already been marked for deletion, marking it for deletion again, as if the row still existed, is harmless.
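
The skip condition can be sketched as a small predicate. This is an illustrative sketch only: `LoadContext`, its fields, `can_skip_delete_bitmap_check`, and `mark_deleted` are hypothetical names rather than identifiers from the patch, and `std::set` again stands in for a roaring bitmap. The second helper shows why re-marking is safe: adding a row id that is already in the set leaves the set unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical load context; field names are illustrative only.
struct LoadContext {
    bool has_sequence_column;  // table defines a sequence column
    bool is_partial_update;    // load updates only a subset of columns
};

// The check is skippable only when neither feature is in play.
bool can_skip_delete_bitmap_check(const LoadContext& ctx) {
    return !ctx.has_sequence_column && !ctx.is_partial_update;
}

// Marks a row as deleted; std::set stands in for a roaring bitmap.
// Returns true if the row was newly marked, false if it was already marked.
bool mark_deleted(std::set<uint32_t>& deleted_rows, uint32_t row_id) {
    return deleted_rows.insert(row_id).second;
}
```

Marking the same row twice is a no-op on the second call in this model, which matches the statement that marking an already-deleted key again causes no problem.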

@zhannngchen zhannngchen changed the title [opt](merge-on-write) avoid to check delete bitmap while lookup rowke… [opt](merge-on-write) avoid to check delete bitmap while lookup rowkey in some situation to reduce CPU cost Sep 29, 2024
@zhannngchen
Contributor Author

run buildall

@zhannngchen
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.29% (9627/25815)
Line Coverage: 28.69% (79688/277771)
Region Coverage: 28.12% (41210/146535)
Branch Coverage: 24.74% (20981/84818)
Coverage Report: http://coverage.selectdb-in.cc/coverage/3a5b803fbd90693a22e6998e5d9537dc31e6d7c7_3a5b803fbd90693a22e6998e5d9537dc31e6d7c7/report/index.html

dataroaring pushed a commit that referenced this pull request Oct 11, 2024
…y in some situation to reduce CPU cost (#41480) (#41439)

## Proposed changes

Issue Number: close #xxx

cherry-pick #41480
bobhan1 pushed a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
…rowkey in some situation to reduce CPU cost (apache#41480) (apache#41439)

Issue Number: close #xxx

cherry-pick apache#41480
bobhan1 pushed a commit to bobhan1/doris that referenced this pull request Oct 16, 2024
…rowkey in some situation to reduce CPU cost (apache#41480) (apache#41439)

Issue Number: close #xxx

cherry-pick apache#41480
bobhan1 pushed a commit to bobhan1/doris that referenced this pull request Oct 16, 2024
…rowkey in some situation to reduce CPU cost (apache#41480) (apache#41439)

Issue Number: close #xxx

cherry-pick apache#41480

fix
bobhan1 added a commit to bobhan1/doris that referenced this pull request Nov 12, 2024
… check delete bitmap while lookup rowkey in some situation to reduce CPU cost (apache#41480) apache#41439" (apache#203)
@github-actions
Contributor

We're closing this PR because it hasn't been updated in a while.
This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and feel free to ask a maintainer to remove the Stale tag!
