[opt](merge-on-write) avoid to check delete bitmap while lookup rowkey in some situation to reduce CPU cost #41480
…rowkey in some situation to reduce CPU cost (apache#41480) (apache#41439) Issue Number: close #xxx cherry-pick apache#41480
… check delete bitmap while lookup rowkey in some situation to reduce CPU cost (apache#41480) apache#41439" (apache#203)
We're closing this PR because it hasn't been updated in a while.
Proposed changes
Issue Number: close #xxx
During data loading, merge-on-write (MoW) looks up every key in the primary key index. When a key is found, it then checks whether that row has already been marked as deleted in the delete bitmap. Usually this check is cheap.
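A minimal C++ sketch of the per-key lookup described above, under a drastically simplified model: the primary key index is a hash map from key to a hypothetical `RowLocation`, and the delete bitmap is a `std::set` of deleted locations. `RowLocation`, `lookup_row_key`, and `mark_overwritten_rows` are illustrative names only; the actual Doris BE code is structured differently.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical location of a row: which rowset/segment/row holds the key.
struct RowLocation {
    uint32_t rowset_id;
    uint32_t segment_id;
    uint32_t row_id;
    bool operator<(const RowLocation& o) const {
        return std::tie(rowset_id, segment_id, row_id) <
               std::tie(o.rowset_id, o.segment_id, o.row_id);
    }
};

// Hypothetical, simplified primary key index: key -> location of its row.
using PrimaryKeyIndex = std::unordered_map<std::string, RowLocation>;
// Hypothetical, simplified delete bitmap: the set of rows marked as deleted.
using DeleteBitmap = std::set<RowLocation>;

// Look up a key; return its location only if the key is in the index
// AND the row has not been marked as deleted (the check this PR targets).
std::optional<RowLocation> lookup_row_key(const PrimaryKeyIndex& index,
                                          const DeleteBitmap& delete_bitmap,
                                          const std::string& key) {
    auto it = index.find(key);
    if (it == index.end()) return std::nullopt;                    // key not found
    if (delete_bitmap.count(it->second) > 0) return std::nullopt;  // found, but deleted
    return it->second;
}

// For every incoming key that hits a live row, mark the old row as deleted.
void mark_overwritten_rows(const PrimaryKeyIndex& index, DeleteBitmap& delete_bitmap,
                           const std::vector<std::string>& incoming_keys) {
    for (const auto& key : incoming_keys) {
        if (auto loc = lookup_row_key(index, delete_bitmap, key)) {
            delete_bitmap.insert(*loc);
        }
    }
}
```

For example, loading keys `{"k1", "k3"}` against an index holding `k1` and `k2` marks only `k1`'s old row as deleted, because `k3` misses the index.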
However, in some scenarios users run high-frequency, real-time updates on a large table, and most writes update existing rows. The table's version count then grows very quickly, and the delete bitmap becomes dense because duplicate keys are written continuously. In this scenario, the check becomes very expensive.
Here is a flame graph for this scenario:

[Figure: flame graph of this scenario]

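One way to picture why the check gets more expensive as versions pile up: assume, purely for illustration, that each load records its delete marks under its own version, and that asking "is this row deleted as of version V?" scans every version up to V. The hypothetical `VersionedDeleteBitmap` below (not a Doris class) counts how many versions each check touches. A row that is not deleted pays for a scan of every version.

```cpp
#include <cstdint>
#include <map>
#include <set>

// Hypothetical versioned delete bitmap: version -> rows deleted by that version's load.
class VersionedDeleteBitmap {
public:
    void add(int64_t version, uint64_t row) { by_version_[version].insert(row); }

    // Is `row` deleted as of `version`? Scans versions in ascending order up to
    // `version`, stopping at the first hit, so a miss costs one step per version.
    bool is_deleted_as_of(uint64_t row, int64_t version) {
        for (auto it = by_version_.begin();
             it != by_version_.end() && it->first <= version; ++it) {
            ++versions_scanned_;
            if (it->second.count(row) > 0) return true;
        }
        return false;
    }

    // Total number of per-version lookups performed so far (a crude cost meter).
    uint64_t versions_scanned() const { return versions_scanned_; }

private:
    std::map<int64_t, std::set<uint64_t>> by_version_;
    uint64_t versions_scanned_ = 0;
};
```

A production implementation would likely cache or pre-aggregate these bitmaps, so this sketch only models the trend (cost tracking version count), not the actual cost.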
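For tables that do not use a sequence column, and for loads that are not partial-column updates, this check can be skipped: even if a key has already been marked as deleted, marking it deleted again, as though the row still existed, is harmless. The check is still needed in the other two cases, because the load must locate the live version of the row, either to compare sequence values or to read the existing values of the columns that a partial update does not supply.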
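A sketch of the skip condition and of why repeated marking is safe, assuming the delete bitmap behaves like a set of row ids. `LoadOptions`, `need_delete_bitmap_check`, and `mark_deleted` are hypothetical names, not Doris's actual fields or functions.

```cpp
#include <cstdint>
#include <set>

// Hypothetical per-load settings; field names are illustrative only.
struct LoadOptions {
    bool table_has_sequence_column = false;
    bool is_partial_column_update = false;
};

// The delete-bitmap check is only needed when the load must find the live
// row version: to compare sequence values, or to read the old column values.
bool need_delete_bitmap_check(const LoadOptions& opts) {
    return opts.table_has_sequence_column || opts.is_partial_column_update;
}

// Marking a row deleted is a set insertion, so doing it twice leaves the
// bitmap unchanged (idempotent). That is why skipping the check is safe.
void mark_deleted(std::set<uint64_t>& delete_bitmap, uint64_t row_id) {
    delete_bitmap.insert(row_id);
}
```

With a roaring bitmap or any other set-like structure, setting an already-set bit is likewise a no-op, so the same reasoning carries over.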