[feature](cloud-compaction) Support shadow tablet to do cumulative compaction during schema change in cloud mode #39558
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
…oud mode (apache#37293)

In cloud mode, during a schema change the shadow tablet can hit error -235 because it cannot run cumulative compaction while a large number of loads are arriving. This prevents the user from continuing to load data.

Implementation details:
1. When the schema change starts, record the end version of the rowsets to be converted, `alter_version`, in the SchemaChangeJob.
2. The origin tablet can only do base compaction in [0, `alter_version`] and cumulative compaction in (`alter_version`, N]. It cannot compact across `alter_version`, for example a compaction over [a, `alter_version` + n].
3. The shadow tablet cannot do base compaction and can only do cumulative compaction in (`alter_version`, N].
4. When the schema change fails because of an FE or BE core dump, it is retried. The retry reads `alter_version` from meta_service and continues from there.
5. When the schema change job finishes or is cancelled, the schema change job record must be cleared. Before this PR, it was only overwritten by the next schema change.
Force-pushed from ff30024 to a909702.
run buildall
TPC-H: Total hot run time: 38124 ms
TPC-DS: Total hot run time: 196619 ms
ClickBench: Total hot run time: 31.64 s
run cloud_p0
run cloud_p0
run cloud_p0
    << compaction.input_versions(0)
    << " input_version_end=" << compaction.input_versions(1)
This looks suspicious.
run buildall
clang-tidy made some suggestions
run buildall
TPC-H: Total hot run time: 39740 ms
TPC-H: Total hot run time: 38069 ms
Force-pushed from 65244ae to 3ea46ed.
run buildall
TPC-H: Total hot run time: 38659 ms
TPC-DS: Total hot run time: 191408 ms
ClickBench: Total hot run time: 30.59 s
run cloud_p0
run external
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
run buildall
clang-tidy made some suggestions
    }

    Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params) {
    Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,
warning: function '_convert_historical_rowsets' exceeds recommended size/complexity thresholds [readability-function-size]
Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,
^
Additional context
be/src/cloud/cloud_schema_change_job.cpp:222: 177 lines including whitespace and comments (threshold 80)
Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,
^
PR approved by at least one committer and no changes requested.
TPC-H: Total hot run time: 38256 ms
TPC-DS: Total hot run time: 186022 ms
ClickBench: Total hot run time: 30.55 s
…mpaction during schema change in cloud mode (#39558)

In cloud mode, during a schema change the shadow tablet can hit error -235 because it cannot run cumulative compaction while a large number of loads are arriving. This prevents the user from continuing to load data.

Implementation details:
1. When the schema change starts, record the end version of the rowsets to be converted, `alter_version`, in the SchemaChangeJob.
2. The origin tablet can only do base compaction in [0, `alter_version`] and cumulative compaction in (`alter_version`, N]. It cannot compact across `alter_version`, for example a compaction over [a, `alter_version` + n].
3. The shadow tablet cannot do base compaction and can only do cumulative compaction in (`alter_version`, N].
4. When the schema change fails because of an FE or BE core dump, it is retried. The retry reads `alter_version` from meta_service and continues from there.
5. When the schema change job finishes or is cancelled, the schema change job record must be cleared. Before this PR, it was only overwritten by the next schema change.

co-author (main author): @Lchangliang
original PR: #37293

---------

Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
…n=true` (#48399)

### What problem does this PR solve?

#39558 added a config that lets the shadow tablet do cumulative compaction during schema change in cloud mode, to avoid the -235 error on the new tablet under a large number of loads. However, this introduces a correctness problem on merge-on-write tables: some rowsets' delete bitmaps are wrong if cumulative compactions run on the new tablet while the schema change (SC) calculates delete bitmaps for incremental rowsets after converting historical data.

This PR introduces a new compaction type, `STOP_TOKEN`, which fails all existing compaction jobs and disallows any compaction on the tablet, and changes the SC process as follows:
1. Convert historical data.
2. Register a stop token on the new tablet to fail all existing compaction jobs and disallow any compaction on the new tablet.
3. Calculate delete bitmaps for incremental rowsets without the lock.
4. Calculate delete bitmaps for incremental rowsets with the lock.
5. Commit the SC job and remove the stop token.

----

ref: #29386
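The stop-token behaviour described for #48399 can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `TabletCompactionState` and all of its methods are hypothetical names made up for illustration, and the real bookkeeping is not shown on this page. The model only demonstrates the two effects the commit message names: registering the token fails every compaction job that is already running, and no new compaction can start until the token is removed.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical in-memory model of per-tablet compaction bookkeeping.
// None of these names come from the Doris code base.
class TabletCompactionState {
public:
    // Tries to start a compaction job; rejected while a stop token is held
    // or if a job with the same id is already running.
    bool try_start_compaction(const std::string& job_id) {
        if (stop_token_held_) return false;
        return running_.insert(job_id).second;
    }

    // Registers the stop token: every running job is dropped, so its later
    // commit attempt fails. Returns the number of jobs that were aborted.
    std::size_t register_stop_token() {
        stop_token_held_ = true;
        const std::size_t aborted = running_.size();
        running_.clear();
        return aborted;
    }

    // A job can commit only if it is still registered as running.
    bool try_commit_compaction(const std::string& job_id) {
        return running_.erase(job_id) == 1;
    }

    // Removes the stop token (step 5: after the SC job is committed).
    void remove_stop_token() { stop_token_held_ = false; }

private:
    bool stop_token_held_ = false;
    std::set<std::string> running_;
};
```

In this model, a schema change would call `register_stop_token()` after converting historical data, compute the delete bitmaps, and call `remove_stop_token()` once the job is committed.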
Proposed changes
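In cloud mode, during a schema change the shadow tablet can hit error -235 because it cannot run cumulative compaction while a large number of loads are arriving. This prevents the user from continuing to load data.

Implementation details:
1. When the schema change starts, record the end version of the rowsets to be converted, `alter_version`, in the SchemaChangeJob.
2. The origin tablet can only do base compaction in [0, `alter_version`] and cumulative compaction in (`alter_version`, N]. It cannot compact across `alter_version`, for example a compaction over [a, `alter_version` + n].
3. The shadow tablet cannot do base compaction and can only do cumulative compaction in (`alter_version`, N].
4. When the schema change fails because of an FE or BE core dump, it is retried. The retry reads `alter_version` from meta_service and continues from there.
5. When the schema change job finishes or is cancelled, the schema change job record must be cleared. Before this PR, it was only overwritten by the next schema change.

co-author (main author): @Lchangliang
original PR: #37293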
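
The version-range rules in items 2 and 3 can be written as a single predicate. The following is a minimal sketch, not the code from this PR: `TabletRole`, `CompactionKind`, and `compaction_allowed` are hypothetical names, and the input range is assumed to be the closed interval [start, end].

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not taken from the Doris code base.
enum class TabletRole { Origin, Shadow };
enum class CompactionKind { Base, Cumulative };

// Returns true if a compaction over the closed version range [start, end]
// is allowed while a schema change with the given alter_version is running.
bool compaction_allowed(TabletRole role, CompactionKind kind,
                        int64_t start, int64_t end, int64_t alter_version) {
    if (start < 0 || start > end) return false;
    const bool below = end <= alter_version;   // entirely inside [0, alter_version]
    const bool above = start > alter_version;  // entirely inside (alter_version, N]
    if (!below && !above) return false;        // range crosses alter_version
    if (kind == CompactionKind::Base) {
        // Only the origin tablet may run base compaction, and only below the boundary.
        return role == TabletRole::Origin && below;
    }
    // Cumulative compaction is allowed on either tablet, only above the boundary.
    return above;
}
```

With `alter_version = 10`, for example, a cumulative compaction over [11, 20] passes on both tablets, while any compaction over [5, 15] is rejected because it crosses the boundary.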