[Fix](cloud) Fix dup key problem when enable_new_tablet_do_compaction=true
#48399
run buildall
TeamCity cloud ut coverage result:
TPC-H: Total hot run time: 32019 ms
TPC-DS: Total hot run time: 190743 ms
ClickBench: Total hot run time: 31.44 s
run cloud_p0
} else if (compaction.type() == TabletCompactionJobPB::STOP_TOKEN) {
    // fail all existing compactions
    compactions.Clear();
It would be better to wait until all ongoing compactions have completed. If a compaction is failed, the resources it has already consumed are wasted, and retrying it after the compaction stop token is removed adds unnecessary overhead to the BE.
This only affects the new tablet created by the schema change (SC), and the time window is quite small, so let's keep it simple.
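The behavior under discussion can be sketched as follows. This is a minimal model, not the meta-service code: `CompactionType`, `CompactionJob`, `TabletJobRecord`, and `start` are hypothetical names, and the rule that any later job is refused while a stop token is held is inferred from the "compactions are not allowed ... blocked by schema change job" error quoted elsewhere in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the recorded tablet jobs.
enum class CompactionType { CUMULATIVE, BASE, STOP_TOKEN };

struct CompactionJob {
    std::string id;
    CompactionType type;
    int64_t initiator;  // plays the role of delete_bitmap_lock_initiator
};

struct TabletJobRecord {
    std::vector<CompactionJob> compactions;

    // Returns true if the job was recorded, false if it was rejected.
    bool start(const CompactionJob& job) {
        // While a stop token is held, every new job is refused.
        for (const auto& c : compactions) {
            if (c.type == CompactionType::STOP_TOKEN) return false;
        }
        if (job.type == CompactionType::STOP_TOKEN) {
            // Fail all existing compactions by dropping their records.
            compactions.clear();
        }
        compactions.push_back(job);
        return true;
    }
};
```

In this model a running cumulative compaction loses its record as soon as the stop token arrives, which is the wasted work the reviewer is pointing at; the author's reply accepts that cost because it only applies to the new tablet for a short window.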
auto& compactions = *new_recorded_job.mutable_compaction();
compactions.erase(
        std::remove_if(
                compactions.begin(), compactions.end(),
                [&](auto& c) {
                    return c.has_delete_bitmap_lock_initiator() &&
                           c.delete_bitmap_lock_initiator() ==
                                   schema_change.delete_bitmap_lock_initiator();
                }),
        compactions.end());
Unfortunately, there is a known issue in the schema change process due to missing initial parameters. As a result, abort requests will consistently fail before entering this code branch. Therefore, if the schema change is cancelled, the compaction stop token will never be removed.
The compaction stop token on the BE side will be removed, and the BE will then stop renewing the lease, so the stop token job will eventually be removed.
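The argument in this reply relies on lease-based expiry: a job that is no longer renewed becomes stale and can be reclaimed. A minimal sketch of that idea, assuming a hypothetical `LeasedJob` record with an absolute expiration time in seconds (the field and method names are illustrative, not taken from Doris):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a leased job record.
struct LeasedJob {
    int64_t expiration_s;  // absolute time (seconds) after which the job is stale

    // Called periodically by the owner for as long as the job is still wanted.
    void renew(int64_t now_s, int64_t lease_s) { expiration_s = now_s + lease_s; }

    // A job whose owner stopped renewing eventually reports true here.
    bool expired(int64_t now_s) const { return now_s > expiration_s; }
};
```

The trade-off is that cleanup is not immediate: between the last renewal and the expiration time the token still blocks other jobs on the tablet.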
run cloud_p0
dataroaring
left a comment
LGTM
PR approved by at least one committer and no changes requested.
run buildall
PR approved by at least one committer and no changes requested.
Hastyshell
left a comment
LGTM
PR approved by anyone and no changes requested.
TeamCity cloud ut coverage result:
TPC-H: Total hot run time: 32761 ms
TPC-DS: Total hot run time: 185173 ms
ClickBench: Total hot run time: 31.05 s
dataroaring
left a comment
LGTM
…n=true` (apache#48399)

apache#39558 added a config that lets a shadow tablet do cumulative compaction during schema change in cloud mode, to avoid the -235 error on the new tablet when there is a large number of loads. However, this introduces a correctness problem on merge-on-write tables: some rowsets' delete bitmaps are wrong if cumulative compactions run on the new tablet while SC calculates delete bitmaps for incremental rowsets after converting historical data.

This PR introduces a new type of compaction, `STOP_TOKEN`, which fails all existing compaction jobs and disallows any compaction on the tablet, and changes the SC process as follows:

1. convert historical data
2. register a stop token on the new tablet to fail all existing compaction jobs and disallow any compaction on the new tablet
3. calculate delete bitmaps for incremental rowsets without the lock
4. calculate delete bitmaps for incremental rowsets with the lock
5. commit the SC job and remove the stop token

----
ref: apache#29386
…he#49275)

### What problem does this PR solve?

A cloud heavy SC job retries the whole alter task when it encounters a `KV_TXN_CONFLICT_RETRY_EXCEEDED_MAX_TIMES` error in `commit_tablet_job` (apache#46748). We should remove the stop token (apache#48399) in MS for the SC job if it fails in `commit_tablet_job`; otherwise later retries may fail to register the stop token (because the first stop token won't expire within `config::lease_compaction_interval_seconds * 4=80s`) and the schema change job will fail.

```
I20250318 15:40:15.851157 7677 task_worker_pool.cpp:423] successfully submit task|type=ALTER|signature=1742283174829
I20250318 15:40:31.346628 6496 task_worker_pool.cpp:1999] get alter table task, signature: 1742283174829
I20250318 15:40:31.346635 6496 task_worker_pool.cpp:281] start alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|mem_limit=10682972209
I20250318 15:40:31.350860 6496 cloud_schema_change_job.cpp:132] Begin to alter tablet. base_tablet_id=1742283170711, new_tablet_id=1742283174829, alter_version=2, job_id=1742283173457
I20250318 15:40:31.350906 6496 cloud_schema_change_job.cpp:226] Begin to convert historical rowsets for new_tablet from base_tablet. base_tablet=1742283170711, new_tablet=1742283174829, job_id=1742283173457
I20250318 15:40:31.350916 6496 cloud_schema_change_job.cpp:247] schema change type, sc_sorting: 0, sc_directly: 1, base_tablet=1742283170711, new_tablet=1742283174829
I20250318 15:40:31.382493 6496 segment_creator.cpp:308] tablet_id:1742283174829, flushing rowset_dir: , rowset_id:020000000000038f6644fd8945079d22209de0cad6c7e5b8, data size:73808, index size:3289
I20250318 15:40:31.385416 6496 cloud_schema_change_job.cpp:416] process mow table|new_tablet_id=1742283174829|out_rowset_size=1|start_calc_delete_bitmap_version=3|alter_version=2
I20250318 15:40:31.387535 6496 cloud_storage_engine.cpp:894] successfully register compaction stop token for tablet_id=1742283174829, delete_bitmap_lock_initiator=6632031443518271970
I20250318 15:40:31.388285 6496 cloud_schema_change_job.cpp:439] alter table for mow table, calculate delete bitmap of incremental rowsets without lock, version: 3-2 new_table_id: 1742283174829
I20250318 15:40:31.391326 6496 cloud_schema_change_job.cpp:460] alter table for mow table, calculate delete bitmap of incremental rowsets with lock, version: 3-2 new_tablet_id: 1742283174829
I20250318 15:40:31.392035 6496 cloud_storage_engine.cpp:915] successfully unregister compaction stop token for tablet_id=1742283174829, delete_bitmap_lock_initiator=6632031443518271970
W20250318 15:40:39.947554 6496 task_worker_pool.cpp:306] failed to alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|error=[DELETE_BITMAP_LOCK_ERROR]txn conflict when commit tablet job idx { table_id: 1742283165243 index_id: 1742283165244 partition_id: 1742283165242 tablet_id: 1742283170711 } schema_change { initiator: "172.20.56.12:9050" id: "1742283173457" new_tablet_idx { table_id: 1742283165243 index_id: 1742283173458 partition_id: 1742283165242 tablet_id: 1742283174829 } txn_ids: 610474243072 alter_version: 2 num_output_rowsets: 1 num_output_segments: 1 size_output_rowsets: 77097 num_output_rows: 611 output_versions: 2 output_cumulative_point: 2 delete_bitmap_lock_initiator: 6632031443518271970 index_size_output_rowsets: 3289 segment_size_output_rowsets: 73808 }
I20250318 15:40:46.204162 7677 task_worker_pool.cpp:423] successfully submit task|type=ALTER|signature=1742283174829
I20250318 15:41:07.487172 6496 task_worker_pool.cpp:1999] get alter table task, signature: 1742283174829
I20250318 15:41:07.487183 6496 task_worker_pool.cpp:281] start alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|mem_limit=10682972209
I20250318 15:41:07.489440 6496 cloud_schema_change_job.cpp:132] Begin to alter tablet. base_tablet_id=1742283170711, new_tablet_id=1742283174829, alter_version=2, job_id=1742283173457
I20250318 15:41:07.489511 6496 cloud_schema_change_job.cpp:226] Begin to convert historical rowsets for new_tablet from base_tablet. base_tablet=1742283170711, new_tablet=1742283174829, job_id=1742283173457
I20250318 15:41:07.489523 6496 cloud_schema_change_job.cpp:247] schema change type, sc_sorting: 0, sc_directly: 1, base_tablet=1742283170711, new_tablet=1742283174829
I20250318 15:41:07.490249 6496 cloud_schema_change_job.cpp:285] Rowset [2-2] has already existed in tablet 1742283174829
I20250318 15:41:07.490275 6496 cloud_schema_change_job.cpp:416] process mow table|new_tablet_id=1742283174829|out_rowset_size=1|start_calc_delete_bitmap_version=2|alter_version=2
W20250318 15:41:07.490864 6496 cloud_compaction_stop_token.cpp:89] failed to register compaction stop token|job_id=a018587a-c12f-4926-9d7e-514ff9d88457|delete_bitmap_lock_initiator=1847151139249560285|tablet_id=1742283174829|error=[INTERNAL_ERROR]failed to start tablet job: compactions are not allowed on tablet_id=1742283174829 currently, blocked by schema change job delete_bitmap_initiator=6632031443518271970
W20250318 15:41:07.490897 6496 task_worker_pool.cpp:306] failed to alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|error=[INTERNAL_ERROR]failed to start tablet job: compactions are not allowed on tablet_id=1742283174829 currently, blocked by schema change job delete_bitmap_initiator=6632031443518271970
```
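In the log, the first stop token is registered at 15:40:31 and the retry tries to register its own at 15:41:07, roughly 36 s later and well inside the 80 s lease, so the stale token still blocks it. The following sketch reproduces that sequence and the proposed cleanup. It is a toy model under stated assumptions: `MetaService`, `StopToken`, `run_attempt`, and the `cleanup_on_failure` switch are hypothetical names, and only the tablet id and the two initiator values are taken from the log.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct StopToken {
    int64_t tablet_id;
    int64_t initiator;  // delete_bitmap_lock_initiator of the owning SC attempt
};

// Hypothetical in-memory stand-in for the stop tokens held by MS.
struct MetaService {
    std::vector<StopToken> tokens;

    // Refuses a second token on a tablet that already has one.
    bool register_stop_token(int64_t tablet_id, int64_t initiator) {
        for (const auto& t : tokens) {
            if (t.tablet_id == tablet_id) return false;
        }
        tokens.push_back({tablet_id, initiator});
        return true;
    }

    // Removes only the token owned by the given initiator.
    void remove_stop_token(int64_t tablet_id, int64_t initiator) {
        tokens.erase(std::remove_if(tokens.begin(), tokens.end(),
                                    [&](const StopToken& t) {
                                        return t.tablet_id == tablet_id &&
                                               t.initiator == initiator;
                                    }),
                     tokens.end());
    }
};

// One schema change attempt; commit_ok simulates the outcome of commit_tablet_job.
bool run_attempt(MetaService& ms, int64_t tablet_id, int64_t initiator,
                 bool commit_ok, bool cleanup_on_failure) {
    if (!ms.register_stop_token(tablet_id, initiator)) return false;
    if (commit_ok) {
        ms.remove_stop_token(tablet_id, initiator);  // removed together with the commit
        return true;
    }
    if (cleanup_on_failure) ms.remove_stop_token(tablet_id, initiator);
    return false;
}
```

With `cleanup_on_failure` set to false, a retry under a new initiator is rejected at registration, matching the second half of the log; with it set to true, the retry can register and proceed.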
What problem does this PR solve?
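#39558 added a config that lets a shadow tablet do cumulative compaction during schema change in cloud mode, to avoid the -235 error on the new tablet when there is a large number of loads. However, this introduces a correctness problem on merge-on-write tables: some rowsets' delete bitmaps are wrong if cumulative compactions run on the new tablet while SC calculates delete bitmaps for incremental rowsets after converting historical data.
This PR introduces a new type of compaction, `STOP_TOKEN`, which fails all existing compaction jobs and disallows any compaction on the tablet, and changes the SC process as follows:
1. convert historical data
2. register a stop token on the new tablet to fail all existing compaction jobs and disallow any compaction on the new tablet
3. calculate delete bitmaps for incremental rowsets without the lock
4. calculate delete bitmaps for incremental rowsets with the lock
5. commit the SC job and remove the stop token
ref: #29386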
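The ordering of the revised SC process can be illustrated with a small driver. This is only a sketch of the sequencing described in this PR: `SchemaChangeSteps` and `run_schema_change` are hypothetical names, and each callback merely stands in for the corresponding stage.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callbacks, one per stage; none of these names come from Doris.
struct SchemaChangeSteps {
    std::function<bool()> convert_historical_data;
    std::function<bool()> register_stop_token;
    std::function<bool()> calc_delete_bitmap_without_lock;
    std::function<bool()> calc_delete_bitmap_with_lock;
    std::function<bool()> commit_and_remove_stop_token;
};

// Runs the stages in order and stops at the first failure.
// && evaluates left to right and short-circuits, which enforces the ordering.
inline bool run_schema_change(const SchemaChangeSteps& s) {
    return s.convert_historical_data() &&
           s.register_stop_token() &&
           s.calc_delete_bitmap_without_lock() &&
           s.calc_delete_bitmap_with_lock() &&
           s.commit_and_remove_stop_token();
}
```

The point of the ordering is that the stop token is in place before either delete bitmap calculation starts, so no compaction can change the new tablet's rowsets while the bitmaps are being computed.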
Release note
None