branch-3.0: [Opt](cloud-sc) Clear stop token when commit_tablet_job fails #49275
#49494
Merged
### What problem does this PR solve?

A cloud heavy schema change (sc) job retries the whole alter task when it encounters a `KV_TXN_CONFLICT_RETRY_EXCEEDED_MAX_TIMES` error in `commit_tablet_job` (#46748). We should remove the stop token (#48399) in MS for the sc job if it fails in `commit_tablet_job`. Otherwise, later retries may fail to register a stop token, because the first stop token will not expire for `config::lease_compaction_interval_seconds * 4 = 80s`, and the schema change job will fail.

```
I20250318 15:40:15.851157 7677 task_worker_pool.cpp:423] successfully submit task|type=ALTER|signature=1742283174829
I20250318 15:40:31.346628 6496 task_worker_pool.cpp:1999] get alter table task, signature: 1742283174829
I20250318 15:40:31.346635 6496 task_worker_pool.cpp:281] start alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|mem_limit=10682972209
I20250318 15:40:31.350860 6496 cloud_schema_change_job.cpp:132] Begin to alter tablet. base_tablet_id=1742283170711, new_tablet_id=1742283174829, alter_version=2, job_id=1742283173457
I20250318 15:40:31.350906 6496 cloud_schema_change_job.cpp:226] Begin to convert historical rowsets for new_tablet from base_tablet. base_tablet=1742283170711, new_tablet=1742283174829, job_id=1742283173457
I20250318 15:40:31.350916 6496 cloud_schema_change_job.cpp:247] schema change type, sc_sorting: 0, sc_directly: 1, base_tablet=1742283170711, new_tablet=1742283174829
I20250318 15:40:31.382493 6496 segment_creator.cpp:308] tablet_id:1742283174829, flushing rowset_dir: , rowset_id:020000000000038f6644fd8945079d22209de0cad6c7e5b8, data size:73808, index size:3289
I20250318 15:40:31.385416 6496 cloud_schema_change_job.cpp:416] process mow table|new_tablet_id=1742283174829|out_rowset_size=1|start_calc_delete_bitmap_version=3|alter_version=2
I20250318 15:40:31.387535 6496 cloud_storage_engine.cpp:894] successfully register compaction stop token for tablet_id=1742283174829, delete_bitmap_lock_initiator=6632031443518271970
I20250318 15:40:31.388285 6496 cloud_schema_change_job.cpp:439] alter table for mow table, calculate delete bitmap of incremental rowsets without lock, version: 3-2 new_table_id: 1742283174829
I20250318 15:40:31.391326 6496 cloud_schema_change_job.cpp:460] alter table for mow table, calculate delete bitmap of incremental rowsets with lock, version: 3-2 new_tablet_id: 1742283174829
I20250318 15:40:31.392035 6496 cloud_storage_engine.cpp:915] successfully unregister compaction stop token for tablet_id=1742283174829, delete_bitmap_lock_initiator=6632031443518271970
W20250318 15:40:39.947554 6496 task_worker_pool.cpp:306] failed to alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|error=[DELETE_BITMAP_LOCK_ERROR]txn conflict when commit tablet job idx { table_id: 1742283165243 index_id: 1742283165244 partition_id: 1742283165242 tablet_id: 1742283170711 } schema_change { initiator: "172.20.56.12:9050" id: "1742283173457" new_tablet_idx { table_id: 1742283165243 index_id: 1742283173458 partition_id: 1742283165242 tablet_id: 1742283174829 } txn_ids: 610474243072 alter_version: 2 num_output_rowsets: 1 num_output_segments: 1 size_output_rowsets: 77097 num_output_rows: 611 output_versions: 2 output_cumulative_point: 2 delete_bitmap_lock_initiator: 6632031443518271970 index_size_output_rowsets: 3289 segment_size_output_rowsets: 73808 }
I20250318 15:40:46.204162 7677 task_worker_pool.cpp:423] successfully submit task|type=ALTER|signature=1742283174829
I20250318 15:41:07.487172 6496 task_worker_pool.cpp:1999] get alter table task, signature: 1742283174829
I20250318 15:41:07.487183 6496 task_worker_pool.cpp:281] start alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|mem_limit=10682972209
I20250318 15:41:07.489440 6496 cloud_schema_change_job.cpp:132] Begin to alter tablet. base_tablet_id=1742283170711, new_tablet_id=1742283174829, alter_version=2, job_id=1742283173457
I20250318 15:41:07.489511 6496 cloud_schema_change_job.cpp:226] Begin to convert historical rowsets for new_tablet from base_tablet. base_tablet=1742283170711, new_tablet=1742283174829, job_id=1742283173457
I20250318 15:41:07.489523 6496 cloud_schema_change_job.cpp:247] schema change type, sc_sorting: 0, sc_directly: 1, base_tablet=1742283170711, new_tablet=1742283174829
I20250318 15:41:07.490249 6496 cloud_schema_change_job.cpp:285] Rowset [2-2] has already existed in tablet 1742283174829
I20250318 15:41:07.490275 6496 cloud_schema_change_job.cpp:416] process mow table|new_tablet_id=1742283174829|out_rowset_size=1|start_calc_delete_bitmap_version=2|alter_version=2
W20250318 15:41:07.490864 6496 cloud_compaction_stop_token.cpp:89] failed to register compaction stop token|job_id=a018587a-c12f-4926-9d7e-514ff9d88457|delete_bitmap_lock_initiator=1847151139249560285|tablet_id=1742283174829|error=[INTERNAL_ERROR]failed to start tablet job: compactions are not allowed on tablet_id=1742283174829 currently, blocked by schema change job delete_bitmap_initiator=6632031443518271970
W20250318 15:41:07.490897 6496 task_worker_pool.cpp:306] failed to alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|error=[INTERNAL_ERROR]failed to start tablet job: compactions are not allowed on tablet_id=1742283174829 currently, blocked by schema change job delete_bitmap_initiator=6632031443518271970
```
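To make the failure mode concrete, here is a minimal toy model of the behavior described above. It is only a sketch, not the Doris implementation: `ToyMetaService`, `register_stop_token`, `clear_stop_token`, and `retry_can_register` are hypothetical names, and the 20-second interval is inferred from the `* 4 = 80s` figure in the description. The retry arrives 36 seconds after the first registration, matching the gap between 15:40:31 and 15:41:07 in the logs.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical, simplified stand-in for the stop-token state kept in MS.
// All names here are assumptions for illustration, not Doris APIs.
class ToyMetaService {
public:
    explicit ToyMetaService(int64_t lease_seconds) : lease_seconds_(lease_seconds) {}

    // Registers a stop token for tablet_id. Fails while a token held by a
    // different initiator is still within its lease.
    bool register_stop_token(int64_t tablet_id, int64_t initiator, int64_t now_s) {
        auto it = tokens_.find(tablet_id);
        if (it != tokens_.end() && it->second.initiator != initiator &&
            it->second.expire_at_s > now_s) {
            return false;  // blocked by the earlier schema change attempt
        }
        tokens_[tablet_id] = Token{initiator, now_s + lease_seconds_};
        return true;
    }

    // Removes the token only if it belongs to the given initiator.
    void clear_stop_token(int64_t tablet_id, int64_t initiator) {
        auto it = tokens_.find(tablet_id);
        if (it != tokens_.end() && it->second.initiator == initiator) {
            tokens_.erase(it);
        }
    }

private:
    struct Token {
        int64_t initiator;
        int64_t expire_at_s;
    };
    int64_t lease_seconds_;
    std::unordered_map<int64_t, Token> tokens_;
};

// Simulates one attempt whose commit fails, followed by a retry with a new
// initiator after retry_delay_s seconds. Returns whether the retry manages
// to register its own stop token.
inline bool retry_can_register(bool clear_on_commit_failure, int64_t retry_delay_s) {
    constexpr int64_t kLeaseCompactionIntervalSeconds = 20;  // assumed: 20 * 4 = 80s
    ToyMetaService ms(kLeaseCompactionIntervalSeconds * 4);
    const int64_t tablet_id = 1742283174829;

    const bool first_ok = ms.register_stop_token(tablet_id, /*initiator=*/1, /*now_s=*/0);
    assert(first_ok);
    (void)first_ok;

    // The commit step fails here (for example, with a txn conflict).
    if (clear_on_commit_failure) {
        ms.clear_stop_token(tablet_id, /*initiator=*/1);
    }
    return ms.register_stop_token(tablet_id, /*initiator=*/2, retry_delay_s);
}
```

In this model, `retry_can_register(false, 36)` is false because the stale token still holds its lease, while `retry_can_register(true, 36)` is true because the failed attempt cleaned up after itself. Without cleanup, the retry only succeeds once the full lease has elapsed.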
Contributor
run buildall
dataroaring
approved these changes
Mar 27, 2025
LGTM
Cherry-picked from #49275