@github-actions
Contributor

Cherry-picked from #49275

### What problem does this PR solve?

In cloud mode, a heavy schema change (sc) job retries the whole alter task
when it encounters a `KV_TXN_CONFLICT_RETRY_EXCEEDED_MAX_TIMES` error in
`commit_tablet_job` (#46748). We should remove the stop token (#48399) in
MS for the sc job if it fails in `commit_tablet_job`. Otherwise, later
retries may fail to register a stop token (because the first stop token
does not expire until `config::lease_compaction_interval_seconds * 4 = 80s`
has elapsed), and the schema change job will fail.

```
I20250318 15:40:15.851157  7677 task_worker_pool.cpp:423] successfully submit task|type=ALTER|signature=1742283174829
I20250318 15:40:31.346628  6496 task_worker_pool.cpp:1999] get alter table task, signature: 1742283174829
I20250318 15:40:31.346635  6496 task_worker_pool.cpp:281] start alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|mem_limit=10682972209
I20250318 15:40:31.350860  6496 cloud_schema_change_job.cpp:132] Begin to alter tablet. base_tablet_id=1742283170711, new_tablet_id=1742283174829, alter_version=2, job_id=1742283173457
I20250318 15:40:31.350906  6496 cloud_schema_change_job.cpp:226] Begin to convert historical rowsets for new_tablet from base_tablet. base_tablet=1742283170711, new_tablet=1742283174829, job_id=1742283173457
I20250318 15:40:31.350916  6496 cloud_schema_change_job.cpp:247] schema change type, sc_sorting: 0, sc_directly: 1, base_tablet=1742283170711, new_tablet=1742283174829
I20250318 15:40:31.382493  6496 segment_creator.cpp:308] tablet_id:1742283174829, flushing rowset_dir: , rowset_id:020000000000038f6644fd8945079d22209de0cad6c7e5b8, data size:73808, index size:3289
I20250318 15:40:31.385416  6496 cloud_schema_change_job.cpp:416] process mow table|new_tablet_id=1742283174829|out_rowset_size=1|start_calc_delete_bitmap_version=3|alter_version=2
I20250318 15:40:31.387535  6496 cloud_storage_engine.cpp:894] successfully register compaction stop token for tablet_id=1742283174829, delete_bitmap_lock_initiator=6632031443518271970
I20250318 15:40:31.388285  6496 cloud_schema_change_job.cpp:439] alter table for mow table, calculate delete bitmap of incremental rowsets without lock, version: 3-2 new_table_id: 1742283174829
I20250318 15:40:31.391326  6496 cloud_schema_change_job.cpp:460] alter table for mow table, calculate delete bitmap of incremental rowsets with lock, version: 3-2 new_tablet_id: 1742283174829
I20250318 15:40:31.392035  6496 cloud_storage_engine.cpp:915] successfully unregister compaction stop token for tablet_id=1742283174829, delete_bitmap_lock_initiator=6632031443518271970
W20250318 15:40:39.947554  6496 task_worker_pool.cpp:306] failed to alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|error=[DELETE_BITMAP_LOCK_ERROR]txn conflict when commit tablet job idx { table_id: 1742283165243 index_id: 1742283165244 partition_id: 1742283165242 tablet_id: 1742283170711 } schema_change { initiator: "172.20.56.12:9050" id: "1742283173457" new_tablet_idx { table_id: 1742283165243 index_id: 1742283173458 partition_id: 1742283165242 tablet_id: 1742283174829 } txn_ids: 610474243072 alter_version: 2 num_output_rowsets: 1 num_output_segments: 1 size_output_rowsets: 77097 num_output_rows: 611 output_versions: 2 output_cumulative_point: 2 delete_bitmap_lock_initiator: 6632031443518271970 index_size_output_rowsets: 3289 segment_size_output_rowsets: 73808 }
I20250318 15:40:46.204162  7677 task_worker_pool.cpp:423] successfully submit task|type=ALTER|signature=1742283174829
I20250318 15:41:07.487172  6496 task_worker_pool.cpp:1999] get alter table task, signature: 1742283174829
I20250318 15:41:07.487183  6496 task_worker_pool.cpp:281] start alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|mem_limit=10682972209
I20250318 15:41:07.489440  6496 cloud_schema_change_job.cpp:132] Begin to alter tablet. base_tablet_id=1742283170711, new_tablet_id=1742283174829, alter_version=2, job_id=1742283173457
I20250318 15:41:07.489511  6496 cloud_schema_change_job.cpp:226] Begin to convert historical rowsets for new_tablet from base_tablet. base_tablet=1742283170711, new_tablet=1742283174829, job_id=1742283173457
I20250318 15:41:07.489523  6496 cloud_schema_change_job.cpp:247] schema change type, sc_sorting: 0, sc_directly: 1, base_tablet=1742283170711, new_tablet=1742283174829
I20250318 15:41:07.490249  6496 cloud_schema_change_job.cpp:285] Rowset [2-2] has already existed in tablet 1742283174829
I20250318 15:41:07.490275  6496 cloud_schema_change_job.cpp:416] process mow table|new_tablet_id=1742283174829|out_rowset_size=1|start_calc_delete_bitmap_version=2|alter_version=2
W20250318 15:41:07.490864  6496 cloud_compaction_stop_token.cpp:89] failed to register compaction stop token|job_id=a018587a-c12f-4926-9d7e-514ff9d88457|delete_bitmap_lock_initiator=1847151139249560285|tablet_id=1742283174829|error=[INTERNAL_ERROR]failed to start tablet job: compactions are not allowed on tablet_id=1742283174829 currently, blocked by schema change job delete_bitmap_initiator=6632031443518271970
W20250318 15:41:07.490897  6496 task_worker_pool.cpp:306] failed to alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|error=[INTERNAL_ERROR]failed to start tablet job: compactions are not allowed on tablet_id=1742283174829 currently, blocked by schema change job delete_bitmap_initiator=6632031443518271970
```
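
To make the failure mode in the description and log above concrete, here is a minimal sketch of the stop-token lifecycle. It is not the Doris implementation: `FakeMetaService`, `register_stop_token`, `remove_stop_token`, and `retry_can_register` are hypothetical names, the 80s lifetime is taken from the description, and the 36s retry gap is read off the log timestamps (15:40:31 to 15:41:07). Each attempt uses a different initiator, as in the log.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the meta service (MS); not the Doris API.
class FakeMetaService {
public:
    // Assumed from the PR description: lease_compaction_interval_seconds * 4 = 80s.
    static constexpr int64_t kStopTokenTtlSeconds = 80;

    // Registers a stop token for tablet_id. Fails while an unexpired token
    // from a different initiator is still recorded for that tablet.
    bool register_stop_token(int64_t tablet_id, int64_t initiator, int64_t now_s) {
        auto it = tokens_.find(tablet_id);
        if (it != tokens_.end() && it->second.initiator != initiator &&
            now_s < it->second.expire_s) {
            return false;
        }
        tokens_[tablet_id] = Token{initiator, now_s + kStopTokenTtlSeconds};
        return true;
    }

    // Removes the token only if it belongs to the given initiator.
    void remove_stop_token(int64_t tablet_id, int64_t initiator) {
        auto it = tokens_.find(tablet_id);
        if (it != tokens_.end() && it->second.initiator == initiator) {
            tokens_.erase(it);
        }
    }

private:
    struct Token {
        int64_t initiator;
        int64_t expire_s;
    };
    std::unordered_map<int64_t, Token> tokens_;
};

// Simulates one failed attempt followed by a retry 36 seconds later.
// Returns true if the retry manages to register its own stop token.
bool retry_can_register(bool cleanup_on_commit_failure) {
    FakeMetaService ms;
    const int64_t tablet_id = 1;         // illustrative id
    const int64_t first_initiator = 100; // illustrative initiators
    const int64_t retry_initiator = 200;

    // First attempt registers its token at t=0, then the commit step fails.
    bool first_ok = ms.register_stop_token(tablet_id, first_initiator, 0);
    assert(first_ok);
    (void)first_ok;

    // The behavior this PR describes: drop the token when the commit fails.
    if (cleanup_on_commit_failure) {
        ms.remove_stop_token(tablet_id, first_initiator);
    }

    // The retry arrives before the first token's 80s lifetime is over.
    return ms.register_stop_token(tablet_id, retry_initiator, 36);
}
```

In this sketch, `retry_can_register(false)` models the behavior before the fix (the stale token blocks the retry), and `retry_can_register(true)` models the cleanup on commit failure.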
@github-actions github-actions bot requested a review from dataroaring as a code owner March 26, 2025 02:54
@Thearas
Contributor

Thearas commented Mar 26, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Mar 26, 2025
@Thearas
Contributor

Thearas commented Mar 26, 2025

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/25) 🎉


| Category | Coverage |
| --- | --- |
| Function Coverage | 38.87% (10176/26178) |
| Line Coverage | 30.33% (86910/286589) |
| Region Coverage | 29.34% (44619/152084) |
| Branch Coverage | 25.86% (22697/87764) |

Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 7e8c81d into branch-3.0 Mar 27, 2025
21 of 24 checks passed
@github-actions github-actions bot deleted the auto-pick-49275-branch-3.0 branch March 27, 2025 02:23
5 participants