@bobhan1
Contributor

@bobhan1 bobhan1 commented Feb 26, 2025

What problem does this PR solve?

#39558 added a config that lets the shadow tablet run cumulative compaction during schema change in cloud mode, to avoid the -235 error on the new tablet when there are a large number of loads. However, this introduces a correctness problem on merge-on-write tables: if cumulative compactions run on the new tablet while the schema change (SC) calculates delete bitmaps for incremental rowsets after converting historical data, some rowsets' delete bitmaps are wrong.

This PR introduces a new compaction job type, STOP_TOKEN, which fails all existing compaction jobs and disallows any compaction on the tablet, and changes the SC process as follows:

  1. convert historical data
  2. register a stop token on the new tablet to fail all existing compaction jobs and disallow any compaction on the new tablet
  3. calculate delete bitmaps for incremental rowsets without the lock
  4. calculate delete bitmaps for incremental rowsets with the lock
  5. commit the SC job and remove the stop token
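
The ordering of these steps can be sketched as follows. This is a minimal illustration, not code from this PR: `FakeTablet`, `StopTokenGuard`, and `run_schema_change` are hypothetical names, and the sketch models only the step order and the fact that compaction is rejected while the stop token is registered.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the state kept for one new (shadow) tablet.
struct FakeTablet {
    bool stop_token = false;          // true while SC holds the stop token
    int running_compactions = 0;      // compaction jobs currently recorded
    std::vector<std::string> trace;   // order of SC steps, for inspection

    // A compaction may start only when no stop token is registered.
    bool try_start_compaction() {
        if (stop_token) return false;
        ++running_compactions;
        return true;
    }
};

// RAII helper (a design choice of this sketch): registers the stop token
// on construction and removes it when the scope ends.
class StopTokenGuard {
public:
    explicit StopTokenGuard(FakeTablet& t) : tablet_(t) {
        tablet_.running_compactions = 0;  // fail all existing compaction jobs
        tablet_.stop_token = true;
        tablet_.trace.push_back("register stop token");
    }
    ~StopTokenGuard() {
        tablet_.stop_token = false;
        tablet_.trace.push_back("remove stop token");
    }
    StopTokenGuard(const StopTokenGuard&) = delete;
    StopTokenGuard& operator=(const StopTokenGuard&) = delete;

private:
    FakeTablet& tablet_;
};

void run_schema_change(FakeTablet& new_tablet) {
    new_tablet.trace.push_back("convert historical data");              // step 1
    {
        StopTokenGuard guard(new_tablet);                               // step 2
        new_tablet.trace.push_back("calc delete bitmap without lock");  // step 3
        new_tablet.trace.push_back("calc delete bitmap with lock");     // step 4
        new_tablet.trace.push_back("commit SC job");                    // step 5
    }  // stop token removed here
}
```

The guard removes the token on every exit path of the inner scope; that is a property of this sketch, not a statement about how the PR handles failures.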

ref: #29386

Release note

None

@Thearas
Contributor

Thearas commented Feb 26, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1 bobhan1 force-pushed the fix-new-tablet-compaction-dup-key branch 7 times, most recently from 34302e4 to ef35589 Compare February 28, 2025 02:51
@bobhan1 bobhan1 marked this pull request as ready for review February 28, 2025 02:57
@bobhan1 bobhan1 force-pushed the fix-new-tablet-compaction-dup-key branch 2 times, most recently from 82da888 to 1eae985 Compare February 28, 2025 10:30
@dataroaring
Contributor

run buildall

@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 81.97% (1064/1298)
Line Coverage: 65.67% (17639/26860)
Region Coverage: 65.15% (8690/13339)
Branch Coverage: 55.09% (4689/8512)
Coverage Report: http://coverage.selectdb-in.cc/coverage/1eae9856287f6abbc291bdaaa1f454615e34d8ef_1eae9856287f6abbc291bdaaa1f454615e34d8ef_cloud/report/index.html

@doris-robot

TPC-H: Total hot run time: 32019 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1eae9856287f6abbc291bdaaa1f454615e34d8ef, data reload: false

------ Round 1 ----------------------------------
q1	17662	5283	5148	5148
q2	2084	309	169	169
q3	10763	1240	791	791
q4	10213	1055	536	536
q5	7616	2451	2771	2451
q6	196	172	134	134
q7	942	749	605	605
q8	9331	1317	1140	1140
q9	4967	4766	4869	4766
q10	6830	2311	1914	1914
q11	492	275	267	267
q12	351	351	219	219
q13	17779	3689	3062	3062
q14	225	237	210	210
q15	544	484	450	450
q16	634	637	600	600
q17	578	884	349	349
q18	6854	6186	6235	6186
q19	1938	968	563	563
q20	318	326	195	195
q21	2826	2292	1954	1954
q22	372	331	310	310
Total cold run time: 103515 ms
Total hot run time: 32019 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5229	5352	5118	5118
q2	242	335	231	231
q3	2218	2685	2316	2316
q4	1450	1843	1392	1392
q5	4479	4145	4158	4145
q6	214	167	127	127
q7	1887	1892	1761	1761
q8	2614	2671	2552	2552
q9	7411	7214	7304	7214
q10	3035	3231	2774	2774
q11	579	536	495	495
q12	692	785	586	586
q13	3343	3998	3330	3330
q14	276	320	263	263
q15	518	461	480	461
q16	664	682	634	634
q17	1141	1623	1323	1323
q18	7725	7379	7282	7282
q19	816	875	914	875
q20	1997	2023	1881	1881
q21	5559	5086	4794	4794
q22	640	611	558	558
Total cold run time: 52729 ms
Total hot run time: 50112 ms
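
The bot output does not label its columns. Reading them as one cold run, two hot runs, and the faster of the two hot runs reproduces both Round 1 totals above (103515 ms and 32019 ms), so the totals appear to be computed as in the sketch below. The struct and function names are hypothetical labels, not part of the benchmark tooling.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One row of the result table; field names are assumed labels.
struct QueryTiming {
    int cold_ms;  // first run
    int hot1_ms;  // second run
    int hot2_ms;  // third run
};

struct Totals {
    long cold_ms;
    long hot_ms;
};

// Cold total sums the first run of every query; hot total sums the
// faster of the two hot runs of every query.
Totals sum_runs(const std::vector<QueryTiming>& rows) {
    Totals t{0, 0};
    for (const auto& r : rows) {
        t.cold_ms += r.cold_ms;
        t.hot_ms += std::min(r.hot1_ms, r.hot2_ms);
    }
    return t;
}
```

For example, the first three Round 1 rows (q1 to q3) contribute 30509 ms to the cold total and 6108 ms to the hot total.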

@doris-robot

TPC-DS: Total hot run time: 190743 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1eae9856287f6abbc291bdaaa1f454615e34d8ef, data reload: false

query1	1332	989	945	945
query2	6147	1937	1879	1879
query3	11163	4781	4583	4583
query4	53397	25383	22918	22918
query5	5192	597	496	496
query6	362	212	183	183
query7	5014	509	300	300
query8	315	248	242	242
query9	5951	2600	2543	2543
query10	438	331	255	255
query11	15587	14978	15111	14978
query12	155	105	108	105
query13	1142	507	372	372
query14	11130	6508	6771	6508
query15	207	190	188	188
query16	7107	694	476	476
query17	1120	743	592	592
query18	1563	406	303	303
query19	204	183	157	157
query20	121	124	122	122
query21	208	131	109	109
query22	4347	4564	4267	4267
query23	33883	33384	33427	33384
query24	5698	2447	2464	2447
query25	462	462	424	424
query26	688	274	164	164
query27	1753	502	334	334
query28	3128	2463	2492	2463
query29	591	555	457	457
query30	221	185	158	158
query31	883	881	794	794
query32	74	67	60	60
query33	448	358	295	295
query34	759	885	524	524
query35	809	822	757	757
query36	956	1014	914	914
query37	121	100	75	75
query38	4232	4302	4151	4151
query39	1497	1469	1460	1460
query40	209	125	104	104
query41	52	48	55	48
query42	122	107	102	102
query43	519	517	483	483
query44	1325	840	828	828
query45	180	183	169	169
query46	902	1074	661	661
query47	1828	1844	1788	1788
query48	395	416	305	305
query49	719	532	476	476
query50	726	753	438	438
query51	4307	4296	4191	4191
query52	112	104	92	92
query53	224	269	192	192
query54	497	511	435	435
query55	85	84	81	81
query56	296	261	271	261
query57	1165	1194	1113	1113
query58	250	229	241	229
query59	2914	3031	2723	2723
query60	283	294	263	263
query61	143	153	139	139
query62	770	725	701	701
query63	240	201	209	201
query64	1795	1114	802	802
query65	3307	3242	3196	3196
query66	736	423	314	314
query67	15918	15444	15296	15296
query68	6621	873	505	505
query69	543	296	262	262
query70	1161	1128	1113	1113
query71	507	296	258	258
query72	5705	3655	3841	3655
query73	1316	754	357	357
query74	9056	9085	8788	8788
query75	3930	3165	2746	2746
query76	4169	1183	739	739
query77	646	380	334	334
query78	10135	10029	9285	9285
query79	2899	838	587	587
query80	657	540	458	458
query81	520	288	246	246
query82	697	124	97	97
query83	267	169	154	154
query84	298	102	74	74
query85	773	350	301	301
query86	370	312	284	284
query87	4510	4454	4473	4454
query88	3747	2241	2235	2235
query89	417	326	280	280
query90	1787	195	192	192
query91	136	139	116	116
query92	76	59	59	59
query93	2138	1036	569	569
query94	662	422	291	291
query95	388	268	253	253
query96	474	560	276	276
query97	3318	3416	3302	3302
query98	232	205	210	205
query99	1442	1387	1255	1255
Total cold run time: 298021 ms
Total hot run time: 190743 ms

@doris-robot

ClickBench: Total hot run time: 31.44 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1eae9856287f6abbc291bdaaa1f454615e34d8ef, data reload: false

query1	0.03	0.03	0.03
query2	0.10	0.05	0.06
query3	0.28	0.05	0.05
query4	1.60	0.07	0.08
query5	0.54	0.54	0.55
query6	1.18	0.72	0.73
query7	0.02	0.02	0.03
query8	0.05	0.05	0.05
query9	0.63	0.52	0.51
query10	0.57	0.59	0.57
query11	0.25	0.12	0.12
query12	0.24	0.12	0.12
query13	0.63	0.61	0.61
query14	2.67	2.69	2.69
query15	0.99	0.87	0.87
query16	0.37	0.37	0.37
query17	1.03	1.02	1.03
query18	0.18	0.18	0.17
query19	1.97	1.87	1.98
query20	0.01	0.01	0.01
query21	15.38	0.98	0.67
query22	0.94	1.06	0.77
query23	14.70	1.61	0.75
query24	4.87	0.66	0.31
query25	0.18	0.09	0.09
query26	0.56	0.23	0.18
query27	0.09	0.08	0.09
query28	11.02	1.21	0.54
query29	12.55	4.10	3.41
query30	0.27	0.08	0.07
query31	2.83	0.63	0.43
query32	3.25	0.60	0.49
query33	3.05	3.12	3.11
query34	16.41	5.26	4.41
query35	4.58	4.51	4.54
query36	0.64	0.51	0.50
query37	0.20	0.17	0.17
query38	0.16	0.15	0.16
query39	0.05	0.04	0.04
query40	0.20	0.16	0.15
query41	0.10	0.05	0.05
query42	0.06	0.05	0.05
query43	0.05	0.05	0.05
Total cold run time: 105.48 s
Total hot run time: 31.44 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/255) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 44.95% (12001/26698)
Line Coverage 34.45% (100867/292756)
Region Coverage 33.63% (51643/153554)
Branch Coverage 29.40% (26124/88866)

@bobhan1
Contributor Author

bobhan1 commented Mar 3, 2025

run cloud_p0

Comment on lines +215 to +217
} else if (compaction.type() == TabletCompactionJobPB::STOP_TOKEN) {
    // fail all existing compactions
    compactions.Clear();
Collaborator

It would be better to wait until all ongoing compactions have completed. If a compaction is failed, the resources already spent on it are wasted, and retrying it after the compaction stop token is removed adds unnecessary overhead to the BE.

Contributor

This only affects the new tablet created by the SC, and the time window is quite small, so let's keep it simple.
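
The behavior under discussion can be sketched as below. This is a simplified model, not the meta-service code: `JobType`, `CompactionJob`, and `start_compaction_job` are hypothetical names, and the conflict checks between ordinary compaction jobs are omitted. It shows the two effects of a stop token: registering it clears the recorded compaction jobs, and while it is present every new job is rejected, including a second stop token.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class JobType { CUMULATIVE, BASE, STOP_TOKEN };

struct CompactionJob {
    JobType type;
    int64_t initiator;  // stands in for delete_bitmap_lock_initiator
};

// Returns an empty string on success, otherwise an error message.
std::string start_compaction_job(std::vector<CompactionJob>& recorded,
                                 const CompactionJob& job) {
    // An existing stop token blocks every new job on the tablet.
    for (const auto& c : recorded) {
        if (c.type == JobType::STOP_TOKEN) {
            return "compactions are not allowed, blocked by schema change job "
                   "initiator=" + std::to_string(c.initiator);
        }
    }
    if (job.type == JobType::STOP_TOKEN) {
        recorded.clear();  // fail all existing compactions
    }
    recorded.push_back(job);
    return "";
}
```

Clearing rather than waiting keeps the registration a single step, which is the simplicity argued for in the reply above; the cost is that any cleared compaction has to be redone later.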

Comment on lines 1132 to 1142
auto& compactions = *new_recorded_job.mutable_compaction();
compactions.erase(
        std::remove_if(
                compactions.begin(), compactions.end(),
                [&](auto& c) {
                    return c.has_delete_bitmap_lock_initiator() &&
                           c.delete_bitmap_lock_initiator() ==
                                   schema_change.delete_bitmap_lock_initiator();
                }),
        compactions.end());
Collaborator

Unfortunately, there is a known issue in the schema change process due to missing initial parameters. As a result, abort requests will consistently fail before entering this code branch. Therefore, if the schema change is cancelled, the compaction stop token will never be removed.

Contributor

The compaction stop token on the BE side will be removed, so it will no longer renew the lease, and the stop token job will eventually be removed as well.
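
The snippet under review uses the erase-remove idiom on the recorded compaction list. A standalone version of the same pattern, using `std::vector` in place of the protobuf repeated field, might look like this. `CompactionEntry` and `remove_stop_token` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

struct CompactionEntry {
    bool has_initiator;  // mirrors has_delete_bitmap_lock_initiator()
    int64_t initiator;   // mirrors delete_bitmap_lock_initiator()
};

// Removes every entry whose initiator matches the schema change job's
// initiator and returns how many entries were removed.
std::size_t remove_stop_token(std::vector<CompactionEntry>& compactions,
                              int64_t sc_initiator) {
    // remove_if moves the kept entries to the front and returns the start
    // of the tail that should be erased.
    auto tail = std::remove_if(
            compactions.begin(), compactions.end(),
            [&](const CompactionEntry& c) {
                return c.has_initiator && c.initiator == sc_initiator;
            });
    auto removed = static_cast<std::size_t>(std::distance(tail, compactions.end()));
    compactions.erase(tail, compactions.end());
    return removed;
}
```

Entries without an initiator, or with a different one, are left in place and keep their relative order.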

@bobhan1
Contributor Author

bobhan1 commented Mar 6, 2025

run cloud_p0

1 similar comment
@bobhan1
Contributor Author

bobhan1 commented Mar 7, 2025

run cloud_p0

dataroaring
dataroaring previously approved these changes Mar 10, 2025
Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 10, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

zhannngchen
zhannngchen previously approved these changes Mar 10, 2025
@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Mar 12, 2025
@bobhan1 bobhan1 force-pushed the fix-new-tablet-compaction-dup-key branch from 9e48b1c to ac1ff2f Compare March 12, 2025 04:16
tmp
@bobhan1 bobhan1 force-pushed the fix-new-tablet-compaction-dup-key branch from ac1ff2f to 44abfbf Compare March 12, 2025 05:14
@bobhan1
Contributor Author

bobhan1 commented Mar 12, 2025

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 12, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

Collaborator

@Hastyshell Hastyshell left a comment

LGTM

@github-actions
Contributor

PR approved by anyone and no changes requested.

@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 82.88% (1075/1297)
Line Coverage: 65.89% (17737/26918)
Region Coverage: 65.23% (8733/13388)
Branch Coverage: 55.18% (4709/8534)
Coverage Report: http://coverage.selectdb-in.cc/coverage/44abfbf71ab1cd61d812039072eb3679dd7002ed_44abfbf71ab1cd61d812039072eb3679dd7002ed_cloud/report/index.html

@doris-robot

TPC-H: Total hot run time: 32761 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 44abfbf71ab1cd61d812039072eb3679dd7002ed, data reload: false

------ Round 1 ----------------------------------
q1	17628	5281	5233	5233
q2	2039	287	157	157
q3	10508	1319	710	710
q4	10236	1033	544	544
q5	7571	2351	2430	2351
q6	188	167	134	134
q7	930	764	605	605
q8	9307	1286	1118	1118
q9	4941	4745	4813	4745
q10	6835	2333	1869	1869
q11	468	272	246	246
q12	341	346	219	219
q13	17760	3677	3160	3160
q14	221	236	208	208
q15	524	487	485	485
q16	633	642	568	568
q17	584	896	348	348
q18	6825	6558	6385	6385
q19	1494	967	549	549
q20	321	325	188	188
q21	2846	2118	1949	1949
q22	1016	1027	990	990
Total cold run time: 103216 ms
Total hot run time: 32761 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5161	5184	5137	5137
q2	245	335	233	233
q3	2184	2676	2275	2275
q4	1461	1825	1379	1379
q5	4268	4142	4166	4142
q6	210	165	125	125
q7	1903	1988	1748	1748
q8	2647	2761	2670	2670
q9	7240	7150	7245	7150
q10	2999	3207	2721	2721
q11	569	505	487	487
q12	718	778	622	622
q13	3426	3848	3263	3263
q14	284	300	283	283
q15	549	472	484	472
q16	662	660	658	658
q17	1147	1583	1405	1405
q18	7902	7624	7324	7324
q19	821	831	871	831
q20	1982	2075	1869	1869
q21	5533	4924	4911	4911
q22	1143	1065	986	986
Total cold run time: 53054 ms
Total hot run time: 50691 ms

@doris-robot

TPC-DS: Total hot run time: 185173 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 44abfbf71ab1cd61d812039072eb3679dd7002ed, data reload: false

query1	1012	403	380	380
query2	6527	1965	1955	1955
query3	6802	218	210	210
query4	26798	23176	22844	22844
query5	4376	660	491	491
query6	322	200	187	187
query7	4614	512	290	290
query8	293	247	241	241
query9	8635	2619	2624	2619
query10	460	331	254	254
query11	15548	15211	15084	15084
query12	161	106	106	106
query13	1661	531	415	415
query14	9598	6370	6909	6370
query15	216	202	170	170
query16	7797	656	454	454
query17	1225	699	563	563
query18	1992	410	309	309
query19	197	197	160	160
query20	130	115	119	115
query21	210	122	103	103
query22	4185	4305	4014	4014
query23	34025	33104	33091	33091
query24	7757	2357	2358	2357
query25	545	452	392	392
query26	1227	272	156	156
query27	2147	480	328	328
query28	3923	2448	2411	2411
query29	708	547	427	427
query30	291	217	189	189
query31	948	856	801	801
query32	76	70	62	62
query33	562	367	303	303
query34	778	831	490	490
query35	776	827	741	741
query36	934	995	886	886
query37	120	98	79	79
query38	4274	4178	4143	4143
query39	1476	1396	1369	1369
query40	209	135	106	106
query41	56	56	51	51
query42	116	107	105	105
query43	507	501	488	488
query44	1311	778	783	778
query45	182	170	162	162
query46	820	1011	659	659
query47	1784	1830	1695	1695
query48	381	412	297	297
query49	783	533	410	410
query50	671	725	429	429
query51	4114	4212	4153	4153
query52	105	105	96	96
query53	226	264	185	185
query54	494	501	422	422
query55	84	83	88	83
query56	270	264	282	264
query57	1127	1134	1057	1057
query58	249	233	264	233
query59	2579	2861	2553	2553
query60	291	268	255	255
query61	125	133	122	122
query62	788	731	660	660
query63	219	185	183	183
query64	4303	1045	686	686
query65	4389	4333	4334	4333
query66	1074	416	302	302
query67	15800	15540	15290	15290
query68	8202	865	515	515
query69	497	308	258	258
query70	1218	1150	1068	1068
query71	467	287	262	262
query72	5318	3596	3802	3596
query73	784	730	353	353
query74	9342	9186	8729	8729
query75	3808	3185	2690	2690
query76	3725	1180	749	749
query77	791	389	274	274
query78	9919	10032	9280	9280
query79	2754	826	586	586
query80	679	519	438	438
query81	508	257	223	223
query82	697	127	100	100
query83	182	172	152	152
query84	238	104	77	77
query85	783	363	316	316
query86	395	312	285	285
query87	4454	4496	4528	4496
query88	3652	2248	2276	2248
query89	397	315	279	279
query90	1830	209	204	204
query91	139	139	113	113
query92	72	62	59	59
query93	1948	1057	583	583
query94	669	414	305	305
query95	352	272	267	267
query96	495	553	278	278
query97	3329	3413	3277	3277
query98	233	207	199	199
query99	1332	1377	1298	1298
Total cold run time: 275592 ms
Total hot run time: 185173 ms

@doris-robot

ClickBench: Total hot run time: 31.05 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 44abfbf71ab1cd61d812039072eb3679dd7002ed, data reload: false

query1	0.04	0.04	0.04
query2	0.07	0.04	0.04
query3	0.24	0.07	0.06
query4	1.63	0.10	0.11
query5	0.56	0.54	0.54
query6	1.23	0.72	0.72
query7	0.02	0.02	0.01
query8	0.04	0.04	0.04
query9	0.62	0.51	0.51
query10	0.58	0.59	0.59
query11	0.16	0.11	0.11
query12	0.15	0.12	0.11
query13	0.62	0.60	0.60
query14	2.67	2.69	2.82
query15	0.94	0.86	0.84
query16	0.38	0.37	0.38
query17	1.01	1.04	1.01
query18	0.21	0.20	0.19
query19	1.95	1.87	1.92
query20	0.01	0.01	0.01
query21	15.37	0.88	0.55
query22	0.78	1.30	0.97
query23	14.68	1.40	0.60
query24	7.24	1.26	0.84
query25	0.51	0.22	0.15
query26	0.55	0.17	0.14
query27	0.05	0.04	0.05
query28	9.17	0.89	0.42
query29	12.53	3.97	3.25
query30	0.25	0.10	0.06
query31	2.81	0.58	0.38
query32	3.24	0.54	0.46
query33	2.97	2.98	3.06
query34	15.87	5.14	4.46
query35	4.53	4.52	4.51
query36	0.68	0.50	0.48
query37	0.08	0.06	0.07
query38	0.05	0.03	0.04
query39	0.04	0.02	0.03
query40	0.16	0.14	0.13
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.02	0.02
Total cold run time: 104.84 s
Total hot run time: 31.05 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 284f3d1 into apache:master Mar 13, 2025
26 of 28 checks passed
bobhan1 added a commit to bobhan1/doris that referenced this pull request Mar 13, 2025
…n=true` (apache#48399)

dataroaring pushed a commit that referenced this pull request Mar 14, 2025
dataroaring pushed a commit that referenced this pull request Mar 26, 2025
### What problem does this PR solve?

A cloud heavy SC job retries the whole alter task when it encounters a
`KV_TXN_CONFLICT_RETRY_EXCEEDED_MAX_TIMES` error in
`commit_tablet_job` (#46748). We
should remove the stop token (#48399) in
MS for the SC job if it fails in `commit_tablet_job`; otherwise
later retries may fail to register the stop token (because the first stop
token does not expire until `config::lease_compaction_interval_seconds *
4 = 80s` has passed) and the schema change job will fail.

```
I20250318 15:40:15.851157  7677 task_worker_pool.cpp:423] successfully submit task|type=ALTER|signature=1742283174829
I20250318 15:40:31.346628  6496 task_worker_pool.cpp:1999] get alter table task, signature: 1742283174829
I20250318 15:40:31.346635  6496 task_worker_pool.cpp:281] start alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|mem_limit=10682972209
I20250318 15:40:31.350860  6496 cloud_schema_change_job.cpp:132] Begin to alter tablet. base_tablet_id=1742283170711, new_tablet_id=1742283174829, alter_version=2, job_id=1742283173457
I20250318 15:40:31.350906  6496 cloud_schema_change_job.cpp:226] Begin to convert historical rowsets for new_tablet from base_tablet. base_tablet=1742283170711, new_tablet=1742283174829, job_id=1742283173457
I20250318 15:40:31.350916  6496 cloud_schema_change_job.cpp:247] schema change type, sc_sorting: 0, sc_directly: 1, base_tablet=1742283170711, new_tablet=1742283174829
I20250318 15:40:31.382493  6496 segment_creator.cpp:308] tablet_id:1742283174829, flushing rowset_dir: , rowset_id:020000000000038f6644fd8945079d22209de0cad6c7e5b8, data size:73808, index size:3289
I20250318 15:40:31.385416  6496 cloud_schema_change_job.cpp:416] process mow table|new_tablet_id=1742283174829|out_rowset_size=1|start_calc_delete_bitmap_version=3|alter_version=2
I20250318 15:40:31.387535  6496 cloud_storage_engine.cpp:894] successfully register compaction stop token for tablet_id=1742283174829, delete_bitmap_lock_initiator=6632031443518271970
I20250318 15:40:31.388285  6496 cloud_schema_change_job.cpp:439] alter table for mow table, calculate delete bitmap of incremental rowsets without lock, version: 3-2 new_table_id: 1742283174829
I20250318 15:40:31.391326  6496 cloud_schema_change_job.cpp:460] alter table for mow table, calculate delete bitmap of incremental rowsets with lock, version: 3-2 new_tablet_id: 1742283174829
I20250318 15:40:31.392035  6496 cloud_storage_engine.cpp:915] successfully unregister compaction stop token for tablet_id=1742283174829, delete_bitmap_lock_initiator=6632031443518271970
W20250318 15:40:39.947554  6496 task_worker_pool.cpp:306] failed to alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|error=[DELETE_BITMAP_LOCK_ERROR]txn conflict when commit tablet job idx { table_id: 1742283165243 index_id: 1742283165244 partition_id: 1742283165242 tablet_id: 1742283170711 } schema_change { initiator: "172.20.56.12:9050" id: "1742283173457" new_tablet_idx { table_id: 1742283165243 index_id: 1742283173458 partition_id: 1742283165242 tablet_id: 1742283174829 } txn_ids: 610474243072 alter_version: 2 num_output_rowsets: 1 num_output_segments: 1 size_output_rowsets: 77097 num_output_rows: 611 output_versions: 2 output_cumulative_point: 2 delete_bitmap_lock_initiator: 6632031443518271970 index_size_output_rowsets: 3289 segment_size_output_rowsets: 73808 }
I20250318 15:40:46.204162  7677 task_worker_pool.cpp:423] successfully submit task|type=ALTER|signature=1742283174829
I20250318 15:41:07.487172  6496 task_worker_pool.cpp:1999] get alter table task, signature: 1742283174829
I20250318 15:41:07.487183  6496 task_worker_pool.cpp:281] start alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|mem_limit=10682972209
I20250318 15:41:07.489440  6496 cloud_schema_change_job.cpp:132] Begin to alter tablet. base_tablet_id=1742283170711, new_tablet_id=1742283174829, alter_version=2, job_id=1742283173457
I20250318 15:41:07.489511  6496 cloud_schema_change_job.cpp:226] Begin to convert historical rowsets for new_tablet from base_tablet. base_tablet=1742283170711, new_tablet=1742283174829, job_id=1742283173457
I20250318 15:41:07.489523  6496 cloud_schema_change_job.cpp:247] schema change type, sc_sorting: 0, sc_directly: 1, base_tablet=1742283170711, new_tablet=1742283174829
I20250318 15:41:07.490249  6496 cloud_schema_change_job.cpp:285] Rowset [2-2] has already existed in tablet 1742283174829
I20250318 15:41:07.490275  6496 cloud_schema_change_job.cpp:416] process mow table|new_tablet_id=1742283174829|out_rowset_size=1|start_calc_delete_bitmap_version=2|alter_version=2
W20250318 15:41:07.490864  6496 cloud_compaction_stop_token.cpp:89] failed to register compaction stop token|job_id=a018587a-c12f-4926-9d7e-514ff9d88457|delete_bitmap_lock_initiator=1847151139249560285|tablet_id=1742283174829|error=[INTERNAL_ERROR]failed to start tablet job: compactions are not allowed on tablet_id=1742283174829 currently, blocked by schema change job delete_bitmap_initiator=6632031443518271970
W20250318 15:41:07.490897  6496 task_worker_pool.cpp:306] failed to alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|error=[INTERNAL_ERROR]failed to start tablet job: compactions are not allowed on tablet_id=1742283174829 currently, blocked by schema change job delete_bitmap_initiator=6632031443518271970
```
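
The timing in the log above can be checked with a small sketch. It assumes the 80 s lifetime is measured from the registration time (the token is not renewed after the BE unregisters it) and that the boundary is exclusive; the constant and function names are hypothetical, and the 20 s interval is only the value implied by `4 = 80s` in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Implied by the commit message (interval * 4 = 80 s); the real setting is
// config::lease_compaction_interval_seconds.
constexpr int64_t kLeaseCompactionIntervalSeconds = 20;
constexpr int64_t kStopTokenLifetimeSeconds = kLeaseCompactionIntervalSeconds * 4;

// True if a stop token registered at `registered_at_s` and never renewed
// would still block a registration attempted at `now_s` (both in seconds).
constexpr bool stop_token_still_blocks(int64_t registered_at_s, int64_t now_s) {
    return now_s - registered_at_s < kStopTokenLifetimeSeconds;
}
```

In the log, the first token is registered at 15:40:31 and the retry tries to register at 15:41:07, 36 s later, which is well inside the 80 s window, so the retry is rejected.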
github-actions bot pushed a commit that referenced this pull request Mar 26, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…n=true` (apache#48399)

### What problem does this PR solve?

apache#39558 add a config to support
shadow tablet to do cumulative compaction during schema change in cloud
mode to avoid -235 error on new tablet in the case of a large number of
loads. However, this introduces correctness problem on merge-on-write
table because some rowsets' delete bitmaps are wrong if there are cumu
compactions on new tablet when SC calculate delete bitmaps for
incremental rowsets after converting historical data.

This PR introduce a new type of compaction `STOP_TOKEN` to fail all
existing compaction jobs and disallow doing any compaction on tablet and
change the SC process as following:
1. converting historical data
2. register stop token on new tablet to fail all existing compaction
jobs and disallow doing any compaction on new tablet
3. calculate delete bitmap for incremental rowsets without lock
4. calculate delete bitmap for incremental rowsets with lock
5. commit SC job and remove stop token
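
The five steps above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: `TabletJobState`, `register_stop_token`, `start_compaction`, `unregister_stop_token` and `run_schema_change` are hypothetical names, not the Doris API, and the delete bitmap calculations are reduced to trace entries.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-tablet job state tracked for the new tablet.
struct TabletJobState {
    std::optional<int64_t> stop_token_initiator;  // set while a stop token is held
    std::vector<std::string> running_compactions; // ids of in-flight compaction jobs
};

// Step 2: fail all existing compaction jobs and block new ones.
// Returns false if another stop token is already registered.
bool register_stop_token(TabletJobState& st, int64_t initiator) {
    if (st.stop_token_initiator.has_value()) return false;
    st.running_compactions.clear();  // models "fail all existing compaction jobs"
    st.stop_token_initiator = initiator;
    return true;
}

// A compaction may only start while no stop token is registered.
bool start_compaction(TabletJobState& st, const std::string& job_id) {
    if (st.stop_token_initiator.has_value()) return false;
    st.running_compactions.push_back(job_id);
    return true;
}

// Step 5 (second half): only the owner of the token can remove it.
void unregister_stop_token(TabletJobState& st, int64_t initiator) {
    if (st.stop_token_initiator == initiator) st.stop_token_initiator.reset();
}

// Runs the steps in the order given in the list and returns a trace of them.
std::vector<std::string> run_schema_change(TabletJobState& new_tablet, int64_t initiator) {
    std::vector<std::string> trace;
    trace.push_back("convert_historical_data");              // step 1
    if (!register_stop_token(new_tablet, initiator)) {       // step 2
        trace.push_back("register_stop_token_failed");
        return trace;
    }
    trace.push_back("register_stop_token");
    trace.push_back("calc_delete_bitmap_without_lock");      // step 3
    trace.push_back("calc_delete_bitmap_with_lock");         // step 4
    trace.push_back("commit_job");                           // step 5
    unregister_stop_token(new_tablet, initiator);
    trace.push_back("remove_stop_token");
    return trace;
}
```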
----

ref: apache#29386
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…he#49275)

### What problem does this PR solve?

The cloud heavy SC job retries the whole alter task when it encounters a
`KV_TXN_CONFLICT_RETRY_EXCEEDED_MAX_TIMES` error in
`commit_tablet_job` (apache#46748). We
should remove the stop token (apache#48399) in
MS for the SC job if it fails in `commit_tablet_job`; otherwise
later retries may fail to register the stop token (because the first stop
token won't expire for `config::lease_compaction_interval_seconds *
4=80s`) and the schema change job will fail.

```
I20250318 15:40:15.851157  7677 task_worker_pool.cpp:423] successfully submit task|type=ALTER|signature=1742283174829
I20250318 15:40:31.346628  6496 task_worker_pool.cpp:1999] get alter table task, signature: 1742283174829
I20250318 15:40:31.346635  6496 task_worker_pool.cpp:281] start alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|mem_limit=10682972209
I20250318 15:40:31.350860  6496 cloud_schema_change_job.cpp:132] Begin to alter tablet. base_tablet_id=1742283170711, new_tablet_id=1742283174829, alter_version=2, job_id=1742283173457
I20250318 15:40:31.350906  6496 cloud_schema_change_job.cpp:226] Begin to convert historical rowsets for new_tablet from base_tablet. base_tablet=1742283170711, new_tablet=1742283174829, job_id=1742283173457
I20250318 15:40:31.350916  6496 cloud_schema_change_job.cpp:247] schema change type, sc_sorting: 0, sc_directly: 1, base_tablet=1742283170711, new_tablet=1742283174829
I20250318 15:40:31.382493  6496 segment_creator.cpp:308] tablet_id:1742283174829, flushing rowset_dir: , rowset_id:020000000000038f6644fd8945079d22209de0cad6c7e5b8, data size:73808, index size:3289
I20250318 15:40:31.385416  6496 cloud_schema_change_job.cpp:416] process mow table|new_tablet_id=1742283174829|out_rowset_size=1|start_calc_delete_bitmap_version=3|alter_version=2
I20250318 15:40:31.387535  6496 cloud_storage_engine.cpp:894] successfully register compaction stop token for tablet_id=1742283174829, delete_bitmap_lock_initiator=6632031443518271970
I20250318 15:40:31.388285  6496 cloud_schema_change_job.cpp:439] alter table for mow table, calculate delete bitmap of incremental rowsets without lock, version: 3-2 new_table_id: 1742283174829
I20250318 15:40:31.391326  6496 cloud_schema_change_job.cpp:460] alter table for mow table, calculate delete bitmap of incremental rowsets with lock, version: 3-2 new_tablet_id: 1742283174829
I20250318 15:40:31.392035  6496 cloud_storage_engine.cpp:915] successfully unregister compaction stop token for tablet_id=1742283174829, delete_bitmap_lock_initiator=6632031443518271970
W20250318 15:40:39.947554  6496 task_worker_pool.cpp:306] failed to alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|error=[DELETE_BITMAP_LOCK_ERROR]txn conflict when commit tablet job idx { table_id: 1742283165243 index_id: 1742283165244 partition_id: 1742283165242 tablet_id: 1742283170711 } schema_change { initiator: "172.20.56.12:9050" id: "1742283173457" new_tablet_idx { table_id: 1742283165243 index_id: 1742283173458 partition_id: 1742283165242 tablet_id: 1742283174829 } txn_ids: 610474243072 alter_version: 2 num_output_rowsets: 1 num_output_segments: 1 size_output_rowsets: 77097 num_output_rows: 611 output_versions: 2 output_cumulative_point: 2 delete_bitmap_lock_initiator: 6632031443518271970 index_size_output_rowsets: 3289 segment_size_output_rowsets: 73808 }
I20250318 15:40:46.204162  7677 task_worker_pool.cpp:423] successfully submit task|type=ALTER|signature=1742283174829
I20250318 15:41:07.487172  6496 task_worker_pool.cpp:1999] get alter table task, signature: 1742283174829
I20250318 15:41:07.487183  6496 task_worker_pool.cpp:281] start alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|mem_limit=10682972209
I20250318 15:41:07.489440  6496 cloud_schema_change_job.cpp:132] Begin to alter tablet. base_tablet_id=1742283170711, new_tablet_id=1742283174829, alter_version=2, job_id=1742283173457
I20250318 15:41:07.489511  6496 cloud_schema_change_job.cpp:226] Begin to convert historical rowsets for new_tablet from base_tablet. base_tablet=1742283170711, new_tablet=1742283174829, job_id=1742283173457
I20250318 15:41:07.489523  6496 cloud_schema_change_job.cpp:247] schema change type, sc_sorting: 0, sc_directly: 1, base_tablet=1742283170711, new_tablet=1742283174829
I20250318 15:41:07.490249  6496 cloud_schema_change_job.cpp:285] Rowset [2-2] has already existed in tablet 1742283174829
I20250318 15:41:07.490275  6496 cloud_schema_change_job.cpp:416] process mow table|new_tablet_id=1742283174829|out_rowset_size=1|start_calc_delete_bitmap_version=2|alter_version=2
W20250318 15:41:07.490864  6496 cloud_compaction_stop_token.cpp:89] failed to register compaction stop token|job_id=a018587a-c12f-4926-9d7e-514ff9d88457|delete_bitmap_lock_initiator=1847151139249560285|tablet_id=1742283174829|error=[INTERNAL_ERROR]failed to start tablet job: compactions are not allowed on tablet_id=1742283174829 currently, blocked by schema change job delete_bitmap_initiator=6632031443518271970
W20250318 15:41:07.490897  6496 task_worker_pool.cpp:306] failed to alter tablet|signature=1742283174829|base_tablet_id=1742283170711|new_tablet_id=1742283174829|error=[INTERNAL_ERROR]failed to start tablet job: compactions are not allowed on tablet_id=1742283174829 currently, blocked by schema change job delete_bitmap_initiator=6632031443518271970
```
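
The failure mode in the log above can be reproduced with a small model: the first attempt registers a stop token and fails at commit, and the retry arrives about 36 seconds later, well inside the 80-second window. The sketch below is a simplification under stated assumptions. `MetaServiceTablet`, `StopToken`, `run_attempt` and the `cleanup_on_failure` flag are hypothetical names rather than the actual meta-service code, and the 20-second lease interval is inferred from `lease_compaction_interval_seconds * 4 = 80s`.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical record of a stop token held in the meta service (MS).
struct StopToken {
    int64_t initiator;    // delete_bitmap_lock_initiator of the owning SC attempt
    int64_t expire_at_s;  // absolute expiration time in seconds
};

struct MetaServiceTablet {
    std::optional<StopToken> token;
};

// Assumed values: 20s lease interval, so the token lives 4 * 20 = 80s.
constexpr int64_t kLeaseIntervalS = 20;
constexpr int64_t kTokenTtlS = kLeaseIntervalS * 4;

// Registration fails while an unexpired token from another attempt exists.
bool register_stop_token(MetaServiceTablet& ms, int64_t initiator, int64_t now_s) {
    if (ms.token && ms.token->expire_at_s > now_s) return false;
    ms.token = StopToken{initiator, now_s + kTokenTtlS};
    return true;
}

// Only the owning attempt removes its token.
void remove_stop_token(MetaServiceTablet& ms, int64_t initiator) {
    if (ms.token && ms.token->initiator == initiator) ms.token.reset();
}

// One SC attempt. `commit_ok` simulates the outcome of commit_tablet_job;
// `cleanup_on_failure` toggles removing the token when the commit fails.
bool run_attempt(MetaServiceTablet& ms, int64_t initiator, int64_t now_s,
                 bool commit_ok, bool cleanup_on_failure) {
    if (!register_stop_token(ms, initiator, now_s)) return false;
    if (commit_ok) {
        remove_stop_token(ms, initiator);  // normal path: commit removes the token
        return true;
    }
    if (cleanup_on_failure) remove_stop_token(ms, initiator);
    return false;
}
```

In this model, with `cleanup_on_failure` set to `false`, a retry at 36 seconds is rejected at registration and only a retry after the 80-second expiry can proceed. With it set to `true`, the retry at 36 seconds registers its own token and continues.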

Labels

approved Indicates a PR has been approved by one committer. dev/3.0.5-merged p0_w reviewed
