
@bobhan1
Contributor

@bobhan1 bobhan1 commented Mar 13, 2025

pick #48399

…n=true` (apache#48399)

apache#39558 added a config that allows the shadow tablet to do cumulative
compaction during schema change in cloud mode, to avoid the -235 error on
the new tablet when there are a large number of loads. However, this
introduces a correctness problem on merge-on-write tables: some rowsets'
delete bitmaps are wrong if cumulative compactions run on the new tablet
while SC calculates delete bitmaps for incremental rowsets after
converting historical data.

This PR introduces a new type of compaction, `STOP_TOKEN`, which fails all
existing compaction jobs and disallows any compaction on the tablet, and
changes the SC process as follows:
1. convert historical data
2. register a stop token on the new tablet to fail all existing compaction
jobs and disallow any compaction on the new tablet
3. calculate delete bitmaps for incremental rowsets without lock
4. calculate delete bitmaps for incremental rowsets with lock
5. commit the SC job and remove the stop token
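The ordering above can be illustrated with a small in-process model. This is a sketch under stated assumptions, not Doris code: `TabletJobRegistry`, `JobType`, `try_start_compaction`, `register_stop_token`, `try_commit_compaction`, `remove_stop_token` and `run_schema_change_demo` are hypothetical names, and the real job bookkeeping in cloud mode is not reproduced. It only models the semantics described in the PR: once a `STOP_TOKEN` job is registered on the new tablet, in-flight compactions can no longer commit and new ones are rejected until the token is removed.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical model of per-tablet compaction jobs; not the Doris API.
enum class JobType { BASE, CUMULATIVE, FULL, STOP_TOKEN };

struct CompactionJob {
    JobType type;
    bool aborted = false;  // set when a stop token fails this job
};

class TabletJobRegistry {
public:
    // Rejects a new compaction while a STOP_TOKEN job is registered.
    bool try_start_compaction(const std::string& job_id, JobType type) {
        std::lock_guard<std::mutex> lock(mu_);
        if (has_stop_token_locked()) return false;
        jobs_[job_id] = CompactionJob{type};
        return true;
    }

    // Registers a STOP_TOKEN job and fails every compaction already in flight.
    void register_stop_token(const std::string& token_id) {
        std::lock_guard<std::mutex> lock(mu_);
        for (auto& entry : jobs_) entry.second.aborted = true;
        jobs_[token_id] = CompactionJob{JobType::STOP_TOKEN};
    }

    // A compaction may commit only if no stop token failed it in the meantime.
    bool try_commit_compaction(const std::string& job_id) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = jobs_.find(job_id);
        if (it == jobs_.end() || it->second.type == JobType::STOP_TOKEN) return false;
        const bool ok = !it->second.aborted;
        jobs_.erase(it);  // the job finishes either way
        return ok;
    }

    // Called after the SC job commits; compactions are allowed again.
    void remove_stop_token(const std::string& token_id) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = jobs_.find(token_id);
        if (it != jobs_.end() && it->second.type == JobType::STOP_TOKEN) jobs_.erase(it);
    }

private:
    bool has_stop_token_locked() const {
        for (const auto& entry : jobs_)
            if (entry.second.type == JobType::STOP_TOKEN) return true;
        return false;
    }

    std::mutex mu_;
    std::unordered_map<std::string, CompactionJob> jobs_;
};

struct ScDemoResult {
    bool cumu_started;           // compaction started during step 1
    bool cumu_committed;         // whether that compaction could still commit
    bool new_cumu_rejected;      // new compaction rejected while the token exists
    bool cumu_allowed_after_sc;  // compaction allowed again after step 5
};

// Walks through the five SC steps on a hypothetical new tablet.
ScDemoResult run_schema_change_demo() {
    TabletJobRegistry new_tablet;
    ScDemoResult r{};
    // Step 1: while historical data is converted, cumulative compaction may run.
    r.cumu_started = new_tablet.try_start_compaction("cumu-1", JobType::CUMULATIVE);
    // Step 2: register the stop token; "cumu-1" is failed and new jobs are refused.
    new_tablet.register_stop_token("sc-stop-token");
    r.cumu_committed = new_tablet.try_commit_compaction("cumu-1");
    r.new_cumu_rejected = !new_tablet.try_start_compaction("cumu-2", JobType::CUMULATIVE);
    // Steps 3-4: delete bitmaps for incremental rowsets would be computed here;
    // in this model no compaction can change the tablet's rowsets meanwhile.
    // Step 5: commit the SC job (omitted) and remove the stop token.
    new_tablet.remove_stop_token("sc-stop-token");
    r.cumu_allowed_after_sc = new_tablet.try_start_compaction("cumu-3", JobType::CUMULATIVE);
    return r;
}
```

In this model, `cumu-1` starts but cannot commit, `cumu-2` is rejected, and `cumu-3` is accepted once the token is gone. Failing in-flight jobs at commit time, instead of waiting for them to finish, is one simple way to get the "fail all existing compaction jobs" behavior; the PR text does not say how Doris implements it internally.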
----

ref: apache#29386
@bobhan1 bobhan1 requested a review from dataroaring as a code owner March 13, 2025 06:18
@Thearas
Contributor

Thearas commented Mar 13, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1
Contributor Author

bobhan1 commented Mar 13, 2025

run buildall

@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 83.00% (1074/1294)
Line Coverage: 65.97% (17728/26874)
Region Coverage: 65.36% (8723/13347)
Branch Coverage: 55.32% (4704/8504)
Coverage Report: http://coverage.selectdb-in.cc/coverage/301e80d6aafb1584c8421ac9d6b47ded3cdea22c_301e80d6aafb1584c8421ac9d6b47ded3cdea22c_cloud/report/index.html

@doris-robot

TPC-H: Total hot run time: 40043 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 301e80d6aafb1584c8421ac9d6b47ded3cdea22c, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17731	6880	6649	6649
q2	2074	172	188	172
q3	10644	1096	1182	1096
q4	10473	748	709	709
q5	7744	2828	2889	2828
q6	223	136	137	136
q7	963	613	609	609
q8	9363	1970	2090	1970
q9	6664	6471	6422	6422
q10	6996	2234	2339	2234
q11	465	262	261	261
q12	399	214	210	210
q13	17754	2994	3043	2994
q14	250	208	210	208
q15	514	508	481	481
q16	662	599	585	585
q17	985	580	560	560
q18	7410	6545	6690	6545
q19	1394	1048	1032	1032
q20	488	208	201	201
q21	4127	3143	3241	3143
q22	1093	998	1005	998
Total cold run time: 108416 ms
Total hot run time: 40043 ms
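The per-query rows appear to list one cold run, two hot runs and the faster of the two hot runs. A quick check (a sketch; the aggregation rule is inferred from the numbers, not documented by the bot) confirms that for Round 1 the reported totals equal the sum of the cold runs and the sum of the per-query best hot runs. `QueryTiming`, `RunTotals`, `sum_run_times` and `tpch_round1` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One row of the report: cold run and two hot runs, in milliseconds.
struct QueryTiming {
    long long cold_ms;
    long long hot1_ms;
    long long hot2_ms;
};

struct RunTotals {
    long long cold_ms;
    long long hot_ms;
};

// Inferred aggregation: cold total = sum of cold runs,
// hot total = sum of the faster of the two hot runs per query.
RunTotals sum_run_times(const std::vector<QueryTiming>& rows) {
    RunTotals totals{0, 0};
    for (const auto& r : rows) {
        totals.cold_ms += r.cold_ms;
        totals.hot_ms += std::min(r.hot1_ms, r.hot2_ms);
    }
    return totals;
}

// TPC-H Round 1 rows q1..q22, copied from the report.
std::vector<QueryTiming> tpch_round1() {
    return {
        {17731, 6880, 6649}, {2074, 172, 188},    {10644, 1096, 1182},
        {10473, 748, 709},   {7744, 2828, 2889},  {223, 136, 137},
        {963, 613, 609},     {9363, 1970, 2090},  {6664, 6471, 6422},
        {6996, 2234, 2339},  {465, 262, 261},     {399, 214, 210},
        {17754, 2994, 3043}, {250, 208, 210},     {514, 508, 481},
        {662, 599, 585},     {985, 580, 560},     {7410, 6545, 6690},
        {1394, 1048, 1032},  {488, 208, 201},     {4127, 3143, 3241},
        {1093, 998, 1005},
    };
}
```

Summing these rows gives 108416 ms cold and 40043 ms hot, matching the two totals reported for Round 1.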

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	6661	6569	6555	6555
q2	333	241	250	241
q3	2935	2766	2952	2766
q4	2043	1823	1813	1813
q5	5786	5810	5779	5779
q6	221	131	134	131
q7	2291	1838	1777	1777
q8	3413	3602	3550	3550
q9	8793	8883	8928	8883
q10	3587	3533	3516	3516
q11	593	501	493	493
q12	781	628	585	585
q13	10106	3227	3186	3186
q14	294	289	283	283
q15	520	470	458	458
q16	697	642	655	642
q17	1885	1636	1629	1629
q18	8303	7806	7778	7778
q19	1692	1594	1528	1528
q20	2033	1863	1867	1863
q21	5576	5349	5338	5338
q22	1150	1043	1047	1043
Total cold run time: 69693 ms
Total hot run time: 59837 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/231) 🎉


Function Coverage: 38.90% (10166/26134)
Line Coverage: 30.31% (86635/285833)
Region Coverage: 29.35% (44484/151580)
Branch Coverage: 25.87% (22629/87472)

@doris-robot

TPC-DS: Total hot run time: 197575 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 301e80d6aafb1584c8421ac9d6b47ded3cdea22c, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1299	912	898	898
query2	6266	2109	2070	2070
query3	11009	4551	4664	4551
query4	61635	28703	23260	23260
query5	5157	454	446	446
query6	413	183	188	183
query7	5465	321	317	317
query8	310	224	228	224
query9	8883	2659	2660	2659
query10	479	270	269	269
query11	17607	15222	16069	15222
query12	163	111	104	104
query13	1440	464	444	444
query14	10983	7116	6839	6839
query15	205	194	189	189
query16	7213	517	500	500
query17	1186	602	611	602
query18	1842	341	318	318
query19	210	167	161	161
query20	118	117	112	112
query21	204	104	108	104
query22	4674	4749	4548	4548
query23	34368	34729	34206	34206
query24	6179	2951	2957	2951
query25	531	430	426	426
query26	670	183	186	183
query27	1800	356	360	356
query28	4314	2513	2451	2451
query29	663	448	455	448
query30	248	160	167	160
query31	998	826	840	826
query32	67	59	58	58
query33	414	285	282	282
query34	912	515	559	515
query35	869	724	735	724
query36	1110	983	1027	983
query37	116	72	67	67
query38	4124	4065	3993	3993
query39	1511	1474	1461	1461
query40	204	97	102	97
query41	47	55	49	49
query42	110	106	102	102
query43	553	517	503	503
query44	1184	824	840	824
query45	190	175	171	171
query46	1153	754	749	749
query47	2051	1928	1905	1905
query48	480	411	402	402
query49	729	424	388	388
query50	864	442	426	426
query51	7320	7354	7222	7222
query52	116	94	90	90
query53	262	185	193	185
query54	586	457	475	457
query55	80	82	74	74
query56	280	272	257	257
query57	1285	1149	1165	1149
query58	221	207	215	207
query59	3311	3126	2958	2958
query60	279	251	243	243
query61	108	111	120	111
query62	789	675	667	667
query63	216	190	186	186
query64	1391	670	678	670
query65	3300	3191	3233	3191
query66	690	289	294	289
query67	15868	15463	15496	15463
query68	4343	578	569	569
query69	438	265	265	265
query70	1199	1097	1097	1097
query71	357	266	254	254
query72	6096	4104	3966	3966
query73	747	346	373	346
query74	10104	9177	8942	8942
query75	3343	2685	2656	2656
query76	1902	1090	1053	1053
query77	545	298	279	279
query78	10613	9537	9657	9537
query79	1480	618	613	613
query80	869	435	422	422
query81	516	239	237	237
query82	1293	90	88	88
query83	243	139	141	139
query84	285	80	82	80
query85	896	317	292	292
query86	339	290	303	290
query87	4459	4228	4286	4228
query88	3684	2444	2390	2390
query89	423	297	293	293
query90	2010	188	193	188
query91	188	153	150	150
query92	66	51	54	51
query93	1795	580	559	559
query94	760	291	299	291
query95	368	256	258	256
query96	614	287	286	286
query97	3310	3132	3176	3132
query98	221	208	200	200
query99	1584	1281	1289	1281
Total cold run time: 314412 ms
Total hot run time: 197575 ms

@doris-robot

ClickBench: Total hot run time: 33.08 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 301e80d6aafb1584c8421ac9d6b47ded3cdea22c, data reload: false

query	run 1 (s)	run 2 (s)	run 3 (s)
query1	0.03	0.02	0.03
query2	0.09	0.05	0.04
query3	0.23	0.07	0.06
query4	1.63	0.09	0.09
query5	0.52	0.49	0.50
query6	1.14	0.74	0.74
query7	0.02	0.03	0.02
query8	0.05	0.05	0.06
query9	0.56	0.50	0.50
query10	0.57	0.55	0.57
query11	0.16	0.12	0.12
query12	0.16	0.12	0.13
query13	0.61	0.60	0.60
query14	2.79	2.74	2.94
query15	0.90	0.84	0.83
query16	0.37	0.38	0.38
query17	1.04	1.00	1.07
query18	0.20	0.20	0.19
query19	1.93	1.89	2.08
query20	0.02	0.02	0.01
query21	15.35	0.65	0.65
query22	4.20	6.62	1.83
query23	18.24	1.41	1.43
query24	2.22	0.21	0.24
query25	0.15	0.09	0.08
query26	0.28	0.17	0.17
query27	0.08	0.08	0.07
query28	13.24	0.61	0.58
query29	12.65	3.35	3.36
query30	0.25	0.06	0.06
query31	2.83	0.41	0.40
query32	3.22	0.49	0.49
query33	3.03	3.06	3.03
query34	17.26	4.58	4.59
query35	4.60	4.61	4.63
query36	0.67	0.48	0.50
query37	0.20	0.16	0.16
query38	0.16	0.16	0.16
query39	0.06	0.04	0.04
query40	0.17	0.14	0.14
query41	0.11	0.06	0.05
query42	0.06	0.05	0.06
query43	0.05	0.04	0.05
Total cold run time: 112.1 s
Total hot run time: 33.08 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 14, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@dataroaring dataroaring merged commit 84f61c9 into apache:branch-3.0 Mar 14, 2025
21 of 22 checks passed

Labels

approved Indicates a PR has been approved by one committer. reviewed