@bobhan1
Contributor

@bobhan1 bobhan1 commented Sep 24, 2024

Currently, Doris uses strict mode to decide whether newly inserted rows in a partial update are appended or rejected with an error, which is hard to use. This PR adds a new session variable and load property, partial_update_new_key_behavior, to control how newly inserted rows are handled in partial update.
partial_update_new_key_behavior has two options (a third, IGNORE, was proposed but is not supported):

  • APPEND: append the newly inserted rows.
  • IGNORE (not supported): silently delete the newly inserted rows (they would not be counted as filtered rows).
  • ERROR: report an error when newly inserted rows are encountered; the error message contains the keys of one row that is not in the table.
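
To make the two supported modes concrete, here is a minimal Python sketch of their semantics. It is a toy in-memory model, not Doris code: the `partial_update` function, the `NewKeyBehavior` enum, and the `defaults` argument (used to fill columns that a brand-new row does not provide) are all hypothetical names introduced for illustration.

```python
from enum import Enum


class NewKeyBehavior(Enum):
    APPEND = "APPEND"
    ERROR = "ERROR"


def partial_update(table, rows, defaults, behavior):
    """Apply a partial update to `table` (dict: key -> dict of column values).

    `rows` maps each key to the subset of columns being written.
    `defaults` supplies values for columns a brand-new row does not provide
    (an assumption of this sketch).
    """
    new_keys = [key for key in rows if key not in table]
    if new_keys and behavior is NewKeyBehavior.ERROR:
        # Like the described error message: name one key missing from the table.
        raise KeyError(f"key {new_keys[0]!r} is not in the table")

    result = {key: dict(cols) for key, cols in table.items()}
    for key, cols in rows.items():
        if key in result:
            result[key].update(cols)  # existing row: overwrite only given columns
        else:
            result[key] = {**defaults, **cols}  # APPEND: new row, fill the rest
    return result
```

With `APPEND`, a row whose key is missing from the table is added with the remaining columns filled from `defaults`; with `ERROR`, the whole call fails and the input table is left unchanged.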

The reason for not supporting IGNORE mode: to support it, we would need to add a delete sign to newly inserted rows in partial update in order to delete them, rather than marking them in the delete bitmap, because compaction does not use the delete bitmap when reading data. We would also need to record the rows whose delete sign was added by us, so that conflicts can be resolved in the publish phase; otherwise rows could be wrongly deleted if another concurrent load inserts some of them successfully. This increases code complexity and is error-prone.
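
The concurrency hazard described above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Doris's storage engine: `DELETE_SIGN`, `replay`, `publish`, and `marked_by_us` are hypothetical names, and conflict resolution is simplified to dropping the delete-signed row.

```python
DELETE_SIGN = "__delete_sign__"  # hypothetical marker column for this sketch


def replay(writes):
    """Replay published writes in order; for each key the latest write wins."""
    state = {}
    for write in writes:
        for key, row in write.items():
            if row.get(DELETE_SIGN):
                state.pop(key, None)  # a delete-signed row removes the key
            else:
                cols = {c: v for c, v in row.items() if c != DELETE_SIGN}
                state[key] = {**state.get(key, {}), **cols}
    return state


def publish(published, load, marked_by_us=frozenset()):
    """Publish `load` on top of `published`.

    `marked_by_us` holds keys that the load delete-signed only because they
    looked new when the load started. If such a key exists by publish time,
    the delete-signed row is dropped instead of deleting the other load's row.
    """
    current = replay(published)
    resolved = {
        key: row
        for key, row in load.items()
        if not (key in marked_by_us and key in current)
    }
    return replay(published + [resolved])


base = {1: {"score": 10}}
# Load A started while key 2 was absent, so it delete-signed its row for key 2.
load_a = {1: {"score": 20}, 2: {"score": 5, DELETE_SIGN: True}}
# Load B ran concurrently, inserted key 2, and was published first.
load_b = {2: {"score": 99}}

naive = publish([base, load_b], load_a)                       # key 2 is lost
tracked = publish([base, load_b], load_a, marked_by_us={2})  # key 2 survives
```

Without tracking, load A's delete sign removes the row that load B inserted successfully. The extra bookkeeping needed to avoid that is the complexity the PR description cites as the reason for dropping `IGNORE`.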

Doc: apache/doris-website#2472

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@bobhan1
Contributor Author

bobhan1 commented Sep 24, 2024

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.30% (9620/25791)
Line Coverage: 28.70% (79589/277339)
Region Coverage: 28.16% (41174/146218)
Branch Coverage: 24.78% (20979/84646)
Coverage Report: http://coverage.selectdb-in.cc/coverage/1ad54ae69f00692de6dedfef0b1839362970d19c_1ad54ae69f00692de6dedfef0b1839362970d19c/report/index.html

@bobhan1 bobhan1 force-pushed the new-row-handling-mode branch from 6852060 to d50a073 Compare September 25, 2024 07:36
@bobhan1
Contributor Author

bobhan1 commented Sep 25, 2024

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.29% (9625/25812)
Line Coverage: 28.70% (79677/277572)
Region Coverage: 28.12% (41182/146456)
Branch Coverage: 24.75% (20982/84760)
Coverage Report: http://coverage.selectdb-in.cc/coverage/d50a073f97fceab838be7eac6146a5945f3caa97_d50a073f97fceab838be7eac6146a5945f3caa97/report/index.html

@bobhan1 bobhan1 force-pushed the new-row-handling-mode branch 2 times, most recently from 3655b81 to 15f65eb Compare September 27, 2024 11:28
@bobhan1
Contributor Author

bobhan1 commented Sep 27, 2024

run buildall

@bobhan1 bobhan1 requested a review from zhannngchen September 27, 2024 11:41
@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.30% (9628/25813)
Line Coverage: 28.71% (79705/277649)
Region Coverage: 28.13% (41194/146462)
Branch Coverage: 24.74% (20977/84778)
Coverage Report: http://coverage.selectdb-in.cc/coverage/15f65eb86991166897303d8d9ef5aac41edafd4b_15f65eb86991166897303d8d9ef5aac41edafd4b/report/index.html

zhannngchen
zhannngchen previously approved these changes Sep 29, 2024
Contributor

@zhannngchen zhannngchen left a comment


LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Sep 29, 2024
@github-actions
Contributor

PR approved by anyone and no changes requested.

@bobhan1 bobhan1 force-pushed the new-row-handling-mode branch from 15f65eb to 6cbf1b1 Compare November 25, 2024 07:31
@github-actions github-actions bot removed the approved label Nov 25, 2024
@bobhan1 bobhan1 force-pushed the new-row-handling-mode branch from 6cbf1b1 to e346a8a Compare November 25, 2024 07:32
Contributor

@github-actions github-actions bot left a comment


clang-tidy made some suggestions

@bobhan1 bobhan1 force-pushed the new-row-handling-mode branch 2 times, most recently from 02a64a4 to 9e6f4ad Compare November 25, 2024 08:01
@bobhan1
Contributor Author

bobhan1 commented Nov 25, 2024

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.33% (9979/26033)
Line Coverage: 29.42% (83505/283828)
Region Coverage: 28.59% (42975/150324)
Branch Coverage: 25.17% (21837/86744)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c212809e0778906333f3b9f9954a1d937d04cbad_c212809e0778906333f3b9f9954a1d937d04cbad/report/index.html

@bobhan1 bobhan1 force-pushed the new-row-handling-mode branch from c212809 to b7af246 Compare November 26, 2024 11:13
@bobhan1
Contributor Author

bobhan1 commented Nov 26, 2024

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.36% (9978/26014)
Line Coverage: 29.45% (83520/283569)
Region Coverage: 28.59% (42974/150301)
Branch Coverage: 25.19% (21828/86644)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b7af2461a837caad3ec7b20463764f0e2b88a790_b7af2461a837caad3ec7b20463764f0e2b88a790/report/index.html

@bobhan1 bobhan1 force-pushed the new-row-handling-mode branch from 3b70051 to b4dd19a Compare November 27, 2024 10:48
@bobhan1
Contributor Author

bobhan1 commented Jun 18, 2025

run p0

@bobhan1
Contributor Author

bobhan1 commented Jun 18, 2025

run performance

@bobhan1
Contributor Author

bobhan1 commented Jun 18, 2025

run external

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved label Jun 18, 2025
@doris-robot

TPC-H: Total hot run time: 33894 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 719746f67c9b70df82d495ff3cf0f02cb459d208, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17665	5155	5061	5061
q2	1981	307	174	174
q3	10463	1214	751	751
q4	10214	1021	524	524
q5	7600	2383	2319	2319
q6	180	163	132	132
q7	908	739	610	610
q8	9327	1295	1081	1081
q9	6705	5123	5085	5085
q10	6898	2389	1947	1947
q11	500	285	271	271
q12	347	352	211	211
q13	17773	3679	3030	3030
q14	229	222	217	217
q15	568	480	487	480
q16	433	433	369	369
q17	630	853	366	366
q18	7752	7175	7146	7146
q19	1835	966	577	577
q20	333	344	226	226
q21	3652	3190	2370	2370
q22	1030	1008	947	947
Total cold run time: 107023 ms
Total hot run time: 33894 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5125	5052	5053	5052
q2	264	313	217	217
q3	2125	2649	2326	2326
q4	1367	1795	1344	1344
q5	4198	4104	4374	4104
q6	210	169	130	130
q7	2025	1922	1752	1752
q8	2586	2565	2564	2564
q9	7201	7183	7235	7183
q10	3109	3291	2780	2780
q11	610	530	529	529
q12	697	768	625	625
q13	3482	3873	3244	3244
q14	280	296	281	281
q15	552	495	477	477
q16	458	477	451	451
q17	1149	1598	1365	1365
q18	7904	7537	7598	7537
q19	823	815	876	815
q20	1982	2065	1892	1892
q21	4975	4216	4314	4216
q22	1063	1055	1043	1043
Total cold run time: 52185 ms
Total hot run time: 49927 ms

@doris-robot

TPC-DS: Total hot run time: 192737 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 719746f67c9b70df82d495ff3cf0f02cb459d208, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1435	1033	981	981
query2	6137	1766	1777	1766
query3	11022	4506	4422	4422
query4	56462	25066	23113	23113
query5	5300	481	453	453
query6	418	215	203	203
query7	5401	522	292	292
query8	324	217	212	212
query9	7410	2666	2673	2666
query10	454	324	262	262
query11	15185	15245	14797	14797
query12	155	103	105	103
query13	1312	499	410	410
query14	10117	6269	6180	6180
query15	203	203	187	187
query16	7045	649	472	472
query17	1080	743	637	637
query18	1559	429	351	351
query19	207	202	180	180
query20	126	127	121	121
query21	212	137	110	110
query22	4507	4782	4547	4547
query23	34354	33711	33599	33599
query24	6615	2466	2418	2418
query25	512	496	438	438
query26	673	284	215	215
query27	2158	516	348	348
query28	3012	2204	2187	2187
query29	598	559	489	489
query30	275	219	198	198
query31	855	869	784	784
query32	76	59	66	59
query33	452	369	301	301
query34	786	875	528	528
query35	803	827	737	737
query36	913	985	894	894
query37	112	104	77	77
query38	4257	4217	4282	4217
query39	1675	1479	1429	1429
query40	219	122	106	106
query41	61	62	58	58
query42	131	111	112	111
query43	495	496	485	485
query44	1406	848	845	845
query45	179	175	172	172
query46	861	1028	659	659
query47	1879	1892	1860	1860
query48	411	438	325	325
query49	690	477	404	404
query50	663	712	413	413
query51	4306	4248	4144	4144
query52	115	107	104	104
query53	231	262	189	189
query54	618	576	509	509
query55	89	83	80	80
query56	311	311	288	288
query57	1234	1268	1184	1184
query58	272	281	255	255
query59	2692	2762	2665	2665
query60	337	315	311	311
query61	133	156	141	141
query62	724	794	718	718
query63	226	187	187	187
query64	1431	1037	688	688
query65	4208	4171	4147	4147
query66	720	390	316	316
query67	16197	15870	15443	15443
query68	7453	896	517	517
query69	552	296	258	258
query70	1121	1095	1088	1088
query71	501	342	297	297
query72	5973	4793	4875	4793
query73	1104	682	351	351
query74	9262	9163	8895	8895
query75	3736	3207	2688	2688
query76	4154	1198	733	733
query77	622	350	286	286
query78	10083	10226	9520	9520
query79	2869	841	581	581
query80	643	514	435	435
query81	486	256	222	222
query82	478	128	99	99
query83	374	244	261	244
query84	294	105	83	83
query85	809	373	316	316
query86	399	294	274	274
query87	4389	4436	4307	4307
query88	3240	2286	2299	2286
query89	400	320	277	277
query90	1954	204	206	204
query91	147	142	124	124
query92	72	60	58	58
query93	1665	944	586	586
query94	663	404	314	314
query95	365	291	289	289
query96	496	568	283	283
query97	2706	2762	2670	2670
query98	224	211	199	199
query99	1426	1423	1270	1270
Total cold run time: 303696 ms
Total hot run time: 192737 ms

@doris-robot

ClickBench: Total hot run time: 30.03 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 719746f67c9b70df82d495ff3cf0f02cb459d208, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.03
query2	0.10	0.05	0.05
query3	0.28	0.06	0.06
query4	1.59	0.06	0.08
query5	0.43	0.42	0.41
query6	1.16	0.66	0.66
query7	0.02	0.02	0.01
query8	0.06	0.05	0.06
query9	0.63	0.52	0.52
query10	0.57	0.57	0.57
query11	0.25	0.12	0.13
query12	0.25	0.13	0.14
query13	0.64	0.63	0.64
query14	0.81	0.84	0.83
query15	0.98	0.91	0.89
query16	0.38	0.37	0.38
query17	1.10	1.07	1.08
query18	0.18	0.17	0.18
query19	1.99	1.85	1.90
query20	0.02	0.01	0.01
query21	15.42	0.98	0.68
query22	0.95	1.01	0.81
query23	14.72	1.52	0.75
query24	5.47	0.57	0.29
query25	0.16	0.09	0.08
query26	0.56	0.23	0.19
query27	0.08	0.08	0.08
query28	10.98	1.27	0.58
query29	12.53	4.09	3.40
query30	0.29	0.09	0.06
query31	2.85	0.63	0.43
query32	3.26	0.61	0.50
query33	3.10	3.05	3.12
query34	16.74	5.45	4.69
query35	4.84	4.75	4.75
query36	0.64	0.51	0.50
query37	0.20	0.18	0.18
query38	0.17	0.16	0.16
query39	0.05	0.04	0.04
query40	0.20	0.17	0.17
query41	0.11	0.06	0.05
query42	0.07	0.06	0.05
query43	0.06	0.05	0.05
Total cold run time: 104.93 s
Total hot run time: 30.03 s

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 8.77% (10/114) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 60.98% (16034/26293)
Line Coverage 50.48% (150582/298286)
Region Coverage 47.82% (86063/179968)
Branch Coverage 41.33% (42261/102264)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 8.77% (10/114) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 61.83% (16258/26293)
Line Coverage 51.28% (152965/298286)
Region Coverage 48.79% (87811/179968)
Branch Coverage 42.26% (43214/102264)

Contributor

@dataroaring dataroaring left a comment


LGTM

@dataroaring dataroaring merged commit 6f00a7c into apache:master Jun 25, 2025
24 of 27 checks passed
bobhan1 added a commit to bobhan1/doris that referenced this pull request Jun 25, 2025
…f newly inserted rows in partial update (apache#41232)

Currently, Doris uses strict mode to decide whether newly inserted rows
should be appended or rejected with an error in partial update, which is
hard to use. This PR adds a new session variable and load property,
`partial_update_new_key_behavior`, to control the behavior of newly
inserted rows in partial update.
`partial_update_new_key_behavior` has ~three~ two options:
- `APPEND`: append the newly inserted rows
- ~`IGNORE`: silently delete the newly inserted rows (they are not
counted as filtered rows)~
- `ERROR`: report an error when newly inserted rows are encountered; the
error message contains the keys of one row that is not in the table.

---
The reason for not supporting `IGNORE` mode: to support `IGNORE` mode,
we would need to add a delete sign to newly inserted rows in partial
update in order to delete them, rather than marking them in the delete
bitmap, because compaction does not use the delete bitmap when reading
data. We would also need to record the rows whose delete sign was added
by us, so that conflicts can be resolved in the publish phase; otherwise
rows could be wrongly deleted if another concurrent load inserts some of
them successfully. This increases code complexity and is error-prone.

Doc: apache/doris-website#2472
bobhan1 added a commit to bobhan1/doris that referenced this pull request Jun 25, 2025
…f newly inserted rows in partial update (apache#41232)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Jun 25, 2025
…f newly inserted rows in partial update (apache#41232)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Jun 25, 2025
…f newly inserted rows in partial update (apache#41232)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Jul 9, 2025
…f newly inserted rows in partial update (apache#41232)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Jul 9, 2025
…f newly inserted rows in partial update (apache#41232)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Jul 9, 2025
…f newly inserted rows in partial update (apache#41232)

morrySnow pushed a commit that referenced this pull request Jul 10, 2025
…e behavior of newly inserted rows in partial update #41950 #41232 (#52998)

pick #41950 and #41232

Labels

approved, dev/3.1.0-merged, reviewed


7 participants