[Fix](partial update) Fix rowset not found error when doing partial update #34112

Yukang-Lian · 2024-04-25T09:12:38Z

Proposed changes

Issue Number: close #xxx

Issue: An update statement is executed through the partial column update logic. On large datasets, the update may fail with the error "the unmentioned column xxx should have default value or be nullable for newly inserted rows in non-strict mode partial update".

Cause: A partial column update first reads the existing columns, fills in the missing values, and writes the complete rows back. During initialization, the read path records only the rowset IDs; the rowset objects themselves are fetched later, when they are actually needed. Compaction can run between these two steps and turn an old rowset into a stale rowset, and if enough time passes the stale rowset may be deleted outright. When the update then asks for the rowset object, it cannot be found. An update that goes through the partial column logic should be able to read every key and should never encounter a new key, but once the rowset has disappeared the Backend (BE) treats those keys as missing. It then checks whether the other columns have default values or are nullable, and if that check fails, the error above is thrown.
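The race described above can be reproduced in miniature. The following is only a sketch under stated assumptions: `Rowset`, `Tablet`, `all_rowset_ids`, `find`, and `compact_and_gc` are hypothetical stand-ins invented for illustration and are not the Doris classes or methods. It shows that IDs remembered at initialization can no longer be resolved once compaction and cleanup have run in between.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the Doris types.
struct Rowset {
    int64_t id;
};
using RowsetSharedPtr = std::shared_ptr<Rowset>;

struct Tablet {
    std::map<int64_t, RowsetSharedPtr> rowsets;

    // Step 1 of the problematic flow: remember only the IDs.
    std::vector<int64_t> all_rowset_ids() const {
        std::vector<int64_t> ids;
        for (const auto& entry : rowsets) ids.push_back(entry.first);
        return ids;
    }

    // Step 2, some time later: resolve an ID. Returns nullptr if the rowset is gone.
    RowsetSharedPtr find(int64_t id) const {
        auto it = rowsets.find(id);
        return it == rowsets.end() ? nullptr : it->second;
    }

    // Models compaction followed by stale-rowset cleanup:
    // the input rowsets are replaced by a single output rowset.
    void compact_and_gc(const std::vector<int64_t>& inputs, int64_t output_id) {
        for (int64_t id : inputs) rowsets.erase(id);
        rowsets[output_id] = std::make_shared<Rowset>(Rowset{output_id});
    }
};

// Returns how many of the remembered IDs can no longer be resolved.
int count_missing_after_compaction() {
    Tablet tablet;
    tablet.rowsets[1] = std::make_shared<Rowset>(Rowset{1});
    tablet.rowsets[2] = std::make_shared<Rowset>(Rowset{2});

    std::vector<int64_t> ids = tablet.all_rowset_ids();  // initialization
    tablet.compact_and_gc({1, 2}, 3);                     // runs in between

    int missing = 0;
    for (int64_t id : ids) {
        // Keys stored in a missing rowset would be treated as new keys.
        if (tablet.find(id) == nullptr) ++missing;
    }
    return missing;
}
```

In this toy model both remembered IDs fail to resolve, which corresponds to the situation where the BE treats existing keys as missing.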

Solution: To avoid this during partial column updates, the initialization step should fetch the rowset IDs and the shared pointers to the rowset objects together. This ensures that the rowsets can always be found during data retrieval.
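The idea of the fix can be sketched as follows, again with hypothetical types (`Rowset`, `Tablet`, `PartialUpdateContext` are illustrative names, not the Doris implementation). Capturing a `std::shared_ptr` alongside each ID keeps the object reachable from the writer even after the tablet has dropped its own reference.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the Doris types.
struct Rowset {
    int64_t id;
};
using RowsetSharedPtr = std::shared_ptr<Rowset>;

struct Tablet {
    std::map<int64_t, RowsetSharedPtr> rowsets;
    // Models cleanup of a stale rowset: the tablet forgets it.
    void drop(int64_t id) { rowsets.erase(id); }
};

// Hypothetical writer context that captures IDs and owning pointers in one pass.
struct PartialUpdateContext {
    std::vector<int64_t> rowset_ids;
    std::vector<RowsetSharedPtr> rowset_ptrs;

    void init(const Tablet& tablet) {
        for (const auto& entry : tablet.rowsets) {
            rowset_ids.push_back(entry.first);
            rowset_ptrs.push_back(entry.second);  // copy bumps the reference count
        }
    }
};

// Returns true if every captured rowset is still usable after the tablet dropped it.
bool pinned_rowsets_survive_gc() {
    Tablet tablet;
    tablet.rowsets[1] = std::make_shared<Rowset>(Rowset{1});
    tablet.rowsets[2] = std::make_shared<Rowset>(Rowset{2});

    PartialUpdateContext ctx;
    ctx.init(tablet);  // IDs and pointers captured together

    tablet.drop(1);    // cleanup happens afterwards
    tablet.drop(2);

    if (ctx.rowset_ids.size() != ctx.rowset_ptrs.size()) return false;
    for (std::size_t i = 0; i < ctx.rowset_ptrs.size(); ++i) {
        if (!ctx.rowset_ptrs[i] || ctx.rowset_ptrs[i]->id != ctx.rowset_ids[i]) return false;
    }
    return true;
}
```

The sketch only models in-memory ownership; how the real storage engine decides when a stale rowset's files may be removed is outside its scope.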

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

doris-robot · 2024-04-25T09:12:42Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

Yukang-Lian · 2024-04-25T09:13:04Z

A test will be added soon.

github-actions · 2024-04-25T09:18:17Z

clang-tidy review says "All clean, LGTM! 👍"

Yukang-Lian · 2024-04-25T09:24:20Z

be/src/olap/rowset/segment_v2/vertical_segment_writer.cpp


    std::vector<RowsetSharedPtr> specified_rowsets;
    {
+        sleep(120);


This is just a draft. The sleep(120) will be added as part of a debug point test.
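For readers unfamiliar with the idea, a debug point is a named hook that does nothing in normal runs and executes an injected action (such as a delay) only when a test enables it. The following is a minimal, hypothetical sketch of that pattern; `DebugPoints`, `make_sleep_action`, and the point name `pause_before_fetch` are invented for illustration and are not the Doris debug point API.

```cpp
#include <chrono>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical registry of named debug points; NOT the Doris implementation.
class DebugPoints {
public:
    void enable(const std::string& name, std::function<void()> action) {
        std::lock_guard<std::mutex> lock(_mu);
        _actions[name] = std::move(action);
    }

    void disable(const std::string& name) {
        std::lock_guard<std::mutex> lock(_mu);
        _actions.erase(name);
    }

    // Runs the injected action if the point is enabled; returns whether it ran.
    bool hit(const std::string& name) {
        std::function<void()> action;
        {
            std::lock_guard<std::mutex> lock(_mu);
            auto it = _actions.find(name);
            if (it == _actions.end()) return false;
            action = it->second;
        }
        action();  // run outside the lock so a long sleep does not block enable/disable
        return true;
    }

private:
    std::mutex _mu;
    std::map<std::string, std::function<void()>> _actions;
};

// Builds an action that pauses the calling thread, standing in for the draft's sleep.
std::function<void()> make_sleep_action(std::chrono::milliseconds delay) {
    return [delay] { std::this_thread::sleep_for(delay); };
}

// Counts how often the guarded site fires across three passes.
int demo_fire_count() {
    DebugPoints dp;
    int fired = 0;
    auto site = [&] {
        if (dp.hit("pause_before_fetch")) ++fired;
    };
    site();  // disabled: no-op
    dp.enable("pause_before_fetch", make_sleep_action(std::chrono::milliseconds(1)));
    site();  // enabled: sleeps briefly, then counts
    dp.disable("pause_before_fetch");
    site();  // disabled again
    return fired;
}
```

With such a hook, the delay that widens the race window exists only while a test has enabled the point, so production code paths are not slowed down.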

github-actions · 2024-04-25T15:10:55Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2024-04-25T16:33:19Z

clang-tidy review says "All clean, LGTM! 👍"

Yukang-Lian · 2024-04-25T16:56:20Z

run buildall

github-actions · 2024-04-25T17:02:02Z

clang-tidy review says "All clean, LGTM! 👍"

doris-robot · 2024-04-25T18:31:08Z

TeamCity be ut coverage result:
Function Coverage: 35.19% (8917/25341)
Line Coverage: 26.97% (73318/271856)
Region Coverage: 26.15% (37883/144883)
Branch Coverage: 22.96% (19285/83980)
Coverage Report: http://coverage.selectdb-in.cc/coverage/f8b8d63ca10d91c7a0589d7d370e6c64b8715b23_f8b8d63ca10d91c7a0589d7d370e6c64b8715b23/report/index.html

Yukang-Lian · 2024-04-26T02:46:14Z

run buildall

github-actions · 2024-04-26T02:49:35Z

clang-tidy review says "All clean, LGTM! 👍"

doris-robot · 2024-04-26T03:35:49Z

TPC-H: Total hot run time: 40140 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 60cdbb6b046dd8936a01a4971f85b88a47d1cff3, data reload: false

------ Round 1 ----------------------------------
q1	17677	4715	4497	4497
q2	2004	183	176	176
q3	10561	1144	1172	1144
q4	10226	841	791	791
q5	7501	2715	2659	2659
q6	219	130	128	128
q7	1015	614	579	579
q8	9229	2092	2061	2061
q9	8884	6668	6602	6602
q10	8873	3754	3730	3730
q11	447	231	232	231
q12	445	221	221	221
q13	17839	2942	2951	2942
q14	287	230	226	226
q15	534	483	477	477
q16	541	386	384	384
q17	975	657	651	651
q18	7998	7358	7457	7358
q19	6728	1584	1510	1510
q20	648	305	300	300
q21	4917	3204	4022	3204
q22	331	272	269	269
Total cold run time: 117879 ms
Total hot run time: 40140 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4387	4189	4214	4189
q2	373	261	273	261
q3	2991	2727	2776	2727
q4	1874	1554	1606	1554
q5	5347	5347	5306	5306
q6	209	123	124	123
q7	2255	1868	1873	1868
q8	3228	3360	3381	3360
q9	8554	8569	8595	8569
q10	3874	3693	3699	3693
q11	579	477	473	473
q12	773	596	609	596
q13	16276	2962	2954	2954
q14	296	284	267	267
q15	516	480	485	480
q16	463	417	412	412
q17	1833	1539	1508	1508
q18	8097	7967	7999	7967
q19	1691	1591	1593	1591
q20	2119	1867	1821	1821
q21	11095	5035	5010	5010
q22	586	497	479	479
Total cold run time: 77416 ms
Total hot run time: 55208 ms

doris-robot · 2024-04-26T04:05:00Z

TeamCity be ut coverage result:
Function Coverage: 35.19% (8919/25342)
Line Coverage: 26.98% (73337/271865)
Region Coverage: 26.16% (37900/144887)
Branch Coverage: 22.97% (19290/83980)
Coverage Report: http://coverage.selectdb-in.cc/coverage/60cdbb6b046dd8936a01a4971f85b88a47d1cff3_60cdbb6b046dd8936a01a4971f85b88a47d1cff3/report/index.html

github-actions · 2024-04-26T14:01:21Z

clang-tidy review says "All clean, LGTM! 👍"

Yukang-Lian · 2024-04-26T14:12:51Z

run buildall

github-actions · 2024-04-26T14:18:24Z

clang-tidy review says "All clean, LGTM! 👍"

doris-robot · 2024-04-26T15:21:49Z

TPC-H: Total hot run time: 41511 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4baffb0542a62fc84b7550cce0761729507177e9, data reload: false

------ Round 1 ----------------------------------
q1	17594	4566	4251	4251
q2	2020	186	197	186
q3	10528	1195	1191	1191
q4	10213	823	776	776
q5	7535	2761	2811	2761
q6	221	135	134	134
q7	1056	618	618	618
q8	9229	2189	2138	2138
q9	9395	6957	6928	6928
q10	9145	3917	3903	3903
q11	435	240	242	240
q12	474	237	229	229
q13	17338	3139	3187	3139
q14	258	229	232	229
q15	512	473	475	473
q16	506	416	402	402
q17	981	820	716	716
q18	8452	7906	7819	7819
q19	4727	1593	1531	1531
q20	659	313	319	313
q21	5286	3264	4259	3264
q22	342	270	280	270
Total cold run time: 116906 ms
Total hot run time: 41511 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4516	4470	4422	4422
q2	368	274	277	274
q3	3208	2920	2986	2920
q4	1825	1568	1555	1555
q5	5526	5530	5542	5530
q6	223	124	124	124
q7	2352	2008	1976	1976
q8	3298	3467	3403	3403
q9	8926	8898	8990	8898
q10	4014	3738	3882	3738
q11	604	497	487	487
q12	788	613	628	613
q13	16244	3178	3116	3116
q14	318	286	293	286
q15	532	484	494	484
q16	502	436	441	436
q17	1810	1536	1489	1489
q18	7747	7706	7430	7430
q19	1644	1608	1541	1541
q20	1968	1806	1749	1749
q21	11112	4722	4691	4691
q22	568	478	478	478
Total cold run time: 78093 ms
Total hot run time: 55640 ms

doris-robot · 2024-04-26T15:33:09Z

TPC-DS: Total hot run time: 185364 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4baffb0542a62fc84b7550cce0761729507177e9, data reload: false

query1	904	361	345	345
query2	6430	2391	2370	2370
query3	6644	207	207	207
query4	23134	21122	21152	21122
query5	4147	405	410	405
query6	253	185	177	177
query7	4589	287	308	287
query8	243	182	184	182
query9	8633	2304	2277	2277
query10	423	251	240	240
query11	14765	14203	14221	14203
query12	135	95	84	84
query13	1629	372	382	372
query14	9787	7516	7669	7516
query15	214	171	173	171
query16	7692	255	254	254
query17	1317	563	530	530
query18	1945	273	270	270
query19	211	147	145	145
query20	90	84	83	83
query21	190	128	127	127
query22	4965	4839	4753	4753
query23	33811	32932	33103	32932
query24	6329	2921	3013	2921
query25	466	374	370	370
query26	697	146	145	145
query27	1866	309	331	309
query28	3601	2015	2014	2014
query29	841	621	601	601
query30	232	149	150	149
query31	940	737	711	711
query32	57	51	52	51
query33	478	241	236	236
query34	866	481	467	467
query35	756	670	681	670
query36	1073	929	850	850
query37	99	69	64	64
query38	3175	3026	3009	3009
query39	1586	1540	1535	1535
query40	207	124	124	124
query41	42	39	40	39
query42	104	95	96	95
query43	552	528	512	512
query44	1040	734	745	734
query45	296	266	268	266
query46	1069	690	728	690
query47	1939	1846	1866	1846
query48	382	288	290	288
query49	749	392	393	392
query50	766	384	373	373
query51	6718	6633	6569	6569
query52	109	89	86	86
query53	347	280	272	272
query54	258	237	225	225
query55	75	70	69	69
query56	244	221	224	221
query57	1198	1126	1119	1119
query58	210	197	193	193
query59	3358	3400	3099	3099
query60	251	230	238	230
query61	121	91	91	91
query62	535	446	449	446
query63	313	282	280	280
query64	7496	7183	7140	7140
query65	3117	3038	3036	3036
query66	790	330	339	330
query67	15247	14884	15355	14884
query68	5294	536	547	536
query69	482	308	305	305
query70	1132	1069	1160	1069
query71	408	278	295	278
query72	7159	2660	2426	2426
query73	703	320	323	320
query74	6438	6196	6051	6051
query75	3361	2642	2712	2642
query76	2870	1014	1015	1014
query77	395	261	259	259
query78	10989	10327	10227	10227
query79	2573	520	515	515
query80	1905	439	425	425
query81	504	218	219	218
query82	729	96	97	96
query83	271	168	171	168
query84	269	82	88	82
query85	1690	277	266	266
query86	519	315	302	302
query87	3245	3090	3078	3078
query88	4301	2334	2346	2334
query89	479	378	376	376
query90	2033	181	180	180
query91	125	96	103	96
query92	62	46	52	46
query93	4479	511	508	508
query94	1230	184	177	177
query95	395	294	290	290
query96	596	260	259	259
query97	3122	2934	2950	2934
query98	233	220	215	215
query99	1180	883	869	869
Total cold run time: 271917 ms
Total hot run time: 185364 ms

doris-robot · 2024-04-26T15:56:59Z

TeamCity be ut coverage result:
Function Coverage: 35.51% (8922/25122)
Line Coverage: 27.13% (73374/270466)
Region Coverage: 26.31% (37907/144054)
Branch Coverage: 23.10% (19294/83536)
Coverage Report: http://coverage.selectdb-in.cc/coverage/4baffb0542a62fc84b7550cce0761729507177e9_4baffb0542a62fc84b7550cce0761729507177e9/report/index.html

gavinchou

LGTM

github-actions · 2024-04-26T16:30:38Z

PR approved by anyone and no changes requested.

zhannngchen · 2024-04-28T06:45:16Z

be/src/olap/rowset_builder.cpp

        _rowset_ids.clear();
    } else {
        RETURN_IF_ERROR(tablet()->get_all_rs_id_unlocked(cur_max_version, &_rowset_ids));
+        rowset_ptrs = tablet()->get_rowset_by_ids(&_rowset_ids);


DCHECK_EQ(_rowset_ids.size(), rowset_ptrs.size());
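The value of the suggested size check can be illustrated with a small sketch. It assumes, purely for illustration, a lookup that silently skips IDs it cannot resolve; the `get_rowsets_by_ids` below is a hypothetical free function written for this example, not the Doris method of a similar name, and `all_resolved` stands in for the `DCHECK_EQ` comparison.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the Doris types.
struct Rowset {
    int64_t id;
};
using RowsetSharedPtr = std::shared_ptr<Rowset>;

// Hypothetical lookup that silently skips IDs it cannot resolve.
std::vector<RowsetSharedPtr> get_rowsets_by_ids(const std::map<int64_t, RowsetSharedPtr>& live,
                                                const std::vector<int64_t>& ids) {
    std::vector<RowsetSharedPtr> result;
    for (int64_t id : ids) {
        auto it = live.find(id);
        if (it != live.end()) result.push_back(it->second);
    }
    return result;
}

// The invariant the review comment asks for: one pointer per requested ID.
bool all_resolved(const std::vector<int64_t>& ids, const std::vector<RowsetSharedPtr>& ptrs) {
    return ids.size() == ptrs.size();
}

// Resolves the given IDs against a fixed set of live rowsets {1, 2}
// and reports whether the size invariant holds.
bool check_passes_for(const std::vector<int64_t>& ids) {
    std::map<int64_t, RowsetSharedPtr> live;
    live[1] = std::make_shared<Rowset>(Rowset{1});
    live[2] = std::make_shared<Rowset>(Rowset{2});
    return all_resolved(ids, get_rowsets_by_ids(live, ids));
}
```

A size mismatch is a cheap signal that at least one rowset vanished between collecting the IDs and resolving them, which is exactly the condition this PR is meant to rule out.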

zhannngchen

LGTM

github-actions · 2024-04-28T06:53:20Z

PR approved by at least one committer and no changes requested.

* [Enhancement](full compaction) Add run status support for full compaction (#34043)

  * The usage is `curl http://{ip}:{host}/api/compaction/run_status?tablet_id={tablet_id}`, e.g. `curl http://127.0.0.1:8040/api/compaction/run_status?tablet_id=10084`

    If full compaction is running, the output will be

    ```
    {
        "status" : "Success",
        "run_status" : true,
        "msg" : "compaction task for this tablet is running",
        "tablet_id" : 10084,
        "compact_type" : "full"
    }
    ```

    else the output will be

    ```
    {
        "status" : "Success",
        "run_status" : false,
        "msg" : "compaction task for this tablet is not running",
        "tablet_id" : 10084,
        "compact_type" : "full"
    }
    ```

  * 2
  * 2

* [Fix](partial update) Fix rowset not found error when doing partial update (#34112)

…for partial update (#40062)

## Proposed changes

1. #34112 lets partial update fetch rowsets during the initialization of RowsetBuilder rather than in the flush phase, so the tablet header lock can be removed.
2. Refactor some partial update code.

…update apache#39619

pick [opt](partial update) Remove unnecessary lock and refactor some code for partial update (apache#40062)

1. apache#34112 let partial update fetch rowsets in the initialization of RowsetBuilder rather than flush phase. So we can remove that tablet header lock.
2. refactor some partial update code

fix compile

pick [Fix](partial update) Fix __DORIS_SEQUENCE_COL__ is not set for newly inserted rows in partial update apache#40272

picks apache#40272

pick [Cherry-pick](branch-2.1) Pick "[Featrue](default value) Support bitmap_empty default value (apache#40364)" (apache#40487)

Pick apache#40364

pick [Feature](partial update) Support flexible partial update in stream load with json files (apache#39756)

This PR add the ability to update different columns for each row in one stream load

Doc: apache/doris-website#1140

```sql
MySQL root@127.1:d1> CREATE TABLE t1 (
                  -> `k` int(11) NULL,
                  -> `v1` BIGINT NULL,
                  -> `v2` BIGINT NULL DEFAULT "9876",
                  -> `v3` BIGINT NOT NULL,
                  -> `v4` BIGINT NOT NULL DEFAULT "1234",
                  -> `v5` BIGINT NULL
                  -> ) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1
                  -> PROPERTIES(
                  -> "replication_num" = "1",
                  -> "enable_unique_key_merge_on_write" = "true");
Query OK, 0 rows affected
Time: 0.013s
MySQL root@127.1:d1> insert into t1 select number, number, number, number, number, number from numbers("number" = "6");
Query OK, 6 rows affected
Time: 0.107s
MySQL root@127.1:d1> select * from t1;
+---+----+----+----+----+----+
| k | v1 | v2 | v3 | v4 | v5 |
+---+----+----+----+----+----+
| 0 | 0  | 0  | 0  | 0  | 0  |
| 1 | 1  | 1  | 1  | 1  | 1  |
| 2 | 2  | 2  | 2  | 2  | 2  |
| 3 | 3  | 3  | 3  | 3  | 3  |
| 4 | 4  | 4  | 4  | 4  | 4  |
| 5 | 5  | 5  | 5  | 5  | 5  |
+---+----+----+----+----+----+
```

test1.json:

```json
{"k": 1, "v1": 10}
{"k": 2, "v2": 20, "v5": 25}
{"k": 3, "v3": 30}
{"k": 4, "v4": 20, "v1": 43, "v3": 99}
{"k": 5, "v5": null}
{"k": 6, "v1": 999, "v3": 777}
{"k": 2, "v4": 222}
{"k": 1, "v2": 111, "v3": 111}
```

```bash
curl --location-trusted -u root: \
    -H "strict_mode:false" \
    -H "format:json" \
    -H "read_json_by_line:true" \
    -H "unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS" \
    -T test1.json \
    -XPUT http://<host>:<http_port>/api/d1/t1/_stream_load
```

```sql
MySQL root@127.1:d1> select * from t1;
+---+-----+------+-----+------+--------+
| k | v1  | v2   | v3  | v4   | v5     |
+---+-----+------+-----+------+--------+
| 0 | 0   | 0    | 0   | 0    | 0      |
| 1 | 10  | 111  | 111 | 1    | 1      |
| 2 | 2   | 20   | 2   | 222  | 25     |
| 3 | 3   | 3    | 30  | 3    | 3      |
| 4 | 43  | 4    | 99  | 20   | 4      |
| 5 | 5   | 5    | 5   | 5    | <null> |
| 6 | 999 | 9876 | 777 | 1234 | <null> |
+---+-----+------+-----+------+--------+
```

fix compile

pick [branch-2.1] Picks "[opt](partial update) Allow to only specify key columns in partial update apache#40736" (apache#40863)

picks apache#40736

fix

Yukang-Lian marked this pull request as draft April 25, 2024 09:12

Yukang-Lian commented Apr 25, 2024

Yukang-Lian marked this pull request as ready for review April 25, 2024 16:55

Yukang-Lian force-pushed the Fix_Rowset_Not_Fount_When_Partial_Update branch from 5353033 to 4baffb0 Compare April 26, 2024 14:12

gavinchou approved these changes Apr 26, 2024

github-actions bot added the reviewed label Apr 26, 2024

zhannngchen reviewed Apr 28, 2024

zhannngchen approved these changes Apr 28, 2024

github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 28, 2024

zhannngchen merged commit dcc7c07 into apache:master Apr 28, 2024

zhannngchen added dev/2.1.x dev/2.0.x p0_w labels Apr 28, 2024

yiguolei mentioned this pull request Apr 29, 2024

[Cherry-pick](branch-2.1) Pick #34043 and #34112 #34318

Merged

yiguolei added dev/2.1.3-merged and removed dev/2.1.x labels Apr 29, 2024

xiaokang mentioned this pull request Apr 30, 2024

[Fix](partial update) Fix rowset not found error when doing partial update #34112 #34357

Merged

xiaokang pushed a commit that referenced this pull request Apr 30, 2024

[Fix](partial update) Fix rowset not found error when doing partial u…

74ed6e8

…pdate #34112 (#34357)

xiaokang added dev/2.0.10-merged and removed dev/2.0.x labels Apr 30, 2024

mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024

[Fix](partial update) Fix rowset not found error when doing partial u…

5e5a077

…pdate apache#34112 (apache#34357)

bobhan1 mentioned this pull request Aug 28, 2024

[opt](partial update) Remove unnecessary lock and refactor some code for partial update #40062

Merged
