sjyango commented Dec 26, 2023

Proposed changes

Switch the IPv4/IPv6 parsers from operating on std::string to operating on char*, which speeds up parsing of the ipv4/ipv6 data types.
Stream load import time before and after the change:

Data volume	Previous (s)	Current (s)
10,000 (1w)	0.078	0.058
100,000 (10w)	0.183	0.153
1,000,000 (100w)	1.260	0.594
10,000,000 (1000w)	12.146	3.866
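
A minimal sketch of the idea, not the code from this PR: parsing from a pointer and a length lets a loader read an address straight out of its input buffer, without first copying the field into a std::string. The function name `parse_ipv4` and its signature are hypothetical, and the sketch handles only dotted-decimal IPv4.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only (not the Doris implementation).
// Parses dotted-decimal IPv4 text from [src, src + len) without allocating.
// On success stores the address in host byte order in `out` and returns true.
bool parse_ipv4(const char* src, std::size_t len, std::uint32_t& out) {
    const char* end = src + len;
    std::uint32_t result = 0;
    for (int octet = 0; octet < 4; ++octet) {
        // Every octet must start with a digit.
        if (src == end || *src < '0' || *src > '9') return false;
        std::uint32_t value = 0;
        int digits = 0;
        while (src != end && *src >= '0' && *src <= '9') {
            value = value * 10 + static_cast<std::uint32_t>(*src - '0');
            // Reject octets longer than 3 digits or larger than 255.
            if (++digits > 3 || value > 255) return false;
            ++src;
        }
        result = (result << 8) | value;
        // The first three octets must be followed by a dot.
        if (octet < 3) {
            if (src == end || *src != '.') return false;
            ++src;
        }
    }
    // Anything left after the fourth octet is an error.
    if (src != end) return false;
    out = result;
    return true;
}
```

Because the input is a pointer plus a length, a caller can parse a field in place, for example the first 8 bytes of `10.0.0.1,rest`, with no temporary string and no need for a terminating null byte.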

github-actions bot left a comment

clang-tidy made some suggestions

sjyango commented Dec 26, 2023

run buildall

sjyango commented Dec 26, 2023

run arm

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.61% (8562/23385)
Line Coverage: 28.68% (69636/242765)
Region Coverage: 27.70% (36029/130080)
Branch Coverage: 24.43% (18422/75406)
Coverage Report: http://coverage.selectdb-in.cc/coverage/e92d02b8b7260f79e6cdf8fe20dbce50c52ed00f_e92d02b8b7260f79e6cdf8fe20dbce50c52ed00f/report/index.html

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.02 seconds
stream load tsv: 565 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 38.4 seconds inserted 10000000 Rows, about 260K ops/s
storage size: 17183762016 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit e92d02b8b7260f79e6cdf8fe20dbce50c52ed00f, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4796	4501	4468	4468
q2	373	178	161	161
q3	1474	1330	1222	1222
q4	1157	1010	930	930
q5	3178	3167	3155	3155
q6	255	137	132	132
q7	970	492	487	487
q8	2262	2269	2255	2255
q9	6703	6676	6713	6676
q10	3236	3273	3312	3273
q11	336	222	213	213
q12	357	205	211	205
q13	4567	3812	3797	3797
q14	242	214	216	214
q15	570	534	528	528
q16	447	380	388	380
q17	1044	765	648	648
q18	7156	6781	6807	6781
q19	1570	1593	1602	1593
q20	577	330	297	297
q21	3276	2800	2723	2723
q22	367	306	307	306
Total cold run time: 44913 ms
Total hot run time: 40444 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4415	4416	4382	4382
q2	274	166	170	166
q3	3512	3501	3489	3489
q4	2424	2406	2412	2406
q5	5725	5739	5726	5726
q6	249	126	127	126
q7	2368	1852	1896	1852
q8	3602	3612	3614	3612
q9	8987	8953	8955	8953
q10	3940	3993	4022	3993
q11	481	366	387	366
q12	769	606	595	595
q13	4269	3557	3580	3557
q14	282	251	252	251
q15	579	519	514	514
q16	493	455	472	455
q17	1963	1934	1945	1934
q18	8682	8159	8311	8159
q19	1826	1810	1818	1810
q20	2257	1934	1937	1934
q21	6614	6247	6229	6229
q22	559	457	468	457
Total cold run time: 64270 ms
Total hot run time: 60966 ms
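
The bot does not label the columns of the TPC-H tables. Judging from the numbers, each row appears to hold one cold run, two hot runs, and the faster of the two hot runs, with the totals summing the cold column and the best-hot column. For the first table above this reproduces 44913 ms and 40444 ms. The sketch below encodes that reading; the type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One table row in milliseconds. The column meaning is inferred from the
// reported numbers, not documented by the bot.
struct QueryTimes {
    std::int64_t cold;  // first (cold) run
    std::int64_t hot1;  // second run
    std::int64_t hot2;  // third run
};

struct Totals {
    std::int64_t cold;  // sum of cold runs
    std::int64_t hot;   // sum of the best hot run per query
};

// Sums the cold column and the per-query minimum of the two hot runs.
Totals summarize(const std::vector<QueryTimes>& rows) {
    Totals totals{0, 0};
    for (const QueryTimes& row : rows) {
        totals.cold += row.cold;
        totals.hot += std::min(row.hot1, row.hot2);
    }
    return totals;
}
```

Feeding it q1 to q3 of the first table (4796/4501/4468, 373/178/161, 1474/1330/1222) gives 6643 ms cold and 5851 ms hot, matching the sums of the first and fourth columns for those rows.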

sjyango commented Dec 26, 2023

run p0

sjyango commented Dec 26, 2023

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.62% (8564/23386)
Line Coverage: 28.69% (69645/242773)
Region Coverage: 27.71% (36040/130083)
Branch Coverage: 24.44% (18427/75406)
Coverage Report: http://coverage.selectdb-in.cc/coverage/3c79573f737f57eb1e429247dce2e70dbd3e0b2f_3c79573f737f57eb1e429247dce2e70dbd3e0b2f/report/index.html

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 3c79573f737f57eb1e429247dce2e70dbd3e0b2f, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4704	4465	4446	4446
q2	395	160	161	160
q3	1472	1322	1233	1233
q4	1151	989	899	899
q5	3155	3159	3149	3149
q6	265	135	135	135
q7	1054	484	490	484
q8	2251	2276	2254	2254
q9	6719	6668	6699	6668
q10	3236	3291	3282	3282
q11	324	207	207	207
q12	362	205	211	205
q13	4569	3844	3800	3800
q14	241	214	214	214
q15	568	520	530	520
q16	434	385	384	384
q17	1042	787	554	554
q18	7030	6884	6874	6874
q19	1572	1582	1617	1582
q20	532	304	297	297
q21	3250	2794	2754	2754
q22	375	302	297	297
Total cold run time: 44701 ms
Total hot run time: 40398 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4405	4389	4350	4350
q2	268	166	161	161
q3	3502	3500	3472	3472
q4	2428	2403	2410	2403
q5	5700	5736	5703	5703
q6	251	127	127	127
q7	2402	1885	1862	1862
q8	3600	3605	3590	3590
q9	9032	8980	8942	8942
q10	3900	3985	3994	3985
q11	486	374	366	366
q12	768	664	603	603
q13	4271	3542	3560	3542
q14	293	257	252	252
q15	572	522	520	520
q16	496	472	449	449
q17	1954	1956	1937	1937
q18	8668	8091	8186	8091
q19	1810	1821	1797	1797
q20	2253	1957	1926	1926
q21	6588	6213	6224	6213
q22	543	468	466	466
Total cold run time: 64190 ms
Total hot run time: 60757 ms

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.06 seconds
stream load tsv: 564 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 37.2 seconds inserted 10000000 Rows, about 268K ops/s
storage size: 17183893842 Bytes

sjyango commented Dec 26, 2023

run PipelineX

sjyango commented Dec 26, 2023

run pipelinex_p0

sjyango commented Dec 26, 2023

run pipelinex_p0

sjyango force-pushed the ip-parser branch 2 times, most recently from db55e88 to 941d67d, on December 26, 2023 09:36

sjyango commented Dec 26, 2023

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.60% (8563/23396)
Line Coverage: 28.67% (69656/242986)
Region Coverage: 27.69% (36042/130181)
Branch Coverage: 24.42% (18429/75474)
Coverage Report: http://coverage.selectdb-in.cc/coverage/941d67dcd87def4d6ef08bf502444482b244499a_941d67dcd87def4d6ef08bf502444482b244499a/report/index.html

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 941d67dcd87def4d6ef08bf502444482b244499a, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4764	4466	4459	4459
q2	390	169	153	153
q3	1485	1309	1228	1228
q4	1156	1010	962	962
q5	3173	3148	3176	3148
q6	257	134	131	131
q7	1032	482	495	482
q8	2298	2285	2250	2250
q9	6727	6685	6717	6685
q10	3227	3291	3274	3274
q11	331	217	211	211
q12	361	207	214	207
q13	4532	3820	3791	3791
q14	236	213	217	213
q15	580	511	518	511
q16	447	389	377	377
q17	1034	772	627	627
q18	7113	6764	6846	6764
q19	1586	1599	1610	1599
q20	531	313	288	288
q21	3294	2792	2785	2785
q22	368	299	312	299
Total cold run time: 44922 ms
Total hot run time: 40444 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4408	4408	4392	4392
q2	274	168	171	168
q3	3535	3500	3486	3486
q4	2449	2430	2434	2430
q5	5738	5737	5716	5716
q6	251	126	123	123
q7	2380	1874	1878	1874
q8	3616	3617	3616	3616
q9	8992	8952	8951	8951
q10	3918	4004	4044	4004
q11	497	380	372	372
q12	772	596	607	596
q13	4265	3539	3532	3532
q14	291	262	252	252
q15	572	523	530	523
q16	504	435	450	435
q17	1982	1963	1950	1950
q18	8640	8205	8244	8205
q19	1837	1830	1835	1830
q20	2250	1954	1951	1951
q21	6661	6290	6268	6268
q22	552	453	464	453
Total cold run time: 64384 ms
Total hot run time: 61127 ms

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.74 seconds
stream load tsv: 578 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 20.7 seconds inserted 10000000 Rows, about 483K ops/s
storage size: 17187898988 Bytes

sjyango commented Dec 27, 2023

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.59% (8562/23397)
Line Coverage: 28.66% (69655/242998)
Region Coverage: 27.68% (36041/130188)
Branch Coverage: 24.41% (18426/75476)
Coverage Report: http://coverage.selectdb-in.cc/coverage/0ae016e23ca8807c1363f2a2380cece3dca701b8_0ae016e23ca8807c1363f2a2380cece3dca701b8/report/index.html

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 0ae016e23ca8807c1363f2a2380cece3dca701b8, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4772	4442	4450	4442
q2	374	141	158	141
q3	1476	1329	1175	1175
q4	1154	953	976	953
q5	3190	3170	3163	3163
q6	260	137	136	136
q7	1027	502	486	486
q8	2239	2277	2242	2242
q9	6737	6685	6720	6685
q10	3222	3283	3310	3283
q11	323	205	210	205
q12	361	208	211	208
q13	4564	3811	3758	3758
q14	247	212	210	210
q15	587	529	530	529
q16	446	387	381	381
q17	1035	800	615	615
q18	6991	6731	6849	6731
q19	1563	1596	1593	1593
q20	561	321	316	316
q21	3211	2767	2744	2744
q22	368	307	312	307
Total cold run time: 44708 ms
Total hot run time: 40303 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4406	4417	4367	4367
q2	269	167	166	166
q3	3516	3511	3505	3505
q4	2442	2431	2425	2425
q5	5731	5704	5728	5704
q6	250	126	124	124
q7	2401	1830	1866	1830
q8	3608	3606	3601	3601
q9	8982	8984	8974	8974
q10	3945	4011	4013	4011
q11	478	370	378	370
q12	772	601	626	601
q13	4281	3564	3531	3531
q14	288	263	269	263
q15	567	520	519	519
q16	499	471	457	457
q17	1986	1948	1954	1948
q18	8591	8341	8320	8320
q19	1822	1846	1831	1831
q20	2248	1953	1947	1947
q21	6624	6272	6208	6208
q22	553	480	467	467
Total cold run time: 64259 ms
Total hot run time: 61169 ms

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.61 seconds
stream load tsv: 577 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17184197061 Bytes

sjyango commented Dec 27, 2023

run p0

amorynan left a comment

LGTM

@github-actions

PR approved by anyone and no changes requested.

morningman left a comment

LGTM

github-actions bot added the approved label on Dec 28, 2023

@github-actions

PR approved by at least one committer and no changes requested.

morningman merged commit 3dc3e81 into apache:master on Dec 28, 2023
sjyango deleted the ip-parser branch on December 28, 2023 03:05
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Dec 28, 2023
…29044)

Transforming from parsing std::string to parsing char* to accelerate the parsing of ipv4/v6 data types.
hello-stephen pushed a commit to hello-stephen/doris that referenced this pull request Dec 28, 2023
…29044)

Transforming from parsing std::string to parsing char* to accelerate the parsing of ipv4/v6 data types.
HappenLee pushed a commit to HappenLee/incubator-doris that referenced this pull request Jan 12, 2024
…29044)

Transforming from parsing std::string to parsing char* to accelerate the parsing of ipv4/v6 data types.

Labels

approved, reviewed
