@morningman
Contributor

@morningman morningman commented Aug 28, 2025

What problem does this PR solve?

Users may not specify a data format in a broker load, so the format can only be inferred
after the files have been listed.
The initialization of the file format properties object therefore has to be deferred until that point.
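
A minimal sketch of the deferral described above, written in Python for brevity rather than in the project's own code. All names here (`FormatProperties`, `DeferredFormatProperties`, `infer_format`) are hypothetical, and the extension-based inference rule is only an assumption for illustration; it is not taken from this PR's diff.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FormatProperties:
    """Hypothetical stand-in for a file format properties object."""
    format_name: str


def infer_format(paths: List[str]) -> str:
    """Guess a format from the first listed file's extension (illustrative rule only)."""
    if not paths:
        raise ValueError("cannot infer a format from an empty file list")
    first = paths[0]
    suffix = first.rsplit(".", 1)[-1].lower() if "." in first else ""
    known = {"parquet": "parquet", "orc": "orc", "json": "json"}
    # Fall back to csv when the extension is missing or unrecognized (assumption).
    return known.get(suffix, "csv")


class DeferredFormatProperties:
    """Holds the user's optional format and builds the properties object lazily."""

    def __init__(self, user_format: Optional[str] = None):
        self._user_format = user_format
        self._resolved: Optional[FormatProperties] = None

    def resolve(self, listed_files: List[str]) -> FormatProperties:
        """Build the properties once the file listing is available."""
        if self._resolved is None:
            name = self._user_format or infer_format(listed_files)
            self._resolved = FormatProperties(name)
        return self._resolved

    def get(self) -> FormatProperties:
        """Return the resolved properties, failing loudly if called too early."""
        if self._resolved is None:
            raise RuntimeError("format properties requested before files were listed")
        return self._resolved
```

In this sketch an explicitly specified format always wins, and inference only runs when the user left the format out.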

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Aug 28, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman morningman changed the title from "[Fix](Load)Add AUTO file format placeholder for unspecified formats" to "[Fix](Load) Use defered file format properties for broker load without format" Aug 29, 2025
@morningman morningman marked this pull request as ready for review August 29, 2025 03:16
@morningman morningman requested a review from morrySnow as a code owner August 29, 2025 03:16
@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32968 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 72408038fb4d4e3eeca2a6ae633721e45e54b4dd, data reload: false

------ Round 1 ----------------------------------
q1	17581	5399	5412	5399
q2	2018	415	281	281
q3	11905	1223	754	754
q4	10270	858	458	458
q5	8699	2378	2152	2152
q6	182	163	134	134
q7	895	737	612	612
q8	9338	1418	1197	1197
q9	5258	4930	4931	4930
q10	6783	2277	1814	1814
q11	467	290	264	264
q12	335	352	210	210
q13	17794	3637	3017	3017
q14	233	220	221	220
q15	514	462	455	455
q16	417	429	369	369
q17	606	866	369	369
q18	7045	6670	6550	6550
q19	1204	942	562	562
q20	332	343	211	211
q21	2758	2147	2009	2009
q22	1080	1015	1001	1001
Total cold run time: 105714 ms
Total hot run time: 32968 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5500	5511	5454	5454
q2	233	331	226	226
q3	2236	2619	2305	2305
q4	1370	1784	1386	1386
q5	4398	4974	4977	4974
q6	170	162	131	131
q7	2067	1956	1839	1839
q8	2599	2788	2715	2715
q9	7228	7297	7215	7215
q10	3007	3309	2738	2738
q11	549	511	504	504
q12	654	751	622	622
q13	3415	3837	3147	3147
q14	297	287	268	268
q15	504	477	469	469
q16	443	472	427	427
q17	1206	1767	1260	1260
q18	7696	7328	7325	7325
q19	789	1143	1048	1048
q20	2016	2037	1900	1900
q21	5346	5014	4733	4733
q22	1097	1076	1004	1004
Total cold run time: 52820 ms
Total hot run time: 51690 ms

@doris-robot

TPC-DS: Total hot run time: 192791 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 72408038fb4d4e3eeca2a6ae633721e45e54b4dd, data reload: false

query1	968	406	405	405
query2	6226	1897	1886	1886
query3	8689	201	198	198
query4	33571	23749	23718	23718
query5	3705	576	444	444
query6	299	192	171	171
query7	4188	479	319	319
query8	297	232	232	232
query9	9442	2610	2590	2590
query10	489	332	253	253
query11	18039	15512	15160	15160
query12	161	118	106	106
query13	1550	550	426	426
query14	9624	7410	6869	6869
query15	259	195	176	176
query16	8110	642	471	471
query17	1558	814	621	621
query18	2111	416	335	335
query19	243	186	168	168
query20	133	127	117	117
query21	203	129	112	112
query22	4532	4728	4441	4441
query23	35131	34316	34214	34214
query24	7331	2698	2708	2698
query25	557	516	431	431
query26	1174	283	178	178
query27	2187	485	368	368
query28	5471	2215	2191	2191
query29	798	602	466	466
query30	256	198	159	159
query31	1029	923	851	851
query32	99	65	62	62
query33	500	394	338	338
query34	748	868	509	509
query35	774	808	744	744
query36	976	1054	983	983
query37	111	92	68	68
query38	4023	4002	4010	4002
query39	1501	1477	1460	1460
query40	211	120	107	107
query41	49	50	47	47
query42	123	106	112	106
query43	511	510	486	486
query44	1395	852	822	822
query45	190	179	170	170
query46	872	1058	687	687
query47	1992	1996	1917	1917
query48	414	426	348	348
query49	774	493	408	408
query50	652	672	448	448
query51	7306	7518	7259	7259
query52	99	99	94	94
query53	236	252	195	195
query54	552	547	474	474
query55	82	79	79	79
query56	268	277	256	256
query57	1273	1321	1204	1204
query58	237	217	215	215
query59	3055	3232	3062	3062
query60	292	291	292	291
query61	122	115	116	115
query62	795	754	712	712
query63	236	203	204	203
query64	4482	994	640	640
query65	3381	3371	3314	3314
query66	997	401	309	309
query67	16571	15803	15610	15610
query68	7755	812	524	524
query69	473	307	271	271
query70	1176	1067	1102	1067
query71	431	308	265	265
query72	5172	3818	3827	3818
query73	649	738	353	353
query74	10137	9326	9260	9260
query75	3521	3151	2654	2654
query76	3412	1161	750	750
query77	759	381	266	266
query78	10339	10360	9545	9545
query79	4633	828	595	595
query80	662	526	435	435
query81	491	274	225	225
query82	238	116	87	87
query83	178	167	149	149
query84	289	106	76	76
query85	742	366	299	299
query86	355	305	302	302
query87	4374	4316	4259	4259
query88	4072	2404	2379	2379
query89	430	347	286	286
query90	2060	190	195	190
query91	138	144	111	111
query92	63	55	51	51
query93	3188	906	541	541
query94	656	390	322	322
query95	345	287	272	272
query96	486	603	288	288
query97	3157	3309	3133	3133
query98	218	210	201	201
query99	1592	1403	1299	1299
Total cold run time: 296421 ms
Total hot run time: 192791 ms

@doris-robot

ClickBench: Total hot run time: 28.72 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 72408038fb4d4e3eeca2a6ae633721e45e54b4dd, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.03	0.03
query3	0.23	0.07	0.07
query4	1.62	0.11	0.10
query5	0.53	0.50	0.52
query6	1.14	0.73	0.73
query7	0.02	0.01	0.02
query8	0.04	0.03	0.04
query9	0.57	0.50	0.51
query10	0.56	0.55	0.56
query11	0.14	0.10	0.10
query12	0.14	0.11	0.11
query13	0.62	0.60	0.58
query14	0.77	0.81	0.80
query15	0.85	0.84	0.82
query16	0.37	0.38	0.39
query17	1.02	1.05	1.08
query18	0.25	0.23	0.23
query19	1.85	1.87	1.90
query20	0.01	0.01	0.01
query21	15.38	0.91	0.58
query22	0.75	0.79	0.75
query23	15.03	1.46	0.52
query24	3.73	1.49	0.57
query25	0.13	0.10	0.07
query26	0.35	0.16	0.15
query27	0.05	0.05	0.05
query28	13.02	1.00	0.44
query29	12.58	3.94	3.26
query30	0.25	0.08	0.06
query31	2.81	0.60	0.37
query32	3.22	0.56	0.46
query33	3.00	3.01	3.06
query34	16.86	5.23	4.58
query35	4.59	4.61	4.56
query36	0.63	0.50	0.47
query37	0.09	0.06	0.06
query38	0.06	0.03	0.03
query39	0.04	0.03	0.02
query40	0.16	0.14	0.13
query41	0.08	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 103.73 s
Total hot run time: 28.72 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 36.71% (29/79) 🎉
Increment coverage report
Complete coverage report

@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32470 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 03dad16132c9f587a2eecfdcde8fd36ef2bab4bc, data reload: false

------ Round 1 ----------------------------------
q1	17588	5605	5370	5370
q2	2019	392	279	279
q3	12002	1243	737	737
q4	10200	866	466	466
q5	7648	2355	2098	2098
q6	177	162	130	130
q7	892	740	622	622
q8	9329	1413	1094	1094
q9	5094	4961	4885	4885
q10	6742	2282	1829	1829
q11	475	273	259	259
q12	329	358	211	211
q13	17763	3640	3032	3032
q14	226	231	205	205
q15	531	471	476	471
q16	424	417	363	363
q17	569	857	359	359
q18	6920	6458	6350	6350
q19	1206	942	543	543
q20	334	341	208	208
q21	2729	2153	1956	1956
q22	1033	1012	1003	1003
Total cold run time: 104230 ms
Total hot run time: 32470 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5565	5913	5492	5492
q2	233	334	227	227
q3	2276	2617	2300	2300
q4	1351	1809	1352	1352
q5	4421	4907	4985	4907
q6	166	164	134	134
q7	2140	1996	1876	1876
q8	2599	2832	2722	2722
q9	7258	7181	7165	7165
q10	3007	3360	2785	2785
q11	579	488	506	488
q12	652	780	612	612
q13	3443	3767	3169	3169
q14	269	297	268	268
q15	529	483	473	473
q16	441	488	437	437
q17	1223	1722	1238	1238
q18	7523	7348	7206	7206
q19	793	1155	1026	1026
q20	2021	2063	1929	1929
q21	5457	5027	4584	4584
q22	1085	1071	1027	1027
Total cold run time: 53031 ms
Total hot run time: 51417 ms

@doris-robot

TPC-DS: Total hot run time: 192391 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 03dad16132c9f587a2eecfdcde8fd36ef2bab4bc, data reload: false

query1	963	396	405	396
query2	6247	1931	1881	1881
query3	8689	200	193	193
query4	33408	23831	23488	23488
query5	3715	594	448	448
query6	278	206	173	173
query7	4201	483	311	311
query8	297	237	241	237
query9	9414	2611	2597	2597
query10	454	328	260	260
query11	17889	15463	15267	15267
query12	164	104	111	104
query13	1566	559	433	433
query14	9548	6783	6815	6783
query15	241	190	188	188
query16	8020	633	529	529
query17	1554	765	600	600
query18	2171	434	335	335
query19	216	189	187	187
query20	131	127	116	116
query21	206	134	111	111
query22	4647	4551	4428	4428
query23	34935	34285	34140	34140
query24	7839	2679	2723	2679
query25	542	508	432	432
query26	1185	281	174	174
query27	2446	476	372	372
query28	5205	2190	2184	2184
query29	799	593	468	468
query30	240	199	165	165
query31	988	938	857	857
query32	94	62	58	58
query33	556	376	322	322
query34	745	868	511	511
query35	803	827	753	753
query36	1026	1046	940	940
query37	105	92	68	68
query38	4056	3985	3993	3985
query39	1518	1467	1481	1467
query40	213	119	105	105
query41	53	57	50	50
query42	129	108	108	108
query43	503	530	502	502
query44	1412	837	845	837
query45	192	184	175	175
query46	894	1051	683	683
query47	2031	1968	1913	1913
query48	397	437	353	353
query49	781	515	423	423
query50	665	700	428	428
query51	7371	7454	7341	7341
query52	102	104	93	93
query53	232	270	197	197
query54	541	545	470	470
query55	81	80	80	80
query56	281	288	265	265
query57	1277	1288	1232	1232
query58	232	231	217	217
query59	2995	3214	3058	3058
query60	292	293	270	270
query61	114	118	121	118
query62	813	770	678	678
query63	232	192	192	192
query64	4575	1005	657	657
query65	3419	3343	3384	3343
query66	997	409	313	313
query67	16290	15667	15482	15482
query68	7665	818	538	538
query69	494	338	260	260
query70	1172	1151	1081	1081
query71	455	290	273	273
query72	5181	3704	3889	3704
query73	626	747	356	356
query74	10263	9169	9110	9110
query75	3716	3148	2681	2681
query76	3576	1195	773	773
query77	798	369	285	285
query78	10393	10528	9602	9602
query79	2408	862	615	615
query80	719	520	445	445
query81	493	258	224	224
query82	319	121	87	87
query83	185	165	145	145
query84	278	97	87	87
query85	743	363	299	299
query86	338	316	285	285
query87	4366	4317	4243	4243
query88	3340	2413	2372	2372
query89	393	337	303	303
query90	1945	190	188	188
query91	135	138	108	108
query92	65	56	55	55
query93	1036	912	545	545
query94	645	417	305	305
query95	348	285	274	274
query96	471	597	281	281
query97	3215	3272	3139	3139
query98	219	217	204	204
query99	1477	1464	1301	1301
Total cold run time: 291692 ms
Total hot run time: 192391 ms

@doris-robot

ClickBench: Total hot run time: 28.58 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 03dad16132c9f587a2eecfdcde8fd36ef2bab4bc, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.03
query3	0.24	0.06	0.07
query4	1.62	0.10	0.10
query5	0.53	0.50	0.51
query6	1.13	0.74	0.73
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.56	0.50	0.52
query10	0.56	0.56	0.57
query11	0.15	0.09	0.10
query12	0.14	0.10	0.10
query13	0.63	0.60	0.59
query14	0.80	0.80	0.80
query15	0.86	0.84	0.82
query16	0.39	0.41	0.38
query17	0.99	1.03	1.05
query18	0.24	0.23	0.23
query19	1.90	1.82	1.80
query20	0.01	0.01	0.01
query21	15.40	0.92	0.58
query22	0.75	0.73	0.65
query23	15.21	1.48	0.57
query24	3.74	0.66	0.65
query25	0.20	0.09	0.09
query26	0.42	0.16	0.14
query27	0.05	0.05	0.04
query28	12.68	1.00	0.42
query29	12.60	3.90	3.25
query30	0.25	0.09	0.06
query31	2.82	0.60	0.39
query32	3.23	0.54	0.47
query33	2.99	3.04	3.05
query34	16.53	5.16	4.48
query35	4.59	4.57	4.60
query36	0.62	0.51	0.48
query37	0.09	0.06	0.06
query38	0.05	0.03	0.04
query39	0.03	0.02	0.02
query40	0.17	0.14	0.12
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 103.49 s
Total hot run time: 28.58 s

@morrySnow morrySnow merged commit 4e0771f into apache:branch-3.1 Aug 29, 2025
22 checks passed
morningman added a commit to morningman/doris that referenced this pull request Aug 30, 2025
… load without format (apache#55450)

Users may not specify a data format in a broker load, so the format can only be
inferred after the files have been listed.
The initialization of the file format properties object therefore has to be deferred.

---------

Co-authored-by: Calvin Kirs <guoqiang@selectdb.com>
morningman added a commit that referenced this pull request Sep 2, 2025
…ut format (#55450) (#55498)

### What problem does this PR solve?

Users may not specify a data format in a broker load, so the format can only be
inferred after the files have been listed.
The initialization of the file format properties object therefore has to be deferred.
Co-authored-by: Calvin Kirs <guoqiang@selectdb.com>

6 participants