@0AyanamiRei
Contributor

@0AyanamiRei 0AyanamiRei commented Sep 10, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: doc-2874

Problem Summary:

This PR concerns the read_json_by_line and strip_outer_array parameters. read_json_by_line will gradually be deprecated, and some users forget to specify these two parameters when importing JSON files. This PR therefore changes the default behavior: if the user specifies neither parameter, read_json_by_line defaults to true.
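
For context, the two parameters correspond to two common JSON file layouts: one object per line, or a single outer array of objects. The sketch below illustrates the difference using only Python's standard `json` module. It is not Doris code; the function name `parse_rows` and the sample data are hypothetical.

```python
import json


def parse_rows(text, read_json_by_line=False, strip_outer_array=False):
    """Return the row objects a loader would see (illustration only)."""
    # Either one JSON document per non-empty line, or the whole text as one document.
    if read_json_by_line:
        docs = [json.loads(line) for line in text.splitlines() if line.strip()]
    else:
        docs = [json.loads(text)]

    rows = []
    for doc in docs:
        if strip_outer_array:
            # The document must be an array; each element becomes one row.
            if not isinstance(doc, list):
                raise ValueError("strip_outer_array expects a JSON array")
            rows.extend(doc)
        else:
            rows.append(doc)
    return rows


json_lines = '{"id": 1}\n{"id": 2}\n'
json_array = '[{"id": 1}, {"id": 2}]'

print(parse_rows(json_lines, read_json_by_line=True))
print(parse_rows(json_array, strip_outer_array=True))
```

In this sketch, parsing `json_lines` with both flags left off raises an error, because the text is not a single JSON document. That is the kind of mistake a true default for read_json_by_line is meant to avoid.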

Behavior after this PR:

1. In scenarios such as S3 load, read_json_by_line is not only a JSON format option but also the switch for streaming JSON file reads, so it is hardcoded to true. JSON formats that require this parameter to be false are therefore not supported in these scenarios.
2. In scenarios such as Stream Load, users are free to specify any combination of values for the two parameters, although we do not normally expect users to explicitly set either to false.
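
The two cases above can be sketched as a small resolution function. This is a minimal illustration, not the actual Doris implementation: the function name `resolve_json_options` and the load type strings are hypothetical, and the handling of a parameter that is left unset while the other is set is an assumption.

```python
from typing import Optional, Tuple


def resolve_json_options(
    load_type: str,
    read_json_by_line: Optional[bool] = None,
    strip_outer_array: Optional[bool] = None,
) -> Tuple[bool, bool]:
    """Return (read_json_by_line, strip_outer_array) after applying defaults."""
    if load_type == "s3_load":
        # Case 1: the flag also enables streaming reads, so it is forced to true.
        # ASSUMPTION: strip_outer_array is passed through unchanged.
        return True, bool(strip_outer_array)

    if read_json_by_line is None and strip_outer_array is None:
        # Neither parameter was given: default to line-by-line reading.
        return True, False

    # Case 2 (for example Stream Load): honor whatever the user specified.
    # ASSUMPTION: a parameter left unset while the other is set falls back to false.
    return bool(read_json_by_line), bool(strip_outer_array)


print(resolve_json_options("stream_load"))
print(resolve_json_options("s3_load", read_json_by_line=False))
print(resolve_json_options("stream_load", strip_outer_array=True))
```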

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Sep 10, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot

TPC-H: Total hot run time: 34803 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9a0998b36fc0587bcbf19584fc3918dfa75704a7, data reload: false

------ Round 1 ----------------------------------
q1	17605	5148	5026	5026
q2	1969	328	211	211
q3	10256	1295	720	720
q4	10227	1007	547	547
q5	7544	2448	2317	2317
q6	181	168	138	138
q7	951	773	662	662
q8	9339	1387	1124	1124
q9	6912	5152	5173	5152
q10	6885	2408	2023	2023
q11	498	305	273	273
q12	362	357	222	222
q13	17757	3645	3065	3065
q14	243	269	217	217
q15	559	507	486	486
q16	1026	998	953	953
q17	594	874	359	359
q18	7582	7225	7166	7166
q19	1232	966	555	555
q20	346	333	243	243
q21	3708	3199	2358	2358
q22	1078	1061	986	986
Total cold run time: 106854 ms
Total hot run time: 34803 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5119	5108	5091	5091
q2	254	330	239	239
q3	2191	2665	2337	2337
q4	1342	1765	1328	1328
q5	4251	4575	4567	4567
q6	228	175	141	141
q7	2087	1961	1804	1804
q8	2609	2714	2718	2714
q9	7423	7729	7299	7299
q10	3136	3372	2863	2863
q11	599	523	513	513
q12	738	772	666	666
q13	3444	4031	3327	3327
q14	292	317	286	286
q15	538	475	485	475
q16	1102	1095	1038	1038
q17	1285	1581	1384	1384
q18	8028	7640	7607	7607
q19	835	787	870	787
q20	2023	2106	1903	1903
q21	5021	4511	4239	4239
q22	1107	1015	988	988
Total cold run time: 53652 ms
Total hot run time: 51596 ms
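
The report does not label its columns. The Round 1 figures are consistent with the columns being the query name, the cold run, two hot runs, and the faster of the two hot runs: the first column of times sums to the cold total and the last column sums to the hot total. The sketch below reproduces that arithmetic on three rows copied from Round 1. The column interpretation is inferred from the numbers, not stated by the bot.

```python
# Rows copied from Round 1: (query, cold run, hot run 1, hot run 2), all in ms.
rows = [
    ("q1", 17605, 5148, 5026),
    ("q9", 6912, 5152, 5173),
    ("q14", 243, 269, 217),
]

# The cold total adds up the cold runs.
cold_total = sum(cold for _, cold, _, _ in rows)

# The hot total adds up the faster of the two hot runs for each query.
hot_total = sum(min(hot1, hot2) for _, _, hot1, hot2 in rows)

print(cold_total, hot_total)
```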

@doris-robot

TPC-DS: Total hot run time: 189468 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9a0998b36fc0587bcbf19584fc3918dfa75704a7, data reload: false

query1	1082	430	410	410
query2	6570	1715	1679	1679
query3	6758	234	221	221
query4	25945	23715	22934	22934
query5	4392	640	564	564
query6	324	238	222	222
query7	4650	534	302	302
query8	295	251	251	251
query9	8616	2988	2984	2984
query10	496	347	312	312
query11	15768	15053	14791	14791
query12	175	124	120	120
query13	1680	566	419	419
query14	10403	9263	9235	9235
query15	207	205	188	188
query16	7363	690	494	494
query17	1247	759	652	652
query18	2013	440	340	340
query19	213	215	188	188
query20	133	133	126	126
query21	218	134	122	122
query22	4155	4115	3992	3992
query23	34045	33058	33250	33058
query24	8239	2437	2482	2437
query25	647	514	454	454
query26	1235	293	170	170
query27	2716	522	356	356
query28	4376	2296	2261	2261
query29	831	632	491	491
query30	293	229	203	203
query31	898	817	736	736
query32	97	87	82	82
query33	581	414	361	361
query34	822	866	524	524
query35	867	845	756	756
query36	988	1040	936	936
query37	130	114	91	91
query38	3517	3529	3511	3511
query39	1480	1465	1411	1411
query40	235	137	129	129
query41	65	61	59	59
query42	133	115	119	115
query43	489	508	475	475
query44	1368	907	875	875
query45	182	187	180	180
query46	879	1020	644	644
query47	1776	1849	1731	1731
query48	405	425	324	324
query49	756	525	424	424
query50	672	692	420	420
query51	4055	3950	3915	3915
query52	122	113	111	111
query53	243	268	194	194
query54	625	626	549	549
query55	94	91	98	91
query56	349	342	321	321
query57	1186	1191	1142	1142
query58	296	289	288	288
query59	2540	2682	2506	2506
query60	368	357	356	356
query61	170	163	218	163
query62	811	745	689	689
query63	241	201	208	201
query64	4541	1168	864	864
query65	4063	3965	3994	3965
query66	1187	446	355	355
query67	15783	15194	15239	15194
query68	8127	928	582	582
query69	530	349	297	297
query70	1401	1241	1280	1241
query71	540	354	333	333
query72	5896	5099	4993	4993
query73	666	606	359	359
query74	8978	9116	8900	8900
query75	3312	3285	2790	2790
query76	3271	1198	735	735
query77	452	407	343	343
query78	9508	9661	8837	8837
query79	2434	804	592	592
query80	645	592	530	530
query81	538	267	237	237
query82	235	168	166	166
query83	266	280	256	256
query84	265	109	99	99
query85	898	474	448	448
query86	383	331	316	316
query87	3736	3757	3669	3669
query88	3937	2253	2274	2253
query89	404	353	302	302
query90	1945	239	227	227
query91	167	175	190	175
query92	93	83	71	71
query93	1963	1016	635	635
query94	704	426	326	326
query95	423	340	341	340
query96	486	594	284	284
query97	2913	2982	2882	2882
query98	260	233	223	223
query99	1338	1447	1363	1363
Total cold run time: 274813 ms
Total hot run time: 189468 ms

@doris-robot

ClickBench: Total hot run time: 30.39 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9a0998b36fc0587bcbf19584fc3918dfa75704a7, data reload: false

query1	0.06	0.05	0.04
query2	0.08	0.06	0.06
query3	0.26	0.08	0.08
query4	1.61	0.12	0.12
query5	0.30	0.27	0.24
query6	1.16	0.65	0.63
query7	0.04	0.03	0.03
query8	0.06	0.04	0.05
query9	0.63	0.53	0.52
query10	0.58	0.57	0.57
query11	0.17	0.11	0.11
query12	0.16	0.13	0.12
query13	0.63	0.64	0.63
query14	1.04	1.03	1.05
query15	0.88	0.84	0.88
query16	0.42	0.42	0.43
query17	1.09	1.06	1.07
query18	0.22	0.20	0.20
query19	1.99	1.87	1.82
query20	0.01	0.02	0.02
query21	15.42	0.96	0.60
query22	0.76	1.20	0.90
query23	14.72	1.44	0.65
query24	6.67	1.15	1.00
query25	0.52	0.30	0.08
query26	0.64	0.16	0.13
query27	0.07	0.06	0.05
query28	9.68	0.95	0.43
query29	12.55	3.89	3.26
query30	0.29	0.14	0.11
query31	2.83	0.61	0.39
query32	3.24	0.58	0.48
query33	3.07	3.13	3.14
query34	16.13	5.44	4.84
query35	4.97	4.90	4.89
query36	0.70	0.52	0.50
query37	0.10	0.07	0.07
query38	0.08	0.05	0.04
query39	0.04	0.03	0.03
query40	0.19	0.15	0.15
query41	0.09	0.04	0.03
query42	0.04	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 104.23 s
Total hot run time: 30.39 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/44) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.96% (17302/33296)
Line Coverage 37.30% (157637/422631)
Region Coverage 31.93% (120182/376443)
Branch Coverage 33.32% (52823/158527)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 93.18% (41/44) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 69.43% (22699/32695)
Line Coverage 55.47% (234309/422377)
Region Coverage 50.89% (194316/381872)
Branch Coverage 52.39% (83533/159430)

@0AyanamiRei 0AyanamiRei changed the title from "[fix](json load) Handle multiple JSON records per line in read_json_by_line mode" to "[Enhancement](json load) Set jsonload's default behavior to be read_json_by_line" on Sep 11, 2025
@0AyanamiRei
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34617 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d8444d07a214530c3f3afa929772e5cf76e68f8d, data reload: false

------ Round 1 ----------------------------------
q1	17602	5192	5083	5083
q2	2000	347	209	209
q3	10219	1302	696	696
q4	10233	1001	548	548
q5	7566	2413	2342	2342
q6	183	172	138	138
q7	934	790	651	651
q8	9358	1338	1121	1121
q9	6902	5163	5197	5163
q10	6946	2372	1972	1972
q11	491	304	279	279
q12	362	357	238	238
q13	17792	3627	3030	3030
q14	249	232	220	220
q15	559	508	498	498
q16	1004	991	965	965
q17	623	859	362	362
q18	7330	7200	7006	7006
q19	1490	953	547	547
q20	365	348	227	227
q21	3694	2527	2332	2332
q22	1055	1053	990	990
Total cold run time: 106957 ms
Total hot run time: 34617 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5264	5125	5099	5099
q2	266	332	228	228
q3	2151	2639	2270	2270
q4	1309	1767	1337	1337
q5	4249	4374	4568	4374
q6	217	174	140	140
q7	2040	1975	1834	1834
q8	2671	2639	2651	2639
q9	7316	7369	7272	7272
q10	3166	3301	2904	2904
q11	601	521	536	521
q12	690	759	652	652
q13	3495	3917	3337	3337
q14	292	297	275	275
q15	530	493	500	493
q16	1044	1120	1058	1058
q17	1279	1691	1409	1409
q18	7950	7726	7578	7578
q19	811	728	752	728
q20	1904	1931	1854	1854
q21	4690	4387	4282	4282
q22	1064	1052	1016	1016
Total cold run time: 52999 ms
Total hot run time: 51300 ms

@doris-robot

TPC-DS: Total hot run time: 190059 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d8444d07a214530c3f3afa929772e5cf76e68f8d, data reload: false

query1	1063	451	429	429
query2	6558	1722	1676	1676
query3	6755	228	221	221
query4	26332	23357	23509	23357
query5	4442	649	518	518
query6	339	253	240	240
query7	4651	513	296	296
query8	319	263	252	252
query9	8661	2909	2913	2909
query10	483	348	298	298
query11	15678	15236	15338	15236
query12	191	131	126	126
query13	1721	554	424	424
query14	10760	9273	9226	9226
query15	207	232	178	178
query16	7365	648	485	485
query17	1214	770	623	623
query18	2008	431	357	357
query19	213	196	171	171
query20	137	130	125	125
query21	211	131	116	116
query22	4075	4121	4071	4071
query23	34330	32898	33264	32898
query24	8232	2386	2390	2386
query25	586	519	447	447
query26	1237	281	169	169
query27	2753	523	369	369
query28	4373	2249	2250	2249
query29	815	654	538	538
query30	300	238	211	211
query31	914	802	709	709
query32	90	89	87	87
query33	589	405	368	368
query34	804	852	529	529
query35	883	834	786	786
query36	980	1026	938	938
query37	139	118	92	92
query38	3491	3537	3524	3524
query39	1498	1406	1427	1406
query40	224	138	127	127
query41	64	64	63	63
query42	128	118	129	118
query43	520	504	453	453
query44	1322	868	874	868
query45	189	183	182	182
query46	866	1048	650	650
query47	1802	1837	1750	1750
query48	399	456	322	322
query49	769	497	424	424
query50	637	690	404	404
query51	3924	3913	3831	3831
query52	121	117	111	111
query53	238	273	198	198
query54	614	601	542	542
query55	92	93	94	93
query56	343	338	320	320
query57	1213	1219	1130	1130
query58	300	287	296	287
query59	2552	2685	2589	2589
query60	367	369	333	333
query61	163	160	155	155
query62	815	712	655	655
query63	228	193	196	193
query64	4442	1180	838	838
query65	4059	3983	3957	3957
query66	1162	457	366	366
query67	15179	15176	15204	15176
query68	8069	931	579	579
query69	495	341	298	298
query70	1512	1193	1318	1193
query71	585	356	325	325
query72	6079	5180	5245	5180
query73	746	652	362	362
query74	9060	9091	8921	8921
query75	3921	3259	2775	2775
query76	3599	1148	819	819
query77	813	404	332	332
query78	9662	9500	8851	8851
query79	2363	853	595	595
query80	669	582	528	528
query81	504	265	233	233
query82	493	176	141	141
query83	260	264	265	264
query84	258	126	96	96
query85	899	458	435	435
query86	399	324	321	321
query87	3738	3731	3671	3671
query88	3660	2193	2198	2193
query89	388	337	299	299
query90	1893	224	219	219
query91	173	166	134	134
query92	87	76	72	72
query93	1789	1002	645	645
query94	708	443	336	336
query95	418	343	322	322
query96	491	579	281	281
query97	2942	2988	2896	2896
query98	250	219	216	216
query99	1654	1423	1352	1352
Total cold run time: 276706 ms
Total hot run time: 190059 ms

@doris-robot

ClickBench: Total hot run time: 29.9 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d8444d07a214530c3f3afa929772e5cf76e68f8d, data reload: false

query1	0.06	0.05	0.05
query2	0.10	0.05	0.06
query3	0.25	0.08	0.08
query4	1.60	0.12	0.12
query5	0.27	0.27	0.26
query6	1.19	0.66	0.64
query7	0.03	0.03	0.03
query8	0.05	0.04	0.05
query9	0.62	0.53	0.52
query10	0.57	0.57	0.57
query11	0.16	0.12	0.11
query12	0.15	0.12	0.12
query13	0.63	0.63	0.61
query14	1.04	1.04	1.03
query15	0.91	0.85	0.86
query16	0.41	0.39	0.41
query17	1.02	1.07	1.08
query18	0.22	0.20	0.20
query19	1.96	1.90	1.81
query20	0.01	0.02	0.01
query21	15.39	0.94	0.61
query22	0.74	1.19	0.72
query23	14.89	1.41	0.63
query24	6.88	1.35	0.70
query25	0.49	0.34	0.10
query26	0.66	0.15	0.13
query27	0.06	0.06	0.06
query28	9.42	0.94	0.44
query29	12.58	3.97	3.28
query30	0.29	0.13	0.11
query31	2.83	0.61	0.39
query32	3.24	0.57	0.47
query33	3.14	3.02	3.20
query34	15.68	5.49	4.88
query35	4.92	4.90	4.88
query36	0.71	0.52	0.52
query37	0.11	0.08	0.07
query38	0.07	0.05	0.05
query39	0.03	0.04	0.03
query40	0.18	0.17	0.15
query41	0.08	0.04	0.03
query42	0.04	0.04	0.03
query43	0.04	0.04	0.04
Total cold run time: 103.72 s
Total hot run time: 29.9 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/11) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.01% (17315/33291)
Line Coverage 37.35% (157876/422735)
Region Coverage 31.96% (120455/376925)
Branch Coverage 33.34% (52883/158604)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (11/11) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 70.75% (23129/32689)
Line Coverage 57.15% (241458/422480)
Region Coverage 52.47% (200618/382353)
Branch Coverage 54.08% (86263/159507)

@0AyanamiRei
Contributor Author

run buildall

@doris-robot

ClickBench: Total hot run time: 30.98 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f678ff44204882b0a73eb079b71cf6c1276c6cbf, data reload: false

query1	0.06	0.05	0.04
query2	0.08	0.06	0.05
query3	0.27	0.08	0.08
query4	1.60	0.12	0.13
query5	0.29	0.27	0.26
query6	1.18	0.64	0.65
query7	0.03	0.03	0.03
query8	0.05	0.04	0.04
query9	0.64	0.55	0.52
query10	0.58	0.57	0.57
query11	0.17	0.11	0.12
query12	0.16	0.12	0.12
query13	0.63	0.63	0.62
query14	1.03	1.04	1.02
query15	0.88	0.86	0.86
query16	0.41	0.40	0.40
query17	1.08	1.03	1.02
query18	0.21	0.21	0.21
query19	2.24	1.97	2.08
query20	0.01	0.01	0.02
query21	15.65	0.91	0.58
query22	0.76	1.38	0.95
query23	14.72	1.38	0.64
query24	7.08	1.27	0.96
query25	0.46	0.20	0.09
query26	0.62	0.16	0.13
query27	0.07	0.06	0.05
query28	9.68	1.44	0.93
query29	12.60	3.95	3.30
query30	0.29	0.13	0.13
query31	2.82	0.62	0.38
query32	3.25	0.57	0.49
query33	3.16	3.06	3.15
query34	16.21	5.46	4.86
query35	4.91	4.93	4.85
query36	0.70	0.54	0.50
query37	0.11	0.07	0.08
query38	0.06	0.05	0.05
query39	0.03	0.03	0.03
query40	0.20	0.15	0.14
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 105.16 s
Total hot run time: 30.98 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (10/10) 🎉
Increment coverage report
Complete coverage report

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/15) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.41% (17641/33661)
Line Coverage 37.64% (160191/425576)
Region Coverage 32.17% (122022/379259)
Branch Coverage 33.51% (53477/159594)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (15/15) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.05% (23688/32879)
Line Coverage 58.95% (250222/424470)
Region Coverage 55.19% (212119/384311)
Branch Coverage 56.25% (90142/160247)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (10/10) 🎉
Increment coverage report
Complete coverage report

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 25, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (15/15) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.05% (23688/32879)
Line Coverage 58.95% (250222/424470)
Region Coverage 55.19% (212119/384311)
Branch Coverage 56.25% (90142/160247)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (10/10) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (15/15) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.14% (23390/32879)
Line Coverage 57.54% (244241/424470)
Region Coverage 52.99% (203644/384311)
Branch Coverage 54.56% (87429/160247)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (10/10) 🎉
Increment coverage report
Complete coverage report

Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 151cb72 into apache:master Sep 28, 2025
28 of 31 checks passed
github-actions bot pushed a commit that referenced this pull request Sep 28, 2025
…son_by_line (#55861)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR:
[doc-2874](apache/doris-website#2874)

Problem Summary:

For the read_json_by_line and strip_outer_array parameters, considering
that the first parameter will gradually be deprecated in the future, and
some users may forget to specify these two parameters when importing
JSON files, I will modify the default behavior of these two parameters:
if the user does not specify values for these two parameters, the
default setting for read_json_by_line will be true.

Behavior patterns after this PR:

1. In scenarios such as S3 load, since read_json_by_line is not only
related to importing JSON formats but also serves as the switch for
streaming JSON file reading, it will be hardcoded to true (thus, JSON
formats requiring this parameter to be false are not supported in such
environments).
2. In scenarios such as Stream Load, users have absolute freedom to
specify any combination of parameter values (though typically we do not
expect users to actively set either to false).

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [x] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [x] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [x] No.
    - [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
yiguolei pushed a commit that referenced this pull request Sep 29, 2025
…to be read_json_by_line #55861 (#56594)

Cherry-picked from #55861

Co-authored-by: Refrain <113875799+0AyanamiRei@users.noreply.github.com>
github-actions bot pushed a commit that referenced this pull request Oct 9, 2025
…son_by_line (#55861)

morrySnow pushed a commit that referenced this pull request Oct 11, 2025
…to be read_json_by_line #55861 (#56736)

Cherry-picked from #55861

Co-authored-by: Refrain <113875799+0AyanamiRei@users.noreply.github.com>