@mrhhsg
Member

mrhhsg commented Jul 4, 2025

pick #52543

…apache#52543) (apache#52744)

The Boost tokenizer requires an explicit "." after "$" to extract JSON
path tokens correctly. Without it, expressions like "$[0].key" cannot be
split properly, which causes issues in downstream logic. This commit
ensures that a "." is automatically added after "$" so that token
parsing behaves consistently.
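
To illustrate the idea in the commit message, here is a minimal sketch, not the actual Doris code. The names `normalize_json_path` and `split_on_dot` are hypothetical, and `split_on_dot` is a simplified stand-in for the Boost tokenizer that splits on "." only; the real tokenizer configuration (escaping, quoting) is not shown in this PR page and is not reproduced here.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not the Doris function): make sure the root "$" is
// followed by "." so that a "."-based splitter sees "$" as its own token.
std::string normalize_json_path(std::string_view path) {
    if (path.size() >= 2 && path[0] == '$' && path[1] != '.') {
        std::string fixed = "$.";
        fixed.append(path.substr(1));  // keep everything after the "$"
        return fixed;
    }
    return std::string(path);  // already "$", "$.xxx", or not rooted at "$"
}

// Simplified stand-in for the tokenizer: split the path on "." only.
std::vector<std::string> split_on_dot(std::string_view path) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (true) {
        std::size_t dot = path.find('.', start);
        if (dot == std::string_view::npos) {
            tokens.emplace_back(path.substr(start));  // last token
            break;
        }
        tokens.emplace_back(path.substr(start, dot - start));
        start = dot + 1;
    }
    return tokens;
}
```

Under these assumptions, `split_on_dot("$[0].key")` yields the two tokens `$[0]` and `key`, so the root and the array index are fused into one token. After normalization the path becomes `$.[0].key`, which splits into `$`, `[0]`, and `key`.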

### What problem does this PR solve?

pick apache#52543

Issue Number: close #xxx

Related PR: apache#52543

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
mrhhsg requested a review from morrySnow as a code owner July 4, 2025 09:56
@Thearas
Contributor

Thearas commented Jul 4, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg
Member Author

mrhhsg commented Jul 4, 2025

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 38.24% (13/34) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 45.12% (12405/27492) |
| Line Coverage | 36.05% (109671/304237) |
| Region Coverage | 35.20% (56953/161797) |
| Branch Coverage | 32.30% (30875/95590) |

@mrhhsg
Member Author

mrhhsg commented Jul 6, 2025

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 38.24% (13/34) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 45.14% (12413/27497) |
| Line Coverage | 36.09% (109802/304287) |
| Region Coverage | 35.24% (57021/161817) |
| Branch Coverage | 32.34% (30924/95614) |

@morrySnow
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 39517 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f49de06d9f69d6b39471894085dbbdbcb164523b, data reload: false

------ Round 1 ----------------------------------
q1	17633	6984	6601	6601
q2	2059	176	165	165
q3	10624	1140	1188	1140
q4	10378	743	793	743
q5	7744	2837	2793	2793
q6	213	132	134	132
q7	997	615	601	601
q8	9344	1953	1974	1953
q9	6721	6394	6438	6394
q10	7039	2271	2270	2270
q11	468	262	264	262
q12	432	214	211	211
q13	17803	3000	2943	2943
q14	246	205	210	205
q15	506	466	471	466
q16	466	379	372	372
q17	976	568	569	568
q18	7241	6536	6599	6536
q19	1317	960	901	901
q20	483	202	198	198
q21	3878	3052	3212	3052
q22	1112	1011	1029	1011
Total cold run time: 107680 ms
Total hot run time: 39517 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6622	6586	6538	6538
q2	331	225	233	225
q3	2911	2879	2894	2879
q4	2144	1839	1828	1828
q5	5645	5760	5720	5720
q6	208	125	130	125
q7	2218	1812	1762	1762
q8	3387	3522	3525	3522
q9	8818	8815	8914	8815
q10	3565	3521	3539	3521
q11	584	491	486	486
q12	789	589	596	589
q13	8692	3173	3141	3141
q14	322	262	275	262
q15	507	473	452	452
q16	469	436	447	436
q17	1838	1655	1595	1595
q18	8296	7816	7736	7736
q19	1669	1575	1575	1575
q20	2039	1849	1863	1849
q21	5263	5085	5058	5058
q22	1161	1088	1036	1036
Total cold run time: 67478 ms
Total hot run time: 59150 ms

@doris-robot

TPC-DS: Total hot run time: 196842 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f49de06d9f69d6b39471894085dbbdbcb164523b, data reload: false

query1	1277	903	888	888
query2	6236	1969	1908	1908
query3	10986	4579	4743	4579
query4	33062	23426	23898	23426
query5	4400	443	447	443
query6	279	181	177	177
query7	4001	316	328	316
query8	287	228	229	228
query9	9338	2583	2580	2580
query10	471	266	259	259
query11	18013	15225	15162	15162
query12	150	106	102	102
query13	1560	435	409	409
query14	8517	6609	7486	6609
query15	261	192	190	190
query16	8169	538	494	494
query17	1652	592	597	592
query18	2163	334	340	334
query19	236	167	163	163
query20	125	123	119	119
query21	204	109	113	109
query22	4898	4624	4678	4624
query23	34990	34405	33865	33865
query24	11114	2944	2950	2944
query25	684	456	452	452
query26	1430	170	175	170
query27	2833	369	360	360
query28	7735	2176	2236	2176
query29	939	471	453	453
query30	249	162	163	162
query31	1071	898	817	817
query32	101	57	55	55
query33	740	295	309	295
query34	1014	500	531	500
query35	871	735	720	720
query36	1106	958	953	953
query37	135	69	64	64
query38	4133	3938	3973	3938
query39	1541	1484	1478	1478
query40	253	110	102	102
query41	48	46	46	46
query42	113	103	104	103
query43	530	486	488	486
query44	1202	808	824	808
query45	185	177	169	169
query46	1179	741	745	741
query47	2067	1945	1980	1945
query48	434	340	345	340
query49	969	389	393	389
query50	847	422	425	422
query51	7431	7261	7366	7261
query52	105	86	91	86
query53	259	186	178	178
query54	1080	464	473	464
query55	78	77	84	77
query56	276	247	247	247
query57	1333	1237	1210	1210
query58	222	211	222	211
query59	3361	3158	3121	3121
query60	286	258	268	258
query61	147	132	143	132
query62	872	711	712	711
query63	221	194	190	190
query64	4931	667	646	646
query65	3314	3285	3326	3285
query66	1082	322	308	308
query67	15939	15590	15455	15455
query68	4618	578	571	571
query69	438	259	265	259
query70	1137	1117	1091	1091
query71	335	243	248	243
query72	6237	3969	4033	3969
query73	753	345	349	345
query74	10233	9145	8978	8978
query75	3361	2637	2716	2637
query76	2677	1088	1092	1088
query77	372	273	266	266
query78	10642	9739	9582	9582
query79	1675	597	586	586
query80	1121	417	420	417
query81	531	224	220	220
query82	960	93	85	85
query83	259	142	144	142
query84	241	79	86	79
query85	1347	301	296	296
query86	404	296	296	296
query87	4437	4243	4182	4182
query88	3514	2427	2384	2384
query89	418	296	288	288
query90	1951	185	180	180
query91	142	106	105	105
query92	66	51	50	50
query93	1918	553	551	551
query94	870	304	295	295
query95	362	255	257	255
query96	624	283	280	280
query97	3331	3117	3247	3117
query98	214	206	208	206
query99	1498	1304	1295	1295
Total cold run time: 302421 ms
Total hot run time: 196842 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 38.24% (13/34) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 45.23% (12464/27555) |
| Line Coverage | 36.19% (110462/305201) |
| Region Coverage | 35.34% (57304/162130) |
| Branch Coverage | 32.43% (31075/95816) |

@doris-robot

ClickBench: Total hot run time: 30.11 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f49de06d9f69d6b39471894085dbbdbcb164523b, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.04
query3	0.23	0.06	0.06
query4	1.62	0.11	0.10
query5	0.54	0.52	0.51
query6	1.13	0.73	0.74
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.54	0.51	0.50
query10	0.56	0.55	0.55
query11	0.14	0.10	0.11
query12	0.14	0.12	0.11
query13	0.60	0.59	0.59
query14	0.77	0.77	0.81
query15	0.84	0.82	0.83
query16	0.38	0.39	0.37
query17	1.05	1.06	1.04
query18	0.22	0.22	0.22
query19	1.95	1.87	1.79
query20	0.02	0.01	0.01
query21	15.43	0.57	0.58
query22	2.49	1.49	1.50
query23	17.12	1.08	0.81
query24	3.04	1.15	2.21
query25	0.14	0.18	0.06
query26	0.58	0.13	0.14
query27	0.04	0.04	0.05
query28	9.11	0.51	0.51
query29	12.56	3.25	3.22
query30	0.24	0.06	0.06
query31	2.88	0.39	0.38
query32	3.25	0.45	0.45
query33	2.94	3.02	3.04
query34	17.02	4.52	4.52
query35	4.61	4.54	4.56
query36	0.67	0.48	0.49
query37	0.08	0.06	0.06
query38	0.05	0.03	0.04
query39	0.03	0.02	0.02
query40	0.16	0.12	0.13
query41	0.07	0.02	0.02
query42	0.03	0.02	0.02
query43	0.03	0.04	0.03
Total cold run time: 103.46 s
Total hot run time: 30.11 s

morrySnow changed the title from "[fix](json) Add . after $ in JSON path to support correct token parsing (#52543)" to "branch-3.1: [fix](json) Add . after $ in JSON path to support correct token parsing #52543" on Jul 8, 2025
morrySnow merged commit 0da2a00 into apache:branch-3.1 on Jul 8, 2025
22 of 24 checks passed
5 participants