mrhhsg (Member) commented Jul 3, 2025

The Boost tokenizer needs an explicit "." after "$" to extract JSON path tokens correctly. Without it, expressions such as "$[0].key" are not split properly, which causes problems in downstream logic. This commit automatically adds a "." after "$" so that token parsing behaves consistently.
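The effect described above can be illustrated with a minimal sketch. It is not the code from this PR: `normalize_json_path` and `split_on_dot` are hypothetical helpers, and a plain split on "." stands in for the Boost tokenizer, under the assumption that "." is the only separator that matters for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not the actual Doris function): insert "." after a
// leading "$" when the path does not already continue with one.
std::string normalize_json_path(const std::string& path) {
    if (path.size() >= 2 && path[0] == '$' && path[1] != '.') {
        return "$." + path.substr(1);
    }
    return path;
}

// Simplified stand-in for the tokenizer (assumption): split on '.' only,
// with no quoting or escape handling.
std::vector<std::string> split_on_dot(const std::string& path) {
    std::vector<std::string> tokens;
    std::istringstream stream(path);
    std::string token;
    while (std::getline(stream, token, '.')) {
        tokens.push_back(token);
    }
    return tokens;
}
```

With this sketch, splitting "$[0].key" directly yields the two tokens "$[0]" and "key", so the root marker and the array index stay glued together. Normalizing first gives "$.[0].key", which splits into "$", "[0]" and "key".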

What problem does this PR solve?

pick #52543

Issue Number: close #xxx

Related PR: #52543

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

mrhhsg requested a review from dataroaring as a code owner on July 3, 2025, 15:00
hello-stephen (Contributor) commented

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

mrhhsg (Member Author) commented Jul 3, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 39694 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2d96919eb0a9604fc2ed85ca52825ddd46328844, data reload: false

------ Round 1 ----------------------------------
q1	17581	6953	6668	6668
q2	2102	174	160	160
q3	10778	1063	1141	1063
q4	10539	770	753	753
q5	7737	2844	2751	2751
q6	217	138	133	133
q7	976	611	617	611
q8	9359	1971	2029	1971
q9	6585	6389	6355	6355
q10	7004	2249	2309	2249
q11	464	253	255	253
q12	390	212	212	212
q13	17761	2946	2991	2946
q14	232	210	208	208
q15	503	451	464	451
q16	481	378	381	378
q17	964	580	547	547
q18	7339	6666	6626	6626
q19	1411	1049	1024	1024
q20	441	199	202	199
q21	3860	3152	3190	3152
q22	1067	996	984	984
Total cold run time: 107791 ms
Total hot run time: 39694 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6666	6602	6570	6570
q2	326	233	227	227
q3	2935	2777	2825	2777
q4	2063	1795	1790	1790
q5	5785	5715	5693	5693
q6	211	136	131	131
q7	2224	1809	1821	1809
q8	3351	3537	3511	3511
q9	8960	8741	8881	8741
q10	3561	3519	3469	3469
q11	599	493	502	493
q12	844	613	614	613
q13	7764	3139	3215	3139
q14	312	272	264	264
q15	505	454	473	454
q16	478	436	429	429
q17	1822	1610	1592	1592
q18	8286	7794	7671	7671
q19	1718	1470	1529	1470
q20	2129	1873	1804	1804
q21	5038	4987	5015	4987
q22	1107	990	1019	990
Total cold run time: 66684 ms
Total hot run time: 58624 ms

@doris-robot

TPC-DS: Total hot run time: 197921 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2d96919eb0a9604fc2ed85ca52825ddd46328844, data reload: false

query1	1290	904	883	883
query2	6277	1883	1871	1871
query3	10837	4328	4251	4251
query4	61595	29489	24011	24011
query5	5221	441	433	433
query6	406	164	189	164
query7	5470	309	314	309
query8	308	224	220	220
query9	8695	2560	2555	2555
query10	471	269	249	249
query11	17598	15874	15988	15874
query12	180	111	114	111
query13	1481	464	428	428
query14	9766	7655	7567	7567
query15	203	178	189	178
query16	7089	493	479	479
query17	1155	589	593	589
query18	1927	331	316	316
query19	205	168	164	164
query20	113	110	110	110
query21	207	103	103	103
query22	4583	4510	4643	4510
query23	34628	33920	34342	33920
query24	6160	3021	2889	2889
query25	540	419	422	419
query26	657	173	169	169
query27	1810	350	366	350
query28	4358	2137	2106	2106
query29	728	447	428	428
query30	235	154	160	154
query31	980	815	799	799
query32	68	58	55	55
query33	435	302	321	302
query34	925	521	531	521
query35	845	754	748	748
query36	1060	971	945	945
query37	108	70	75	70
query38	4155	3921	3962	3921
query39	1500	1472	1488	1472
query40	201	101	108	101
query41	49	48	48	48
query42	115	103	103	103
query43	511	490	475	475
query44	1165	812	804	804
query45	181	169	166	166
query46	1158	756	753	753
query47	2021	1973	1934	1934
query48	478	385	401	385
query49	728	394	403	394
query50	838	422	428	422
query51	7358	7144	7204	7144
query52	105	91	96	91
query53	269	201	188	188
query54	566	467	473	467
query55	78	77	79	77
query56	284	245	249	245
query57	1314	1235	1216	1216
query58	222	207	212	207
query59	3161	3025	3076	3025
query60	281	279	250	250
query61	110	108	113	108
query62	803	681	690	681
query63	228	196	192	192
query64	1370	666	644	644
query65	3263	3205	3169	3169
query66	701	298	287	287
query67	15720	15467	15423	15423
query68	4169	575	578	575
query69	429	265	260	260
query70	1145	1095	1030	1030
query71	346	255	258	255
query72	6337	3969	3965	3965
query73	772	337	355	337
query74	10549	8971	9299	8971
query75	3352	2659	2641	2641
query76	2012	1004	1055	1004
query77	495	274	271	271
query78	10550	9606	9658	9606
query79	2116	588	601	588
query80	1362	432	428	428
query81	517	220	221	220
query82	1218	89	92	89
query83	273	150	141	141
query84	286	76	74	74
query85	1034	296	293	293
query86	405	308	295	295
query87	4339	4297	4229	4229
query88	3827	2358	2332	2332
query89	425	294	296	294
query90	1976	189	186	186
query91	198	145	148	145
query92	59	49	52	49
query93	2860	563	556	556
query94	804	292	298	292
query95	357	255	261	255
query96	626	280	288	280
query97	3295	3137	3160	3137
query98	217	201	196	196
query99	1611	1294	1285	1285
Total cold run time: 315508 ms
Total hot run time: 197921 ms

@doris-robot

ClickBench: Total hot run time: 30.08 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2d96919eb0a9604fc2ed85ca52825ddd46328844, data reload: false

query1	0.03	0.03	0.03
query2	0.06	0.03	0.03
query3	0.23	0.06	0.06
query4	1.63	0.10	0.10
query5	0.53	0.53	0.51
query6	1.15	0.72	0.75
query7	0.02	0.02	0.02
query8	0.06	0.03	0.02
query9	0.57	0.50	0.50
query10	0.55	0.55	0.55
query11	0.14	0.11	0.10
query12	0.14	0.11	0.11
query13	0.60	0.60	0.60
query14	0.78	0.79	0.80
query15	0.85	0.82	0.83
query16	0.37	0.40	0.40
query17	1.06	1.05	1.05
query18	0.23	0.21	0.22
query19	1.92	1.86	1.88
query20	0.01	0.02	0.01
query21	15.39	0.60	0.60
query22	2.07	1.79	2.55
query23	17.07	0.97	0.76
query24	3.37	1.04	0.93
query25	0.19	0.10	0.13
query26	0.54	0.13	0.12
query27	0.06	0.05	0.05
query28	10.28	0.48	0.51
query29	12.61	3.21	3.19
query30	0.25	0.06	0.06
query31	2.86	0.38	0.38
query32	3.27	0.48	0.45
query33	2.94	2.97	3.05
query34	17.13	4.47	4.46
query35	4.54	4.58	4.48
query36	0.67	0.48	0.47
query37	0.08	0.06	0.07
query38	0.04	0.03	0.04
query39	0.03	0.02	0.02
query40	0.15	0.12	0.12
query41	0.07	0.02	0.02
query42	0.03	0.03	0.03
query43	0.04	0.02	0.02
Total cold run time: 104.61 s
Total hot run time: 30.08 s

@doris-robot

BE UT Coverage Report

Increment line coverage 38.24% (13/34) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 41.24% (10943/26538)
Line Coverage 32.00% (93641/292607)
Region Coverage 31.11% (48301/155272)
Branch Coverage 27.56% (24737/89764)

github-actions bot (Contributor) commented Jul 4, 2025

PR approved by at least one committer and no changes requested.

github-actions bot added the "approved" and "reviewed" labels on Jul 4, 2025
github-actions bot (Contributor) commented Jul 4, 2025

PR approved by anyone and no changes requested.

dataroaring (Contributor) left a comment

lgtm

dataroaring merged commit e486dd3 into apache:branch-3.0 on Jul 4, 2025
26 of 29 checks passed
mrhhsg deleted the pick_52543_30 branch on July 4, 2025, 09:53
mrhhsg added a commit to mrhhsg/doris that referenced this pull request Jul 4, 2025
…apache#52543) (apache#52744)