Conversation

@airborne12
Member

@airborne12 airborne12 commented Oct 5, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #56139

Problem Summary:
This PR adds EXACT DSL functionality to the search function, enabling exact string matching without tokenization. It complements the existing ANY/ALL operators, which work with tokenized indexes, by providing strict string equality matching.
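
To illustrate the distinction described above, here is a minimal Python sketch contrasting token-based ANY/ALL matching with strict equality. The `tokenize` helper (lowercasing plus whitespace splitting) and the three `match_*` functions are hypothetical stand-ins, not Doris code; the real analyzer and index lookup are not described in this PR summary.

```python
def tokenize(text: str) -> set[str]:
    """Hypothetical analyzer: lowercase, then split on whitespace."""
    return set(text.lower().split())


def match_any(value: str, query: str) -> bool:
    """ANY-style: at least one query token appears among the value's tokens."""
    return bool(tokenize(value) & tokenize(query))


def match_all(value: str, query: str) -> bool:
    """ALL-style: every query token appears among the value's tokens."""
    return tokenize(query) <= tokenize(value)


def match_exact(value: str, query: str) -> bool:
    """EXACT-style: the whole untokenized string must be equal."""
    return value == query


stored = "Apache Doris search"
print(match_any(stored, "doris flink"))             # True
print(match_all(stored, "doris apache"))            # True
print(match_exact(stored, "Apache Doris"))          # False
print(match_exact(stored, "Apache Doris search"))   # True
```

In this toy model, a partial or reordered query can satisfy ANY/ALL but never EXACT, which only accepts the full stored string.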

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@airborne12 airborne12 requested a review from Copilot October 5, 2025 14:47
@airborne12
Member Author

run buildall

Copilot AI left a comment

Pull Request Overview

This PR adds EXACT DSL functionality to the search function, enabling exact string matching without tokenization. It complements the existing ANY/ALL operators, which work with tokenized indexes, by providing strict string equality matching.

Key changes:

  • Added EXACT clause type to search DSL grammar and parser
  • Updated backend function to handle EXACT queries using EQUAL_QUERY type
  • Extensive test coverage for various EXACT matching scenarios
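
The second bullet can be pictured as a dispatch from clause type to backend query type. The Python sketch below is only an illustration: `EQUAL_QUERY` is the name given in this summary, while `TOKENIZED_ANY`, `TOKENIZED_ALL`, and the `query_type_for` helper are hypothetical placeholders, since the summary does not say which query types ANY and ALL use.

```python
from enum import Enum


class ClauseType(Enum):
    ANY = "ANY"
    ALL = "ALL"
    EXACT = "EXACT"


class QueryType(Enum):
    TOKENIZED_ANY = "TOKENIZED_ANY"  # hypothetical placeholder name
    TOKENIZED_ALL = "TOKENIZED_ALL"  # hypothetical placeholder name
    EQUAL_QUERY = "EQUAL_QUERY"      # name taken from the PR summary


# Assumed mapping: only the EXACT -> EQUAL_QUERY entry is stated in the summary.
_DISPATCH = {
    ClauseType.ANY: QueryType.TOKENIZED_ANY,
    ClauseType.ALL: QueryType.TOKENIZED_ALL,
    ClauseType.EXACT: QueryType.EQUAL_QUERY,
}


def query_type_for(clause: ClauseType) -> QueryType:
    """Return the backend query type for a parsed clause type."""
    return _DISPATCH[clause]


print(query_type_for(ClauseType.EXACT).value)  # EQUAL_QUERY
```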

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `SearchLexer.g4` | Added EXACT_LPAREN token for lexical analysis |
| `SearchParser.g4` | Added exactValue rule to grammar |
| `SearchDslParser.java` | Added EXACT clause type and parsing logic |
| `function_search.cpp` | Added EXACT handling in backend query processing |
| `test_search_exact_*.groovy` | Comprehensive regression tests for EXACT functionality |
| `test_search_exact_*.out` | Expected test outputs for EXACT functionality |
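
As a rough picture of what the grammar change enables, the following Python sketch parses a single clause with a regular expression. The `field:EXACT(value)` surface syntax is an assumption inferred from the `EXACT_LPAREN` token name, and `Clause` and `parse_clause` are hypothetical names; the actual parser is the ANTLR grammar in SearchLexer.g4 and SearchParser.g4, which this sketch does not reproduce.

```python
import re
from typing import NamedTuple


class Clause(NamedTuple):
    field: str
    clause_type: str
    value: str


# Assumed syntax: <field>:<ANY|ALL|EXACT>(<value>)
_CLAUSE_RE = re.compile(r"^\s*(\w+)\s*:\s*(ANY|ALL|EXACT)\((.*)\)\s*$")


def parse_clause(dsl: str) -> Clause:
    """Parse one clause; the value is kept verbatim, with no tokenization."""
    m = _CLAUSE_RE.match(dsl)
    if m is None:
        raise ValueError(f"not a recognized clause: {dsl!r}")
    field, clause_type, value = m.groups()
    return Clause(field, clause_type, value)


print(parse_clause("title:EXACT(Apache Doris)"))
# Clause(field='title', clause_type='EXACT', value='Apache Doris')
```

Keeping the value verbatim matters for an exact match, because inner whitespace and letter case are part of the string being compared.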

@airborne12
Member Author

run buildall

@doris-robot

TPC-DS: Total hot run time: 190001 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7f17a0aa362717b2c145427f3e2be29e7cf62e3b, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1066	435	410	410
query2	6553	1709	1657	1657
query3	6755	219	230	219
query4	26315	23344	23938	23344
query5	5183	644	511	511
query6	356	266	215	215
query7	4653	509	297	297
query8	290	254	247	247
query9	8757	2583	2581	2581
query10	537	346	284	284
query11	15403	15018	15689	15018
query12	199	125	122	122
query13	1673	606	447	447
query14	12282	9574	9434	9434
query15	244	199	198	198
query16	7804	682	553	553
query17	2126	817	649	649
query18	2302	433	344	344
query19	231	206	178	178
query20	133	136	134	134
query21	216	135	152	135
query22	4817	5118	4783	4783
query23	34815	34136	33374	33374
query24	8434	2407	2418	2407
query25	599	548	458	458
query26	1239	276	161	161
query27	2735	492	358	358
query28	4343	2147	2180	2147
query29	810	640	508	508
query30	297	233	200	200
query31	965	807	737	737
query32	78	74	68	68
query33	590	380	322	322
query34	792	836	532	532
query35	780	851	736	736
query36	971	999	924	924
query37	121	110	85	85
query38	3494	3494	3465	3465
query39	1476	1449	1393	1393
query40	214	125	115	115
query41	63	57	55	55
query42	120	105	119	105
query43	530	500	478	478
query44	1341	841	823	823
query45	184	179	173	173
query46	835	1000	626	626
query47	1768	1786	1736	1736
query48	382	420	316	316
query49	770	495	430	430
query50	651	696	407	407
query51	3970	3928	3978	3928
query52	118	114	104	104
query53	235	260	195	195
query54	601	589	522	522
query55	93	82	89	82
query56	331	313	320	313
query57	1188	1206	1114	1114
query58	291	284	287	284
query59	2561	2705	2581	2581
query60	346	345	334	334
query61	168	153	152	152
query62	800	725	653	653
query63	223	194	195	194
query64	4422	1211	849	849
query65	4054	3957	3975	3957
query66	1058	435	332	332
query67	15610	15271	15303	15271
query68	9622	933	597	597
query69	507	319	297	297
query70	1417	1296	1379	1296
query71	534	341	331	331
query72	5488	4854	4752	4752
query73	679	564	354	354
query74	8904	8889	8625	8625
query75	4565	3366	2893	2893
query76	3813	1176	740	740
query77	1018	402	305	305
query78	9543	9826	8908	8908
query79	4960	832	582	582
query80	729	575	506	506
query81	498	276	221	221
query82	396	163	139	139
query83	300	263	262	262
query84	304	119	91	91
query85	855	467	431	431
query86	344	319	276	276
query87	3773	3853	3726	3726
query88	2911	2289	2229	2229
query89	445	331	305	305
query90	2098	230	229	229
query91	169	172	136	136
query92	87	69	67	67
query93	3489	985	640	640
query94	705	441	339	339
query95	394	326	308	308
query96	491	595	290	290
query97	2877	2964	2867	2867
query98	236	220	205	205
query99	1421	1420	1290	1290
Total cold run time: 286933 ms
Total hot run time: 190001 ms

@doris-robot

ClickBench: Total hot run time: 30.5 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7f17a0aa362717b2c145427f3e2be29e7cf62e3b, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.05	0.05
query2	0.09	0.05	0.06
query3	0.25	0.08	0.08
query4	1.62	0.12	0.12
query5	0.28	0.26	0.25
query6	1.19	0.67	0.64
query7	0.04	0.03	0.03
query8	0.06	0.04	0.04
query9	0.63	0.52	0.54
query10	0.58	0.58	0.58
query11	0.20	0.12	0.12
query12	0.16	0.12	0.11
query13	0.63	0.61	0.62
query14	1.02	1.04	1.01
query15	0.87	0.88	0.86
query16	0.41	0.40	0.42
query17	1.03	1.07	1.07
query18	0.23	0.20	0.20
query19	1.90	1.85	1.88
query20	0.02	0.01	0.01
query21	15.43	0.91	0.57
query22	0.77	1.21	0.73
query23	14.91	1.37	0.65
query24	6.95	0.75	1.28
query25	0.49	0.14	0.20
query26	0.63	0.16	0.12
query27	0.07	0.06	0.06
query28	10.01	1.36	0.94
query29	12.55	3.92	3.26
query30	0.28	0.13	0.11
query31	2.83	0.63	0.39
query32	3.24	0.58	0.49
query33	3.06	3.08	3.07
query34	16.13	5.49	4.82
query35	4.97	4.92	5.00
query36	0.69	0.53	0.50
query37	0.11	0.08	0.08
query38	0.07	0.05	0.04
query39	0.04	0.03	0.03
query40	0.18	0.16	0.14
query41	0.10	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 104.85 s
Total hot run time: 30.5 s

@doris-robot

BE UT Coverage Report

Increment line coverage 33.33% (3/9) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.49% (17709/33739)
Line Coverage 37.66% (160766/426876)
Region Coverage 32.16% (122836/381948)
Branch Coverage 33.55% (53857/160533)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 20.00% (1/5) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (9/9) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.13% (23513/33057)
Line Coverage 57.56% (245483/426445)
Region Coverage 52.69% (203786/386776)
Branch Coverage 54.47% (87888/161344)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 80.00% (4/5) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (9/9) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.13% (23513/33057)
Line Coverage 57.57% (245486/426445)
Region Coverage 52.69% (203774/386776)
Branch Coverage 54.47% (87889/161344)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 80.00% (4/5) 🎉
Increment coverage report
Complete coverage report

Contributor

@HappenLee HappenLee left a comment

LGTM

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Oct 6, 2025
@github-actions
Contributor

github-actions bot commented Oct 6, 2025

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Oct 6, 2025

PR approved by anyone and no changes requested.

Contributor

@zhiqiang-hhhh zhiqiang-hhhh left a comment

lgtm

@airborne12 airborne12 merged commit 300dc60 into apache:master Oct 6, 2025
27 of 29 checks passed
@airborne12 airborne12 deleted the feature-exact-search branch October 6, 2025 07:33
github-actions bot pushed a commit that referenced this pull request Oct 6, 2025
yiguolei pushed a commit that referenced this pull request Oct 6, 2025
#56711)

Cherry-picked from #56710

Co-authored-by: Jack <jiangkai@selectdb.com>
airborne12 added a commit to airborne12/apache-doris that referenced this pull request Jan 7, 2026
Labels

  • approved (Indicates a PR has been approved by one committer.)
  • dev/4.0.0-merged
  • reviewed

6 participants