[feature](inverted index) query_v2 add regexp, wildcard, phrase query #57007
ClickBench: Total hot run time: 27.7 s
Pull Request Overview
This PR adds new query types to the inverted index query_v2 framework, specifically implementing regexp, wildcard, and phrase queries. The changes also include a rename from BitmapQuery to BitSetQuery for better clarity.
Key changes:
- Implementation of three new query types (regexp, wildcard, phrase) with their corresponding weight and scorer classes
- Refactoring to move common reader lookup logic to the base Weight class
- Renaming of bitmap-related classes to bit_set for more accurate terminology
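
Phrase queries are typically matched by comparing term positions within a document. The following is a minimal sketch of that idea, not the code from this PR: it assumes each term's positions inside one document are already available as a sorted vector, and the function name `phrase_matches` and the data layout are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the PR.
// positions[i] holds the sorted positions of the i-th phrase term inside a
// single document. The phrase matches if some start position p exists such
// that term i occurs at position p + i for every i.
bool phrase_matches(const std::vector<std::vector<std::uint32_t>>& positions) {
    if (positions.empty()) {
        return false;
    }
    for (std::uint32_t start : positions[0]) {
        bool all_found = true;
        for (std::size_t i = 1; i < positions.size(); ++i) {
            const std::uint32_t wanted = start + static_cast<std::uint32_t>(i);
            // Each list is sorted, so a binary search is sufficient.
            if (!std::binary_search(positions[i].begin(), positions[i].end(), wanted)) {
                all_found = false;
                break;
            }
        }
        if (all_found) {
            return true;
        }
    }
    return false;
}
```

A production scorer would more likely advance all position lists in lockstep instead of restarting a search for every candidate start, but the acceptance condition is the same.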
Reviewed Changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| wildcard_query_test.cpp | Comprehensive test suite for wildcard query functionality |
| regexp_query_test.cpp | Test cases for regular expression query operations |
| phrase_query_test.cpp | Tests for phrase query matching with position-based term matching |
| intersection_test.cpp | Tests for intersection operations on doc sets |
| boolean_query_test.cpp | Updated references from BitmapQuery to BitSetQuery |
| query_helper_test.cpp | Added mock method for multi-term similarity scoring |
| function_search.cpp | Updated comments and references to use BitSetQuery |
| vsearch.cpp | Updated comments referencing BitSetQuery |
| similarity.h | Added for_terms method for multi-term similarity calculation |
| bm25_similarity.h/cpp | Implementation of BM25 scoring for multiple terms |
| wildcard_weight.h/query.h | Wildcard query implementation converting wildcards to regex |
| regexp_weight.h/cpp/query.h | Regexp query with hyperscan pattern matching |
| phrase_weight.h/scorer.h/cpp/query.h | Phrase query with position-based matching |
| postings_with_offset.h | Helper class for position-aware postings |
| intersection.h/cpp | Generic intersection implementation for doc sets |
| doc_set.h | Added MockDocSet for testing and freq/norm methods |
| weight.h | Moved common reader lookup logic to base class |
| term_weight.h/term_scorer.h | Refactored to use base class reader lookup |
| segment_postings.h | Added position extraction methods and made freq/norm non-virtual |
| const_score_scorer.h | Const score wrapper for scorers |
| bit_set_query/* | Renamed from bitmap_query for clarity |
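
The table describes the wildcard query as converting wildcards to regex. A minimal sketch of such a conversion is shown here, assuming `*` means "any sequence of characters" and `?` means "exactly one character". The function names, the set of escaped metacharacters, and the use of `std::regex` for the demonstration are all assumptions; the PR itself is described as using hyperscan for pattern matching, and its actual conversion rules are not shown on this page.

```cpp
#include <regex>
#include <string>
#include <string_view>

// Hypothetical helper: translate a wildcard pattern into a regex string.
std::string wildcard_to_regex(std::string_view pattern) {
    std::string out;
    out.reserve(pattern.size() * 2);
    for (char c : pattern) {
        switch (c) {
        case '*':
            out += ".*"; // any sequence, including the empty one
            break;
        case '?':
            out += '.'; // exactly one character
            break;
        // Regex metacharacters must be escaped so they match literally.
        case '.': case '+': case '(': case ')': case '[': case ']':
        case '{': case '}': case '^': case '$': case '|': case '\\':
            out += '\\';
            out += c;
            break;
        default:
            out += c;
        }
    }
    return out;
}

// Hypothetical helper: whole-string match of `text` against a wildcard pattern.
bool wildcard_match(std::string_view pattern, const std::string& text) {
    const std::regex re(wildcard_to_regex(pattern));
    return std::regex_match(text, re);
}
```

Escaping literal metacharacters before substituting the wildcards is the step that is easiest to forget: without it, a pattern such as `v1.0*` would also match `v1x0`.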

    if (_docs.empty()) {
        _current_doc = TERMINATED;
    } else {
        std::ranges::sort(_docs.begin(), _docs.end());

Copilot AI, Oct 27, 2025
std::ranges::sort accepts either a range or an iterator-sentinel pair, so passing .begin() and .end() compiles but is needlessly verbose. Prefer std::ranges::sort(_docs); std::sort(_docs.begin(), _docs.end()) is the equivalent pre-ranges spelling.
Suggested change:

    - std::ranges::sort(_docs.begin(), _docs.end());
    + std::ranges::sort(_docs);


    static_assert(
            requires(TermIterator it) {
                it->freq();
                it->nextPosition();
            }, "TermIterator must expose freq() and nextPosition()");

Copilot AI, Oct 27, 2025
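A requires-expression is a constant expression of type bool in C++20, so it may legally appear as the condition of a static_assert, and inside a template it evaluates to false when TermIterator lacks the members. Consider a named C++20 concept that constrains the template parameter instead: it states the same requirement, is reusable, and reports the failure at the point of use.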
TPC-DS: Total hot run time: 190511 ms
ClickBench: Total hot run time: 27.48 s
ClickBench: Total hot run time: 29.52 s
airborne12 left a comment:
LGTM
csun5285 left a comment:
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…fact some code (apache#57372) Related PR: apache#57007 Problem Summary: This PR enhances the search functionality by adding support for phrase queries, wildcard queries, and regex queries, while refactoring code to improve maintainability and ensure proper NULL semantics handling across all query types.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
This PR adds new query types to the inverted index query_v2 framework, specifically implementing regexp, wildcard, and phrase queries. The changes also include a rename from BitmapQuery to BitSetQuery for better clarity.
Key changes:
- Implementation of three new query types (regexp, wildcard, phrase) with their corresponding weight and scorer classes
- Refactoring to move common reader lookup logic to the base Weight class
- Renaming of bitmap-related classes to bit_set for more accurate terminology
Release note
None