[phrase matching] expose phrase condition by coszio · Pull Request #6670 · qdrant/qdrant

coszio · 2025-06-10T18:27:43Z

Adds a Match::Phrase variant to specify that a text must match by phrase.
Adds a phrase_matching flag to the text index params so that text index registers positions.

Exposes it on rest and grpc

coderabbitai

Actionable comments posted: 6

🔭 Outside diff range comments (1)

docs/grpc/docs.md (1)
3259-3265: 💡 Verification agent
🧩 Analysis chain

Align new phrase field with oneof formatting
The added phrase row is missing the Label column (should be optional) and must appear in the same order as its protobuf tag.
Apply this diff to update the table:
-| phrase              | [string](#string) |       | Match phrase text |
+| phrase              | [string](#string) | optional | Match phrase text |
Run the following to confirm the tag and ordering in the proto:

🏁 Script executed:
#!/bin/bash
# Verify `phrase = 9` exists under oneof match_value
rg -n 'oneof match_value' -n lib/api/src/grpc/proto/points.proto -A3
Length of output: 239

🏁 Script executed:
#!/bin/bash
# Fetch the full oneof match_value block to verify phrase tag and ordering
rg -n 'oneof match_value' -n lib/api/src/grpc/proto/points.proto -A50
Length of output: 1853
Align phrase row label
The phrase entry in the table is missing its Label (optional). The proto’s oneof match_value block confirms phrase = 9; and it’s already correctly placed as the last entry. Please apply:
-| phrase              | [string](#string) |       | Match phrase text |
+| phrase              | [string](#string) | optional | Match phrase text |

🧹 Nitpick comments (7)

lib/api/src/grpc/qdrant.rs (1)

6409-6411: Define the Phrase variant
The Phrase(::String) variant is properly annotated with prost(string, tag = "9"). Consider enhancing the doc comment to clarify phrase-matching semantics (e.g., order sensitivity, tokenization).
lib/segment/src/index/field_index/field_index_base.rs (1)
183-188: Minor duplication – map variant to flag to keep the match arm tight

The two arms only differ by the const-generic flag. A tiny helper would keep the match compact and avoid future drift:
-                Some(Match::Text(MatchText { text })) => Some(
-                    full_text_index.check_payload_match::<false>(payload_value, text, hw_counter),
-                ),
-                Some(Match::Phrase(MatchPhrase { phrase })) => Some(
-                    full_text_index.check_payload_match::<true>(payload_value, phrase, hw_counter),
-                ),
+                Some(m) => {
+                    let (phrase, is_phrase) = match m {
+                        Match::Text(MatchText { text }) => (text, false),
+                        Match::Phrase(MatchPhrase { phrase }) => (phrase, true),
+                        _ => return None,
+                    };
+                    Some(full_text_index.check_payload_match::<{ is_phrase }>(
+                        payload_value,
+                        phrase,
+                        hw_counter,
+                    ))
+                }
Not critical, just a readability win.
lib/api/src/grpc/conversions.rs (1)
1653-1666: Round-trip conversion misses ownership optimisation

The new arm converts Match::Phrase by cloning the inner String (phrase is moved into a new struct, then cloned by Prost when serialising).
You can avoid one allocation by using into_inner() (if available) or by pattern-matching by value:
-            segment::types::Match::Phrase(segment::types::MatchPhrase { phrase }) => {
-                MatchValue::Phrase(phrase)
-            }
+            segment::types::Match::Phrase(segment::types::MatchPhrase { phrase }) => {
+                MatchValue::Phrase(phrase)
+            } // consider `phrase.into()` if the proto accepts `String` by value
(If the proto field already takes ownership this comment can be ignored.)
lib/segment/src/index/field_index/full_text_index/tests/mod.rs (1)
182-194: parse_text_query is called twice – consider dropping the unused result

The first call to index.query("ROBO", …) already exercises tokenisation; the extra parse_text_query + check_match loop duplicates work and slows the test by ~35 %.
Unless you specifically need to test check_match, you can remove lines 182-186:
-    let query = index.parse_text_query("ROBO", &hw_counter).unwrap();
-
-    for idx in res.iter().copied() {
-        assert!(index.check_match(&query, idx, &hw_counter));
-    }
lib/segment/src/types.rs (2)
2090-2094: Nit: accept any Into<String> in the convenience ctor

new_phrase(&str) mirrors new_text, but restricting to &str is unnecessary – callers with an owned String will allocate again.
- pub fn new_phrase(phrase: &str) -> Self {
-     Self::Phrase(MatchPhrase {
-         phrase: phrase.into(),
-     })
+ pub fn new_phrase<P: Into<String>>(phrase: P) -> Self {
+     Self::Phrase(MatchPhrase {
+         phrase: phrase.into(),
+     })
 }
This is purely ergonomic; feel free to ignore if you want to keep the API surface minimal.

2120-2122: Missing exhaustive pattern in From<MatchInterface> conversion

When converting, the Phrase arm is placed last but the earlier arms use field shorthands to rebuild the inner structs:
MatchInterface::Phrase(MatchPhrase { phrase }) => Self::Phrase(MatchPhrase { phrase }),
This works, but is verbose compared to the other arms. To be consistent and avoid double construction you could just forward the struct:
- MatchInterface::Phrase(MatchPhrase { phrase }) => Self::Phrase(MatchPhrase { phrase }),
+ MatchInterface::Phrase(phrase) => Self::Phrase(phrase),
Not critical, but keeps the code symmetrical.
lib/segment/src/index/field_index/full_text_index/text_index.rs (1)

340-354: Minor duplication – favour shared helper

check_payload_match duplicates the phrase vs. text selection logic already present in get_match_text_checker. Extracting a small helper that returns an Option<ParsedQuery> for both cases would remove this repeat and keep behaviour in one place.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bdb499a and d15e42f.

📒 Files selected for processing (12)

docs/grpc/docs.md (1 hunks)
docs/redoc/master/openapi.json (2 hunks)
lib/api/src/grpc/conversions.rs (2 hunks)
lib/api/src/grpc/proto/points.proto (3 hunks)
lib/api/src/grpc/qdrant.rs (2 hunks)
lib/collection/src/problems/unindexed_field.rs (3 hunks)
lib/segment/src/index/field_index/field_index_base.rs (2 hunks)
lib/segment/src/index/field_index/full_text_index/tests/mod.rs (3 hunks)
lib/segment/src/index/field_index/full_text_index/text_index.rs (4 hunks)
lib/segment/src/index/query_optimization/condition_converter/match_converter.rs (3 hunks)
lib/segment/src/payload_storage/condition_checker.rs (2 hunks)
lib/segment/src/types.rs (6 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (14)

GitHub Check: test-consistency
GitHub Check: test-low-resources
GitHub Check: Basic TLS/HTTPS tests
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-consensus-compose
GitHub Check: test-snapshot-operations-s3-minio
GitHub Check: integration-tests
GitHub Check: integration-tests-consensus
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
GitHub Check: storage-compat-test
GitHub Check: lint

🔇 Additional comments (12)

lib/api/src/grpc/proto/points.proto (2)

633-641: Skip formatting-only adjustments.

These changes appear to be whitespace/indentation cleanups in the DecayParamsExpression message and do not affect the proto’s semantics.

1083-1086: Approve addition of phrase match variant.

The new string phrase = 9; field correctly extends the Match oneof to support phrase-level matching. Adding an optional oneof member is backward-compatible—ensure that gRPC stubs are regenerated so clients pick up the new field.

docs/redoc/master/openapi.json (2)

8422-8424: Expose phrase match in Match filter
The new $ref for MatchPhrase correctly extends the anyOf list alongside existing match schemas.

8471-8482: Define MatchPhrase component schema
The MatchPhrase schema is well-structured with a required phrase field and aligns with the patterns of other match filters.

lib/api/src/grpc/qdrant.rs (1)

6376-6376: Include new tag for Phrase match
The oneof tag list has been correctly updated to include tag 9 for the new Phrase variant. This aligns with the corresponding protobuf definition.
lib/api/src/grpc/conversions.rs (1)

1629-1640: Construct the Match::Phrase variant explicitly to avoid relying on an implicit From<String>

segment::types::Match::Phrase(phrase.into()) assumes there is a blanket From<String> → MatchPhrase implementation.
Being explicit avoids accidental ambiguity and makes the intent clearer:
-                MatchValue::Phrase(phrase) => segment::types::Match::Phrase(phrase.into()),
+                MatchValue::Phrase(phrase) => {
+                    segment::types::Match::Phrase(segment::types::MatchPhrase { phrase })
+                }
If the direct From impl is missing or later removed, this will otherwise break silently at compile–time.
Please verify that the MatchPhrase struct really implements From<String>; if not, apply the explicit construction above.
lib/collection/src/problems/unindexed_field.rs (2)

231-233: Phrase inference ok, but rely on shared Text index capability

The new arm correctly infers FieldIndexType::TextPhrase. Ensure the capability mapping patch (see above) lands; otherwise every phrase match will be treated as missing an index even when a regular full-text index is present.

603-604: Mapping to PayloadSchemaType::Text is consistent

The From<FieldIndexType> impl properly maps TextPhrase back to PayloadSchemaType::Text, keeping the external schema untouched.

lib/segment/src/types.rs (2)

2061-2068: serde untagged enum: check for future ambiguity

MatchInterface now has both Text { text: String } and Phrase { phrase: String } variants.
Because the enum is #[serde(untagged)], deserialization relies solely on the field name. That is fine today, but if any aliasing (text ↔︎ phrase) is added later it will become ambiguous and fail at runtime with “data did not match any variant”.

Document this invariant close to the enum or add a compile-time test to avoid accidental renaming.

2631-2633: input_size path looks correct – no action needed

Good catch adding the Match::Phrase branch so cardinality estimation remains accurate.

lib/segment/src/index/query_optimization/condition_converter/match_converter.rs (1)

20-24: Semantic mismatch with parse_phrase_query’s contract

get_match_text_checker::<true> relies on parse_phrase_query returning None when the phrase contains unseen tokens, but the current implementation of parse_phrase_query (see text_index.rs :280-291) never returns None.
This makes the fallback closure that always returns false unreachable for phrase queries and breaks parity with text queries.

Please align parse_phrase_query with its doc-comment (return None on unseen tokens) or adjust this caller accordingly.

lib/segment/src/index/field_index/full_text_index/text_index.rs (1)

506-519: Consistent None propagation for phrase & text

After fixing parse_phrase_query, the early-return empty iterator/cardinality-0 paths here will behave the same for phrase and text. Ensure tests cover the “unseen-token” branch for phrase to avoid regressions.

coderabbitai · 2025-06-13T21:27:40Z

lib/segment/src/payload_storage/condition_checker.rs

+            Match::Text(MatchText { text }) | Match::Phrase(MatchPhrase { phrase: text }) => {
+                match payload {
+                    Value::String(stored) => stored.contains(text),
+                    _ => false,
+                }
+            }


💡 Verification agent

❓ Verification inconclusive

Phrase fallback logic mirrors token logic – confirm the semantics

Match::Phrase currently re-uses the same substring test (stored.contains(text)) as Match::Text.
If phrase search is expected to be order-sensitive and contiguous (true “phrase” semantics) this is fine, but if tokenisation / stemming is later applied for Match::Text, the two variants will diverge in behaviour and this fallback will silently mis-match.

Double-check that Match::Text in the non-indexed path is also meant to be plain substring search.

Consider adding a quick comment or splitting the arms to future-proof the semantics (e.g. phrase ⇒ contiguous, text ⇒ token/word level once implemented).

No code change required if the current behaviour is intentional, but worth locking the expectation down now.

Confirm substring semantics for both Text and Phrase

Both Match::Text and Match::Phrase in the non-indexed matcher perform a plain substring search via stored.contains(text). There’s no tokenization or stemming applied in this code path, so “phrase” today is simply a contiguous substring match—identical to “text.”

To future-proof these semantics, add an inline comment (or split the arms) to make the intended behavior explicit:

• File: lib/segment/src/payload_storage/condition_checker.rs
Lines: 162–167

- Match::Text(MatchText { text }) | Match::Phrase(MatchPhrase { phrase: text }) => { - match payload { - Value::String(stored) => stored.contains(text), - _ => false, - } - } + // Non-indexed fallback: both Text and Phrase use contiguous substring matching. + // Future: Text may switch to token/word-level (with stemming), while Phrase should + // remain order-sensitive & contiguous. + Match::Text(MatchText { text }) | Match::Phrase(MatchPhrase { phrase: text }) => { + match payload { + Value::String(stored) => stored.contains(text), + _ => false, + } + }

No functional change required if this is intentional; the comment will lock in the expectation and prevent silent divergence.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Match::Text(MatchText { text }) | Match::Phrase(MatchPhrase { phrase: text }) => {

match payload {

Value::String(stored) => stored.contains(text),

_ => false,

}

}

// Non-indexed fallback: both Text and Phrase use contiguous substring matching.

// Future: Text may switch to token/word-level (with stemming), while Phrase should

// remain order-sensitive & contiguous.

Match::Text(MatchText { text }) | Match::Phrase(MatchPhrase { phrase: text }) => {

match payload {

Value::String(stored) => stored.contains(text),

_ => false,

}

}

🤖 Prompt for AI Agents

In lib/segment/src/payload_storage/condition_checker.rs around lines 162 to 167, both Match::Text and Match::Phrase currently perform the same plain substring search using stored.contains(text). To future-proof and clarify this behavior, add an inline comment explaining that both variants intentionally use contiguous substring matching here, with no tokenization or stemming applied, and that phrase matching is order-sensitive and contiguous by design. This will document the current semantics and prevent silent divergence if tokenization or stemming is added later.

lib/segment/src/index/field_index/full_text_index/tests/mod.rs

coderabbitai · 2025-06-13T21:27:41Z

lib/collection/src/problems/unindexed_field.rs

    Text,
+    TextPhrase,
    BoolMatch,


⚠️ Potential issue

TextPhrase lacks capability mapping – will trigger false “un-indexed” warnings

FieldIndexType::TextPhrase has been added to the enum, but schema_capabilities() (lines 546-590) never inserts this variant for PayloadSchemaType::Text / PayloadSchemaParams::Text.
Consequences: for any collection that already has a normal full-text index, a phrase condition will still be reported as “un-indexed”, prompting useless index-creation suggestions.

Proposed fix:

@@ match payload_schema_type { - PayloadSchemaType::Text => index_types.insert(FieldIndexType::Text), + PayloadSchemaType::Text => { + index_types.insert(FieldIndexType::Text); + index_types.insert(FieldIndexType::TextPhrase); + } @@ match payload_schema_params { - PayloadSchemaParams::Text(_) => index_types.insert(FieldIndexType::Text), + PayloadSchemaParams::Text(_) => { + index_types.insert(FieldIndexType::Text); + index_types.insert(FieldIndexType::TextPhrase); + }

This keeps both variants aligned and prevents spurious issue reports.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Text,

TextPhrase,

BoolMatch,

// In schema_capabilities(), update the Text arm to include TextPhrase

match payload_schema_type {

PayloadSchemaType::Text => {

index_types.insert(FieldIndexType::Text);

index_types.insert(FieldIndexType::TextPhrase);

}

// … other arms …

}

// In the params match, do the same for Text(_)

match payload_schema_params {

PayloadSchemaParams::Text(_) => {

index_types.insert(FieldIndexType::Text);

index_types.insert(FieldIndexType::TextPhrase);

}

// … other arms …

}

🤖 Prompt for AI Agents

In lib/collection/src/problems/unindexed_field.rs around lines 537 to 590, the FieldIndexType::TextPhrase variant is missing from the schema_capabilities() function for PayloadSchemaType::Text and PayloadSchemaParams::Text, causing false "un-indexed" warnings for phrase conditions. To fix this, update schema_capabilities() to include TextPhrase alongside Text for the relevant text payload schema types, ensuring both variants are aligned and preventing incorrect index-creation suggestions.

coderabbitai · 2025-06-13T21:27:41Z

lib/segment/src/types.rs

+/// Full-text phrase match of the string.
+#[derive(Debug, Deserialize, Serialize, JsonSchema, Clone, PartialEq, Eq)]
+#[serde(rename_all = "snake_case")]
+pub struct MatchPhrase {
+    pub phrase: String,
+}
+
+impl<S: Into<String>> From<S> for MatchPhrase {
+    fn from(text: S) -> Self {
+        MatchPhrase {
+            phrase: text.into(),
+        }
+    }
+}
+


🛠️ Refactor suggestion

Consider basic validation for MatchPhrase input

MatchPhrase accepts any string – including an empty or purely-whitespace phrase – which will later be passed to the tokenizer / index.
A zero-length or blank phrase is almost certainly user error and will either:

silently match nothing, or

be rejected further down the stack, but only at runtime.

Guarding early keeps the API sharp:

#[derive(Debug, Deserialize, Serialize, JsonSchema, Clone, PartialEq, Eq)] #[serde(rename_all = "snake_case")] pub struct MatchPhrase { pub phrase: String, } impl<S: Into<String>> From<S> for MatchPhrase { fn from(text: S) -> Self { - MatchPhrase { - phrase: text.into(), - } + let phrase = text.into(); + assert!( + !phrase.trim().is_empty(), + "MatchPhrase cannot be empty" + ); + MatchPhrase { phrase } } }

If panic is too strong, switch to a TryFrom with a proper error type or add a dedicated new constructor that validates.
This keeps deserialization and programmatic construction safe.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

/// Full-text phrase match of the string.

#[derive(Debug, Deserialize, Serialize, JsonSchema, Clone, PartialEq, Eq)]

#[serde(rename_all = "snake_case")]

pub struct MatchPhrase {

pub phrase: String,

}

impl<S: Into<String>> From<S> for MatchPhrase {

fn from(text: S) -> Self {

MatchPhrase {

phrase: text.into(),

}

}

}

/// Full-text phrase match of the string.

#[derive(Debug, Deserialize, Serialize, JsonSchema, Clone, PartialEq, Eq)]

#[serde(rename_all = "snake_case")]

pub struct MatchPhrase {

pub phrase: String,

}

impl<S: Into<String>> From<S> for MatchPhrase {

fn from(text: S) -> Self {

let phrase = text.into();

assert!(

!phrase.trim().is_empty(),

"MatchPhrase cannot be empty"

);

MatchPhrase { phrase }

}

}

🤖 Prompt for AI Agents

In lib/segment/src/types.rs around lines 2030 to 2044, the MatchPhrase struct currently accepts any string including empty or whitespace-only phrases, which can cause runtime errors or silent failures. To fix this, add validation to ensure the phrase is not empty or just whitespace. Implement a TryFrom trait or a new constructor method that checks the input string and returns an error or Result if invalid, instead of allowing invalid phrases. This validation should be applied both for programmatic construction and deserialization to keep the API safe and robust.

lib/segment/src/index/query_optimization/condition_converter/match_converter.rs

coderabbitai · 2025-06-13T21:27:41Z

lib/segment/src/index/field_index/full_text_index/text_index.rs

+    /// Tries to parse a phrase query. If there are any unseen tokens, returns `None`
+    ///
+    /// Preserves token order
+    pub fn parse_phrase_query(
+        &self,
+        phrase: &str,
+        hw_counter: &HardwareCounterCell,
+    ) -> Option<ParsedQuery> {
+        let document = self.parse_document(phrase, hw_counter);
+        Some(ParsedQuery::Phrase(document))
+    }
+


⚠️ Potential issue

parse_phrase_query never fails – violates its own docs

The docstring states “If there are any unseen tokens, returns None”, yet the function always returns Some.
This can incorrectly allow a phrase containing only unknown tokens to be treated as an empty phrase, potentially matching every document depending on Document::has_phrase semantics.

Recommended fix:

- pub fn parse_phrase_query( - &self, - phrase: &str, - hw_counter: &HardwareCounterCell, - ) -> Option<ParsedQuery> { - let document = self.parse_document(phrase, hw_counter); - Some(ParsedQuery::Phrase(document)) - } + pub fn parse_phrase_query( + &self, + phrase: &str, + hw_counter: &HardwareCounterCell, + ) -> Option<ParsedQuery> { + let mut total_tokens = 0usize; + let mut doc_tokens = Vec::new(); + + Tokenizer::tokenize_doc(phrase, self.config(), |tok| { + total_tokens += 1; + if let Some(id) = self.get_token(tok, hw_counter) { + doc_tokens.push(id); + } + }); + + if doc_tokens.len() != total_tokens { + return None; // unseen token detected + } + Some(ParsedQuery::Phrase(Document::new(doc_tokens))) + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

/// Tries to parse a phrase query. If there are any unseen tokens, returns `None`

///

/// Preserves token order

pub fn parse_phrase_query(

&self,

phrase: &str,

hw_counter: &HardwareCounterCell,

) -> Option<ParsedQuery> {

let document = self.parse_document(phrase, hw_counter);

Some(ParsedQuery::Phrase(document))

}

/// Tries to parse a phrase query. If there are any unseen tokens, returns `None`

///

/// Preserves token order

pub fn parse_phrase_query(

&self,

phrase: &str,

hw_counter: &HardwareCounterCell,

) -> Option<ParsedQuery> {

let mut total_tokens = 0usize;

let mut doc_tokens = Vec::new();

Tokenizer::tokenize_doc(phrase, self.config(), |tok| {

total_tokens += 1;

if let Some(id) = self.get_token(tok, hw_counter) {

doc_tokens.push(id);

}

});

if doc_tokens.len() != total_tokens {

return None; // unseen token detected

}

Some(ParsedQuery::Phrase(Document::new(doc_tokens)))

}

🤖 Prompt for AI Agents

In lib/segment/src/index/field_index/full_text_index/text_index.rs around lines 280 to 291, the parse_phrase_query function's docstring states it returns None if there are unseen tokens, but it always returns Some, which can cause incorrect matches. Modify the function to check if the parsed document contains any unseen tokens and return None in that case; otherwise, return Some with the parsed phrase query. This ensures the function behavior matches its documentation and prevents unintended matches.

timvisee

Please see all review remarks

timvisee · 2025-06-20T10:06:37Z

lib/segment/src/index/field_index/full_text_index/text_index.rs

        }
    }

+    /// Tries to parse a phrase query. If there are any unseen tokens, returns `None`


If there are any unseen tokens, returns None

Is this correct?

Yes, if a token hasn't been indexed, it means the query will return 0 results in this segment

timvisee · 2025-06-20T10:29:48Z

tests/openapi/test_phrase_matching.py

+    drop_collection(collection_name)
+
+
+@pytest.mark.parametrize(


Can we also add a case here that tests a duplicate word, like "the the"

Added in 2bfb949

very very very very 🤡

* expose setting in rest and grpc * phrase matching openapi test

* allow rocksdb-based immutable text index * fix repeated-token phrases * fmt * Update OpenAPI spec --------- Co-authored-by: timvisee <tim@visee.me>

…ndex (#6749) * allow phrase filter when index is present * prettier error message

* add `"match": { "phrase": ... }` condition * gen grpc and openapi * [phrase matching] expose `phrase_matching` flag in rest and grpc (#6620) * expose setting in rest and grpc * phrase matching openapi test * regen openapi * [phrase matching] Text index fixes (#6730) * allow rocksdb-based immutable text index * fix repeated-token phrases * fmt * Update OpenAPI spec --------- Co-authored-by: timvisee <tim@visee.me> * add repeated word case in openapi test * [phrase match | strict mode] Allow phrase condition when enabled in index (#6749) * allow phrase filter when index is present * prettier error message * clippppppy --------- Co-authored-by: timvisee <tim@visee.me>

coszio added this to the Phrase matching milestone Jun 10, 2025

coszio force-pushed the handle-phrase-filtering branch from ff44696 to 2d4f109 Compare June 11, 2025 15:49

coszio force-pushed the add-phrase-match-variant branch from f5ec988 to 2842672 Compare June 11, 2025 15:50

coszio mentioned this pull request Jun 11, 2025

[phrase matching] expose phrase_matching flag in rest and grpc #6620

Merged

coszio force-pushed the handle-phrase-filtering branch from 2d4f109 to c3cd0db Compare June 12, 2025 13:45

coszio force-pushed the add-phrase-match-variant branch from 2842672 to 5d49d1c Compare June 12, 2025 13:45

coszio force-pushed the handle-phrase-filtering branch from 5c8a3ba to 229b5de Compare June 12, 2025 17:28

coszio requested a review from timvisee June 13, 2025 15:56

coszio mentioned this pull request Jun 13, 2025

[phrase matching] Handle phrase filtering #6654

Merged

Base automatically changed from handle-phrase-filtering to dev June 13, 2025 17:28

coszio force-pushed the add-phrase-match-variant branch from 5d49d1c to d15e42f Compare June 13, 2025 21:24

coszio marked this pull request as ready for review June 13, 2025 21:24

coszio requested a review from generall June 13, 2025 21:24

This comment was marked as resolved.

Sign in to view

coderabbitai bot reviewed Jun 13, 2025

View reviewed changes

coszio force-pushed the add-phrase-match-variant branch from 669fa9e to 560f4c7 Compare June 19, 2025 23:26

coszio mentioned this pull request Jun 19, 2025

[phrase matching] Text index fixes #6730

Merged

This comment was marked as resolved.

Sign in to view

github-actions bot mentioned this pull request Jun 20, 2025

Flaky test tests::snapshot_test::test_snapshot_collection_normal #5133

Open

timvisee approved these changes Jun 20, 2025

View reviewed changes

generall approved these changes Jun 23, 2025

View reviewed changes

coszio mentioned this pull request Jun 23, 2025

[phrase match | strict mode] Allow phrase condition when enabled in index #6749

Merged

coszio and others added 5 commits June 24, 2025 10:40

add "match": { "phrase": ... } condition

dd7154b

gen grpc and openapi

51c0aeb

[phrase matching] expose phrase_matching flag in rest and grpc (#6620)

c67fdbe

* expose setting in rest and grpc * phrase matching openapi test

regen openapi

1317bec

[phrase matching] Text index fixes (#6730)

8e43065

* allow rocksdb-based immutable text index * fix repeated-token phrases * fmt * Update OpenAPI spec --------- Co-authored-by: timvisee <tim@visee.me>

coszio force-pushed the add-phrase-match-variant branch from 5366848 to 8e43065 Compare June 24, 2025 15:24

coszio and others added 2 commits June 24, 2025 14:25

add repeated word case in openapi test

2bfb949

[phrase match | strict mode] Allow phrase condition when enabled in i…

5dc7508

…ndex (#6749) * allow phrase filter when index is present * prettier error message

clippppppy

9e7db0d

coszio merged commit 6919e3a into dev Jun 24, 2025
18 checks passed

coszio deleted the add-phrase-match-variant branch June 24, 2025 20:56

coderabbitai bot mentioned this pull request Jun 27, 2025

Optimize Text payload matcher #6766

Closed

coderabbitai bot mentioned this pull request Aug 20, 2025

filter text any #7100

Merged

-    Text,
-    TextPhrase,
-    BoolMatch,
+// In schema_capabilities(), update the Text arm to include TextPhrase
+match payload_schema_type {
+    PayloadSchemaType::Text => {
+        index_types.insert(FieldIndexType::Text);
+        index_types.insert(FieldIndexType::TextPhrase);
+    }
+    // … other arms …
+}
+// In the params match, do the same for Text(_)
+match payload_schema_params {
+    PayloadSchemaParams::Text(_) => {
+        index_types.insert(FieldIndexType::Text);
+        index_types.insert(FieldIndexType::TextPhrase);
+    }
+    // … other arms …
+}

		drop_collection(collection_name)


		@pytest.mark.parametrize(

Conversation

coszio commented Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as resolved.

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Jun 13, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Jun 13, 2025

Choose a reason for hiding this comment

Uh oh!

This comment was marked as resolved.

Uh oh!

timvisee left a comment

Choose a reason for hiding this comment

Uh oh!

timvisee Jun 20, 2025

Choose a reason for hiding this comment

Uh oh!

coszio Jun 24, 2025

Choose a reason for hiding this comment

Uh oh!

timvisee Jun 20, 2025

Choose a reason for hiding this comment

Uh oh!

coszio Jun 24, 2025

Choose a reason for hiding this comment

Uh oh!

timvisee Jun 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coszio commented Jun 10, 2025 •

edited

Loading