Documentation for semantic_text auto pre-filtering by dimitris-athanasiou · Pull Request #139749 · elastic/elasticsearch

dimitris-athanasiou · 2025-12-18T14:03:48Z

Adds documentation for automatic pre-filtering that was introduced in #138989.

elasticsearchmachine · 2025-12-18T14:04:14Z

Pinging @elastic/search-relevance (Team:Search - Relevance)

elasticsearchmachine · 2025-12-18T14:04:14Z

Pinging @elastic/core-docs (Team:Docs)

dimitris-athanasiou · 2025-12-18T14:04:54Z

docs/reference/elasticsearch/mapping-reference/semantic-text-reference.md

+% TEST[skip:Requires {{infer}} endpoint]
+
+
+The `term` query will be applied as a pre-filter, meaning that when the *knn* search executes on


I'm making the assumption that people reading this understand that a match query against a dense semantic_text field is doing knn search under the hood. I wonder if we need to add more explanation here or not.

dimitris-athanasiou · 2025-12-18T14:05:24Z

docs/reference/elasticsearch/mapping-reference/semantic-text-reference.md

 * `semantic_text` fields do not support [Cross-Cluster Search (CCS)](docs-content://explore-analyze/cross-cluster-search.md) in [ES|QL](/reference/query-languages/esql.md).
 * `semantic_text` fields do not support [Cross-Cluster Replication (CCR)](docs-content://deploy-manage/tools/cross-cluster-replication.md).
+* automatic pre-filtering in Query DSL does not apply on [Nested queries](/reference/query-languages/query-dsl/query-dsl-nested-query.md). Such queries will be applied as post-filters.
+* automatic pre-filtering in ES|QL does not apply on filters that are not translatable to Lucene. Such filters will be applied as post-filters.


"not translatable to Lucene": this is tricky. @carlosdelest, any suggestions on how to phrase this better?

github-actions · 2025-12-18T14:13:11Z

🔍 Preview links for changed docs

github-actions · 2025-12-18T14:13:12Z

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

Check out the cumulative docs guidelines
Reach out in the #docs Slack channel

carlosdelest

Looks good, thanks for documenting this!

I've added some suggestions; feel free to reject them.

carlosdelest · 2025-12-18T14:15:33Z

docs/reference/elasticsearch/mapping-reference/semantic-text-reference.md

+
+Querying `semantic_text` fields that have dense vector embeddings automatically applies
+filters found in the Query DSL tree or [ES|QL](/reference/query-languages/esql.md) query
+as pre-filters in order to ensure the requested number of results is returned.


Let's add a link to prefiltering (https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-knn-query#knn-query-filtering)

carlosdelest · 2025-12-18T14:23:02Z

docs/reference/elasticsearch/mapping-reference/semantic-text-reference.md

+% TEST[skip:Requires {{infer}} endpoint]
+
+
+The `term` query will be applied as a pre-filter, meaning that when the *knn* search executes on


I think so. Maybe we don't need to get technical: we can just say that, where needed, the filter is applied as a pre-filter so that the expected number of results is still returned.

carlosdelest · 2025-12-18T14:26:12Z

docs/reference/elasticsearch/mapping-reference/semantic-text-reference.md

+The `term` query will be applied as a pre-filter, meaning that when the *knn* search executes on
+`dense_semantic_text_field`, only documents that matched the `term` query will be searched.
+
+If the `term` query was applied as a post-filter, which is the default behavior for such filters,
+the *knn* search would execute against all documents, and then the `term` query would filter out
+documents that did not match. This could mean that fewer than 10 documents are returned if there
+are more relevant documents that are not green.


Maybe something similar to:

Suggested change

-The `term` query will be applied as a pre-filter, meaning that when the *knn* search executes on
-`dense_semantic_text_field`, only documents that matched the `term` query will be searched.
-
-If the `term` query was applied as a post-filter, which is the default behavior for such filters,
-the *knn* search would execute against all documents, and then the `term` query would filter out
-documents that did not match. This could mean that fewer than 10 documents are returned if there
-are more relevant documents that are not green.
+In case the semantic_text uses dense_vector field embeddings, then the corresponding *knn* search executed on it will apply the term query as a pre-filter.
+This allows to retrieve as many results as specified by the query.
+If the `term` query was applied as a post-filter, which is the default behavior for such filters, the *knn* search would execute against all documents, and then the `term` query would filter out documents that did not match.
+This could mean that fewer than 10 documents are returned if there are more relevant documents that are not green.
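
For readers following this thread, the difference between the two behaviors can be sketched with a toy brute-force nearest-neighbor search. This is only an illustration under stated assumptions: the documents, the `color` attribute, and the two-dimensional embeddings are all made up, and real Elasticsearch kNN search is approximate rather than brute force.

```python
import math

# Toy corpus: (doc_id, color, embedding). All values are invented for illustration.
DOCS = [
    ("d1", "red",   (1.0, 0.0)),
    ("d2", "red",   (0.9, 0.1)),
    ("d3", "green", (0.8, 0.2)),
    ("d4", "red",   (0.7, 0.3)),
    ("d5", "green", (0.1, 0.9)),
    ("d6", "green", (0.0, 1.0)),
]


def cosine(a, b):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def knn(query, docs, k):
    """Brute-force top-k by cosine similarity (a stand-in for real kNN search)."""
    return sorted(docs, key=lambda d: cosine(query, d[2]), reverse=True)[:k]


def post_filter(query, docs, k, color):
    # kNN runs over every document first; non-matching hits are dropped afterwards.
    return [d[0] for d in knn(query, docs, k) if d[1] == color]


def pre_filter(query, docs, k, color):
    # The candidate set is restricted first; kNN then runs on what is left.
    candidates = [d for d in docs if d[1] == color]
    return [d[0] for d in knn(query, candidates, k)]


QUERY = (1.0, 0.0)
print(post_filter(QUERY, DOCS, 3, "green"))  # ['d3']
print(pre_filter(QUERY, DOCS, 3, "green"))   # ['d3', 'd5', 'd6']
```

With `k = 3`, the post-filter variant returns a single hit because two of the three nearest documents are not green, while the pre-filter variant returns the requested three.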

carlosdelest · 2025-12-18T14:27:08Z

docs/reference/elasticsearch/mapping-reference/semantic-text-reference.md

+
+::::{note}
+The queries in Query DSL that are used as pre-filters to `semantic_text` queries are all `must`,
+ `filter`, and `must_not` queries that are within parent `bool` queries.


Suggested change

-`filter`, and `must_not` queries that are within parent `bool` queries.
+`filter`, and `must_not` queries that are included in the parent `bool` queries.
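
As a rough illustration of the note being discussed, the sketch below builds a search body as a Python dict and lists the sibling clauses that, per the note, would be candidates for pre-filtering. The `color` field, the query text, and the `prefilter_candidates` helper are hypothetical; the helper is a client-side illustration only and not how Elasticsearch implements the feature.

```python
import json

# Search body with a semantic match query and a sibling term filter.
# `dense_semantic_text_field` comes from the docs under review; `color`
# and the query text are assumptions made for this example.
search_body = {
    "size": 10,
    "query": {
        "bool": {
            "must": [
                {"match": {"dense_semantic_text_field": {"query": "tropical birds"}}}
            ],
            "filter": [
                {"term": {"color": "green"}}
            ],
        }
    },
}


def prefilter_candidates(bool_query, semantic_field):
    """Collect must/filter/must_not clauses that do not target the semantic field."""
    found = []
    for occur in ("must", "filter", "must_not"):
        for clause in bool_query.get(occur, []):
            # Field names referenced by the clause, e.g. {"term": {"color": ...}} -> {"color"}.
            targets = {
                field
                for body in clause.values()
                if isinstance(body, dict)
                for field in body
            }
            if semantic_field not in targets:
                found.append((occur, clause))
    return found


print(json.dumps(search_body, indent=2))
print(prefilter_candidates(search_body["query"]["bool"], "dense_semantic_text_field"))
```

Here the helper reports only the `term` clause under `filter`; `should` clauses are deliberately ignored because the note does not list them.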

carlosdelest · 2025-12-18T14:43:57Z

docs/reference/elasticsearch/mapping-reference/semantic-text-reference.md

 * `semantic_text` fields do not support [Cross-Cluster Search (CCS)](docs-content://explore-analyze/cross-cluster-search.md) in [ES|QL](/reference/query-languages/esql.md).
 * `semantic_text` fields do not support [Cross-Cluster Replication (CCR)](docs-content://deploy-manage/tools/cross-cluster-replication.md).
+* automatic pre-filtering in Query DSL does not apply on [Nested queries](/reference/query-languages/query-dsl/query-dsl-nested-query.md). Such queries will be applied as post-filters.
+* automatic pre-filtering in ES|QL does not apply on filters that are not translatable to Lucene. Such filters will be applied as post-filters.


Tricky indeed. I think we could add something like:

Suggested change

-* automatic pre-filtering in ES|QL does not apply on filters that are not translatable to Lucene. Such filters will be applied as post-filters.
+* automatic pre-filtering in ES|QL does not apply on filters that use functions (like `WHERE TO_LOWER(my_field) == 'a'`). These filters will be applied as post-filters.

However, this is something we need to do. I've opened #139754 to track this.

dimitris-athanasiou · 2025-12-19T08:19:02Z

@carlosdelest I have updated the PR taking into consideration your suggestions.

carlosdelest

LGTM from my side! 💯

It would be great to have some docs review on this one 👍

Changes:

- **Title**: "Pre-filtering for dense vector queries"
- **Rewrote opening**: One clear sentence about automatic pre-filtering with link
- **Unified example intro**: Sets up both Query DSL and ES|QL examples
- **Created subsections**: "Query DSL example" and "ES|QL example"
- **Integrated note into prose**: `must`, `filter`, `must_not` explanation now in main text
- **Moved kNN caveat**: Now an "important" block after Query DSL example
- **Added MATCH footnote**: Explains automatic kNN behavior

Adds documentation for semantic_text auto pre-filtering

5b28949

Adds documentation for automatic pre-filtering that was introduced in elastic#138989.

dimitris-athanasiou requested review from carlosdelest and leemthompo December 18, 2025 14:03

dimitris-athanasiou added >docs General docs changes :SearchOrg/Relevance Label for the Search (solution/org) Relevance team v9.3.0 labels Dec 18, 2025

elasticsearchmachine added v9.4.0 Team:Docs Meta label for docs team Team:Search - Relevance The Search organization Search Relevance team labels Dec 18, 2025

dimitris-athanasiou commented Dec 18, 2025

dimitris-athanasiou changed the title ~~Adds documentation for semantic_text auto pre-filtering~~ Documentation for semantic_text auto pre-filtering Dec 18, 2025

Merge branch 'main' into docs-auto-prefiltering

1d67dae

carlosdelest approved these changes Dec 18, 2025

Address review feedback

121b94c

carlosdelest approved these changes Dec 19, 2025

leemthompo added 3 commits December 19, 2025 11:54

add annotation to query dsl example too

5c36078

fix annotations syntax

230d4b9

leemthompo approved these changes Dec 19, 2025

dimitris-athanasiou merged commit 5d9ed4d into elastic:main Dec 19, 2025
12 checks passed

dimitris-athanasiou deleted the docs-auto-prefiltering branch December 19, 2025 12:09

dimitris-athanasiou removed the v9.3.0 label Dec 19, 2025

dimitris-athanasiou mentioned this pull request Dec 22, 2025

[ES|QL] Semantic (dense) MATCH queries with filters that aren't translatable to Lucene apply post knn query #139453

Open
