Prototype: Semantic search in ES|QL with query rewrite #118106
Closed
ioanatia wants to merge 1 commit into elastic:main
Member
This LGTM. Having support for a coordinator rewrite phase will simplify the work for inference-related tasks, and allow us to reuse the work already being done for semantic_text-related queries.
Member
Author
Can be closed since we merged #118676.
A prototype for an alternative to #116253
This allows using `match` on `semantic_text` fields, but without the need to reimplement the logic for getting the inference results in ES|QL.

This leverages the same model of query rewrite phase we already have in `_search` on the coordinator.
I added a `ChickenQueryBuilder` to show that semantic search actually works, but what we actually want is to continue to use `MatchQueryBuilder` when the `match` query starts supporting `semantic_text` (which is in progress).

It also shows the two types of query rewrites that happen under the hood for semantic text, but those would not be directly exposed in ES|QL.
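To make the coordinator rewrite idea concrete, here is a minimal, self-contained sketch of a rewrite loop. It is not the code in this PR and uses no Elasticsearch classes: `SketchQuery`, `SketchRewriteContext`, `TextQuerySketch`, `PendingQuerySketch` and `VectorQuerySketch` are all hypothetical names, and the "inference" call is a plain function passed in by the caller. The assumption is only that a query is rewritten repeatedly until it returns itself, with registered actions (such as fetching embeddings) executed between passes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// All names in this block are hypothetical stand-ins, not Elasticsearch classes.
final class CoordinatorRewriteSketch {
    // Rewrite until the query returns itself, running registered actions between passes.
    static SketchQuery rewriteFully(SketchQuery query, SketchRewriteContext ctx) {
        while (true) {
            SketchQuery next = query.rewrite(ctx);
            ctx.executePending(); // e.g. the inference call registered during this pass
            if (next == query) {
                return query;
            }
            query = next;
        }
    }

    public static void main(String[] args) {
        // Fake "inference": the embedding is derived from the text length.
        SketchRewriteContext ctx = new SketchRewriteContext(text -> new float[] { text.length(), 1f });
        SketchQuery result = rewriteFully(new TextQuerySketch("body", "hello"), ctx);
        System.out.println(result.getClass().getSimpleName());
    }
}

interface SketchQuery {
    // Default: nothing left to rewrite.
    default SketchQuery rewrite(SketchRewriteContext ctx) {
        return this;
    }
}

final class SketchRewriteContext {
    private final List<Runnable> pending = new ArrayList<>();
    final Function<String, float[]> inference; // stands in for a call to an inference service

    SketchRewriteContext(Function<String, float[]> inference) {
        this.inference = inference;
    }

    void registerAction(Runnable action) {
        pending.add(action);
    }

    void executePending() {
        List<Runnable> toRun = new ArrayList<>(pending);
        pending.clear();
        for (Runnable action : toRun) {
            action.run();
        }
    }
}

// First form: query text that still needs an embedding.
final class TextQuerySketch implements SketchQuery {
    final String field;
    final String text;

    TextQuerySketch(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public SketchQuery rewrite(SketchRewriteContext ctx) {
        AtomicReference<float[]> holder = new AtomicReference<>();
        ctx.registerAction(() -> holder.set(ctx.inference.apply(text)));
        return new PendingQuerySketch(field, holder);
    }
}

// Second form: waits for the registered action to fill the holder.
final class PendingQuerySketch implements SketchQuery {
    final String field;
    final AtomicReference<float[]> holder;

    PendingQuerySketch(String field, AtomicReference<float[]> holder) {
        this.field = field;
        this.holder = holder;
    }

    @Override
    public SketchQuery rewrite(SketchRewriteContext ctx) {
        float[] vector = holder.get();
        return vector == null ? this : new VectorQuerySketch(field, vector);
    }
}

// Final form: carries the embedding and needs no further rewrite.
final class VectorQuerySketch implements SketchQuery {
    final String field;
    final float[] vector;

    VectorQuerySketch(String field, float[] vector) {
        this.field = field;
        this.vector = vector;
    }
}
```

The point of the sketch is that the caller only ever sees the fully rewritten query; the intermediate forms stay internal to the rewrite loop, which is why they would not need to be exposed in ES|QL.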
The nice thing about this approach is that we can reuse it when we add a `knn` function and we want to do something similar to the `query_vector_builder`:

Here the argument of `knn` is not a vector, but a text query with a specified `model_id` to transform the query text into a vector. We could use the same approach of having a `QueryBuilder` that gets a rewrite phase on the coordinator to get the embeddings. Then at least for `knn`, we do not have a requirement to have the concept of async functions in ES|QL.

While this PR is cutting a lot of corners I want to get some high level feedback on a few questions:
- `EsqlSession` - is there a better place for it?
- `FullTextFunction` would store its own `QueryBuilder` that would get rewritten on the coordinator. I guess we want every `FullTextFunction` to control the function -> query builder translation, but storing the `QueryBuilder` in the `FullTextFunction` takes this idea a bit further than some might expect.

If we agree with this high level approach, the plan would be to split this into multiple phases:
- `FullTextFunction` will own their translation to Lucene queries and `FullTextFunction` instances will store their query builder.
- `QueryBuilder`s for `FullTextFunction`s and rewrite the `FullTextFunction` nodes with rewritten `QueryBuilder`s.
- Once `MatchQueryBuilder` supports semantic text, enable the `match` function to receive `semantic_text` fields to perform semantic search.
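The phases above could look roughly like the following sketch, in which function nodes own their query builder and a rewrite pass swaps in the rewritten builder. This is an illustration under stated assumptions, not the PR's implementation: `DemoQueryBuilder`, `DemoFullTextFunction` and `DemoPlanRewriter` are hypothetical names, the plan is reduced to a flat list of function nodes, and the coordinator rewrite is passed in as a plain function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// All names in this block are hypothetical stand-ins, not Elasticsearch classes.
final class DemoPlanRewriter {
    // Visit each function node and swap in the rewritten query builder.
    // Nodes are immutable, so unchanged nodes are reused and changed ones are copied.
    static List<DemoFullTextFunction> rewriteFunctions(
        List<DemoFullTextFunction> functions,
        UnaryOperator<DemoQueryBuilder> coordinatorRewrite
    ) {
        List<DemoFullTextFunction> result = new ArrayList<>();
        for (DemoFullTextFunction fn : functions) {
            DemoQueryBuilder rewritten = coordinatorRewrite.apply(fn.queryBuilder);
            result.add(rewritten == fn.queryBuilder ? fn : fn.withQueryBuilder(rewritten));
        }
        return result;
    }

    public static void main(String[] args) {
        List<DemoFullTextFunction> plan = Arrays.asList(
            new DemoFullTextFunction("match", new DemoQueryBuilder("semantic_text query", false)),
            new DemoFullTextFunction("match", new DemoQueryBuilder("plain text query", true))
        );
        // Assumed rewrite rule: builders already marked as rewritten are returned as-is.
        List<DemoFullTextFunction> rewritten = rewriteFunctions(
            plan,
            qb -> qb.rewritten ? qb : new DemoQueryBuilder(qb.description, true)
        );
        for (DemoFullTextFunction fn : rewritten) {
            System.out.println(fn.name + " -> " + fn.queryBuilder.description + " rewritten=" + fn.queryBuilder.rewritten);
        }
    }
}

// Stand-in for a query builder; "rewritten" is only a flag in this sketch.
final class DemoQueryBuilder {
    final String description;
    final boolean rewritten;

    DemoQueryBuilder(String description, boolean rewritten) {
        this.description = description;
        this.rewritten = rewritten;
    }
}

// Stand-in for a full-text function node that stores its own query builder.
final class DemoFullTextFunction {
    final String name;
    final DemoQueryBuilder queryBuilder;

    DemoFullTextFunction(String name, DemoQueryBuilder queryBuilder) {
        this.name = name;
        this.queryBuilder = queryBuilder;
    }

    DemoFullTextFunction withQueryBuilder(DemoQueryBuilder newQueryBuilder) {
        return new DemoFullTextFunction(name, newQueryBuilder);
    }
}
```

Keeping the nodes immutable and reusing unchanged ones means a pass that rewrites nothing returns nodes identical to the input, which makes it cheap to detect whether the rewrite changed the plan at all.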