Semantic search with query builder rewrite by ioanatia · Pull Request #118676 · elastic/elasticsearch

ioanatia · 2024-12-13T14:57:38Z

tracked in #115103

actual implementation of the prototype from #118106

This adds a query builder rewrite phase on the coordinator to ES|QL.
Since MatchQueryBuilder now supports semantic_text (#117839), the new query builder rewrite phase makes it easy to add support for querying semantic_text in ES|QL.
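Here is a minimal sketch of the idea behind a coordinator-side rewrite phase. It assumes the plan is reduced to a flat list of queries and the context is reduced to a lookup table. `Query`, `TermQuery`, `SemanticQuery`, and `CoordinatorRewrite.rewritePlan` are hypothetical names, not ES|QL or Elasticsearch classes. Queries that need coordinator-side information (here, a made-up "inference" lookup) are rewritten into concrete ones before the plan moves on. The phase is skipped entirely when nothing needs rewriting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a query builder.
interface Query {
    boolean needsRewrite();
    Query rewrite(Map<String, String> context);
}

// A query that is already in its primitive form: rewriting returns itself.
record TermQuery(String field, String value) implements Query {
    public boolean needsRewrite() { return false; }
    public Query rewrite(Map<String, String> context) { return this; }
}

// Hypothetical query that can only be made concrete with coordinator-side data.
record SemanticQuery(String field, String text) implements Query {
    public boolean needsRewrite() { return true; }
    public Query rewrite(Map<String, String> context) {
        // Replace the high-level query with a concrete one built from the context.
        return new TermQuery(field + ".inference", context.getOrDefault(text, ""));
    }
}

final class CoordinatorRewrite {
    // Rewrites every query in the plan, then hands the result to the callback.
    static void rewritePlan(List<Query> plan, Map<String, String> context, Consumer<List<Query>> callback) {
        if (plan.stream().noneMatch(Query::needsRewrite)) {
            callback.accept(plan); // nothing to rewrite: pass the plan through unchanged
            return;
        }
        List<Query> rewritten = new ArrayList<>(plan.size());
        for (Query q : plan) {
            rewritten.add(q.rewrite(context));
        }
        callback.accept(rewritten);
    }
}
```

The early return keeps the phase cheap for plans that contain no query needing a rewrite.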

This still needs more tests. For now, I added CSV tests (for positive cases) and integration tests that show how we capture and return errors that can occur during the query builder rewrite.
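One of this PR's commits is "Call onFailure instead of throwing exception". Below is a minimal sketch of that pattern. It assumes a hypothetical two-method `Listener` interface, loosely modeled on an async action listener. `Listener` and `SafeRewrite` are illustrative names, not Elasticsearch APIs.

In the sketch, any exception raised by the rewrite step is delivered through `onFailure`. The caller therefore always gets exactly one callback, and no exception escapes across an async boundary.

```java
import java.util.function.Function;

// Hypothetical callback interface for an asynchronous step.
interface Listener<T> {
    void onResponse(T result);
    void onFailure(Exception e);
}

final class SafeRewrite {
    // Applies `rewrite` to `input` and reports the outcome through the listener.
    // Exceptions from the rewrite are captured and passed to onFailure, not thrown.
    static <T> void rewrite(T input, Function<T, T> rewrite, Listener<T> listener) {
        T result;
        try {
            result = rewrite.apply(input);
        } catch (Exception e) {
            listener.onFailure(e);
            return;
        }
        // Called outside the try block so that a failure inside onResponse
        // is not misreported as a rewrite failure.
        listener.onResponse(result);
    }
}
```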

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/QueryBuilderResolver.java

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java


elasticsearchmachine · 2024-12-16T16:00:59Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

ChrisHegarty

LGTM

carlosdelest · 2024-12-16T17:06:40Z

...esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/SemanticMatchTestCase.java

+        assertEquals(404, re.getResponse().getStatusLine().getStatusCode());
+    }
+
+    @Before


Would @BeforeClass / @AfterClass be better for the setup / teardown methods, since no data is modified?

I am having a bit of trouble using @AfterClass/@BeforeClass, even if I switch to adminClient():

SemanticMatchIT > classMethod FAILED
    java.lang.NullPointerException: Cannot invoke "org.elasticsearch.client.RestClient.performRequest(org.elasticsearch.client.Request)" because the return value of "org.elasticsearch.xpack.esql.qa.rest.SemanticMatchTestCase.adminClient()" is null
        at __randomizedtesting.SeedInfo.seed([9885A2F1F0E88028]:0)
        at org.elasticsearch.xpack.esql.qa.rest.SemanticMatchTestCase.wipeData(SemanticMatchTestCase.java:105)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:909)
        at org.elasticsearch.test.cluster.local.DefaultLocalElasticsearchCluster$1.evaluate(DefaultLocalElasticsearchCluster.java:48)

I see that most integration tests use @Before and @After, so I will revert to that.
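Below is a simplified plain-Java model of how a class-level hook can see a null client. It assumes, as the NullPointerException suggests, that the shared client is created only by a per-test setup step. It also relies on the fact that a class-level hook such as @BeforeClass runs before any @Before method.

`LifecycleModel`, `initClient`, and `runOneTest` are hypothetical. Real JUnit and randomizedtesting runners are considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a test class whose client is created per test.
final class LifecycleModel {
    private String adminClient; // stays null until initClient() runs
    private final List<String> log = new ArrayList<>();

    // Stand-in for a per-test setup method (like @Before) that creates the client.
    private void initClient() {
        if (adminClient == null) {
            adminClient = "admin-client";
        }
    }

    // Records whether the client is usable at the moment a hook runs.
    private void observe(String hook) {
        log.add(hook + ": " + (adminClient == null ? "null" : "ready"));
    }

    // Simulated runner order for one test: the class-level hook runs before
    // any per-test setup, so it observes the client before it exists.
    List<String> runOneTest() {
        observe("beforeClass");
        initClient();
        observe("before");
        return List.copyOf(log);
    }
}
```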

x-pack/plugin/esql/qa/testFixtures/src/main/resources/match-function.csv-spec

carlosdelest · 2024-12-16T17:13:36Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/scoring.csv-spec

+;
+
+_id:keyword | _score:double
+2           | 1.2879333961116942E19


I wonder if the score should be checked via YAML tests:

That would allow checking that it's just greater than zero

We could also check that it's the same score as returned by an equivalent search

No need to do anything on this PR, just thinking out loud 🙂

Yes, I am not sure either. For now, I followed the pattern we already have for testing with _score.
Happy to change this, but not as part of this PR.
I think it's a bit more complicated than that: if we consistently used the same sharding strategy in the scoring tests, this shouldn't be a problem anymore.
Sharding affects not only the score values but, as we have seen in some of the failing tests, also the order of the documents: some documents score higher than others depending on the sharding strategy. So to fix these scoring tests, it is not sufficient to just check that the scores are greater than 0.
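The example below shows why sharding can change document order and not just score values. It uses a simplified lexical model: it assumes term statistics are computed per shard and that a score is term frequency times IDF.

The IDF formula, `ln(1 + (N - n + 0.5) / (n + 0.5))`, is the one Lucene's BM25 similarity uses. Everything else is a hypothetical simplification, including dropping saturation and length normalization, and the `ShardScoring` helper. The semantic scores in these tests may come from a different model entirely.

```java
// Simplified lexical scoring: term frequency times BM25-style IDF.
final class ShardScoring {
    // IDF as in Lucene's BM25 similarity: docCount = N, docFreq = n.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Hypothetical simplified score (no term-frequency saturation, no length norm).
    static double score(int termFreq, long docCount, long docFreq) {
        return termFreq * idf(docCount, docFreq);
    }
}
```

Consider two documents:
* Doc X (term frequency 3) sits on a 10-document shard where 8 documents contain the term.
* Doc Y (term frequency 1) sits on a 10-document shard where only Y contains the term.

With per-shard statistics, X scores about 0.77 and Y about 1.99, so Y ranks first. With combined statistics (20 documents, 9 containing the term), X scores about 2.38 and Y about 0.79, so the order flips. A check that only asserts scores are greater than 0 passes in both cases.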

> Happy to change this, but not as part of this PR.

++

> to fix these scoring tests, it is not sufficient to just check that the scores are greater than 0.

What worked for me when testing kNN rescoring was to get the results from an actual search and compare them directly with those returned by another one, checking both the scores and the results themselves. We can settle on a strategy in a follow-up.
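Here is a sketch of that kind of comparison. It assumes both result sets are available as ordered lists of (id, score) pairs. The ids must match in order. Scores are compared with a relative tolerance rather than exactly, because equivalent runs can disagree in trailing digits.

`Hit`, `ResultComparison.sameResults`, and the choice of tolerance are hypothetical, not an existing test utility.

```java
import java.util.List;

// One ranked hit: document id and its score.
record Hit(String id, double score) {}

final class ResultComparison {
    // True if both lists have the same ids in the same order and every pair
    // of scores agrees within the given relative tolerance.
    static boolean sameResults(List<Hit> expected, List<Hit> actual, double relTol) {
        if (expected.size() != actual.size()) {
            return false;
        }
        for (int i = 0; i < expected.size(); i++) {
            Hit e = expected.get(i);
            Hit a = actual.get(i);
            if (!e.id().equals(a.id())) {
                return false; // different ranking
            }
            double scale = Math.max(Math.abs(e.score()), Math.abs(a.score()));
            if (Math.abs(e.score() - a.score()) > relTol * scale) {
                return false; // scores diverge beyond the tolerance
            }
        }
        return true;
    }
}
```

Comparing ids in order catches the reordering that a "score greater than 0" check misses. The relative tolerance keeps the check meaningful for scores of very different magnitudes.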

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/QueryBuilderResolver.java

carlosdelest · 2024-12-16T17:29:14Z

...ql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java

        // Check for every possible query data type
        for (DataType fieldDataType : fieldDataTypes) {
+            // TODO: semantic_text is not present in mapping-all-types.json so we skip it for now
+            if (fieldDataType == DataType.SEMANTIC_TEXT) {


Why not add it there?

Because I have not fully worked out how mapping-all-types.json is used. I will follow up on this separately.

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/QueryBuilderResolver.java

fang-xing-esql · 2024-12-17T04:26:55Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/QueryBuilderResolver.java

+            callback.accept(plan, listener);
+            return;
+        }
+        QueryRewriteContext ctx = queryRewriteContext(indexNames);


I wonder if QueryRewriteContext is just an intermediate data structure to carry the ResolvedIndices needed to get the inference results? Could we provide ResolvedIndices directly instead, without having to create a QueryRewriteContext?

I am not sure I am following. We need a QueryRewriteContext because, among other things, Rewriteable.rewriteAndFetch calls the rewrite method from QueryBuilder, which takes a QueryRewriteContext:

elasticsearch/server/src/main/java/org/elasticsearch/index/query/QueryBuilder.java

Lines 62 to 69 in 8134c79

    /**
     * Rewrites this query builder into its primitive form. By default this method return the builder itself. If the builder
     * did not change the identity reference must be returned otherwise the builder will be rewritten infinitely.
     */
    @Override
    default QueryBuilder rewrite(QueryRewriteContext queryRewriteContext) throws IOException {
        return this;
    }

Many implementations of the QueryBuilder interface then override this with their own rewrite logic.
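The Javadoc's warning about returning the identity reference matters because rewriting is repeated until it reaches a fixed point. Below is a minimal synchronous sketch of such a loop. It assumes a hypothetical `ToyRewritable` interface with a string-map context. `Chain`, `Restless`, and `RewriteLoop` are illustrative names, and the round cap is an arbitrary safety net for this sketch.

The real `Rewriteable.rewriteAndFetch` does more than this; as noted above, it calls `rewrite` "among other things".

```java
import java.util.Map;

// Hypothetical rewritable node with a string-map context.
interface ToyRewritable<T> {
    // Must return `this` when nothing changed; otherwise the loop never stops.
    T rewrite(Map<String, String> context);
}

// Rewrites one step at a time and returns itself once it is primitive.
record Chain(int remainingSteps) implements ToyRewritable<Chain> {
    public Chain rewrite(Map<String, String> context) {
        return remainingSteps == 0 ? this : new Chain(remainingSteps - 1);
    }
}

// Violates the contract: always returns a new (equal) object, never `this`.
record Restless() implements ToyRewritable<Restless> {
    public Restless rewrite(Map<String, String> context) {
        return new Restless();
    }
}

final class RewriteLoop {
    static final int MAX_ROUNDS = 16; // arbitrary safety cap for this sketch

    // Rewrites until a round returns the identical reference (a fixed point).
    static <T extends ToyRewritable<T>> T rewriteUntilStable(T original, Map<String, String> context) {
        T current = original;
        for (int round = 0; round < MAX_ROUNDS; round++) {
            T next = current.rewrite(context);
            if (next == current) {
                return current; // identity reference: nothing left to rewrite
            }
            current = next;
        }
        throw new IllegalStateException("too many rewrite rounds");
    }
}
```

`Restless` shows the failure mode the Javadoc warns about. Each result `equals` the previous one, but the loop checks reference identity, so it never sees a fixed point and stops only at the cap.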

carlosdelest

LGTM 💯


ioanatia · 2024-12-18T09:07:05Z

...gin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/plugin/KqlFunctionIT.java

    }
+
+    @Override
+    protected Collection<Class<? extends Plugin>> nodePlugins() {


The KqlFunctionIT tests would fail with:

java.lang.AssertionError: Unknown NamedWriteable [org.elasticsearch.index.query.QueryBuilder][kql]
    at __randomizedtesting.SeedInfo.seed([7E86655D8AA148E0]:0)
    at org.elasticsearch.common.io.stream.NamedWriteableRegistry.throwOnUnknownWritable(NamedWriteableRegistry.java:150)
    at org.elasticsearch.common.io.stream.NamedWriteableRegistry.getReader(NamedWriteableRegistry.java:126)
    at org.elasticsearch.common.io.stream.NamedWriteableRegistry.getReader(NamedWriteableRegistry.java:109)

This is because we need to explicitly require KqlPlugin so that the NamedWriteable for KqlQueryBuilder is registered.
Note that this was only a quirk of this particular test (other ES|QL integration tests also override this method); the single- and multi-node tests for KQL continue to work without any changes.
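The failure comes from deserialization looking up a reader by category and name. Below is a simplified model of that lookup. It assumes a registry built only from the entries contributed by the plugins installed on a node. `RegistryEntry` and `ToyRegistry` are hypothetical, and the error message merely imitates the one in the trace above.

If the plugin that contributes the `kql` entry is absent, there is no reader to find.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry entry: a category (e.g. a base type), a name, and a reader.
record RegistryEntry(Class<?> category, String name, Function<String, Object> reader) {}

final class ToyRegistry {
    private final Map<Class<?>, Map<String, Function<String, Object>>> readers = new HashMap<>();

    // Built from the entries contributed by whatever plugins are installed.
    ToyRegistry(List<RegistryEntry> entries) {
        for (RegistryEntry e : entries) {
            readers.computeIfAbsent(e.category(), c -> new HashMap<>()).put(e.name(), e.reader());
        }
    }

    // Looks up the reader for (category, name) and fails loudly if none was registered.
    Object read(Class<?> category, String name, String payload) {
        Function<String, Object> reader = readers.getOrDefault(category, Map.of()).get(name);
        if (reader == null) {
            throw new IllegalArgumentException("Unknown NamedWriteable [" + category.getName() + "][" + name + "]");
        }
        return reader.apply(payload);
    }
}
```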


ioanatia · 2024-12-18T10:21:33Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/scoring.csv-spec

+| keep _id
+;
+
+_id:keyword


I dropped the score column for now because I am getting different values for single- vs multi-node tests 🤷‍♀️.
It's a separate problem that we'll need to look into.

Example:

Actual (values generated with a single node):

_id:keyword | _score:double
2           | 5.603396578413904E18
3           | 2.156063961865257E18
1           | 8.411017936759685E17

Expected (values generated with multiple nodes):

_id:keyword | _score:double
2           | 5.603396E18
3           | 2.156063E18
1           | 8.411017E17

* Semantic search with query builder rewrite
* Address review feedback
* Add feature behind snapshot
* Use after/before instead of afterClass/beforeClass
* Call onFailure instead of throwing exception
* Fix KqlFunctionIT by requiring KqlPlugin
* Update scoring tests now that they are enabled
* Drop the score column for now

Semantic search with query builder rewrite

c434c06

ioanatia added >non-issue :Analytics/ES|QL AKA ESQL v9.0.0 labels Dec 13, 2024

ioanatia mentioned this pull request Dec 13, 2024

semantic search in ES|QL #116253

Closed

2 tasks

ChrisHegarty reviewed Dec 13, 2024


ioanatia added 2 commits December 16, 2024 15:01

Address review feedback

1e993ae

Merge remote-tracking branch 'elasticsearch/main' into esql_query_rewrite

e98d470

ioanatia marked this pull request as ready for review December 16, 2024 16:00

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Dec 16, 2024

ioanatia requested review from ChrisHegarty, afoucret, carlosdelest, fang-xing-esql and tteofili December 16, 2024 16:34

ChrisHegarty approved these changes Dec 16, 2024


carlosdelest reviewed Dec 16, 2024


fang-xing-esql reviewed Dec 17, 2024


carlosdelest approved these changes Dec 17, 2024


ioanatia added 6 commits December 17, 2024 10:43

Add feature behind snapshot

3fed73d

Merge remote-tracking branch 'elasticsearch/main' into esql_query_rewrite

de0d0c0

Use after/before instead of afterClass/beforeClass

a6ced43

Call onFailure instead of throwing exception

fe4256c

Merge remote-tracking branch 'elasticsearch/main' into esql_query_rewrite

1f9123b

Fix KqlFunctionIT by requiring KqlPlugin

05b9063

ioanatia commented Dec 18, 2024


ioanatia added 3 commits December 18, 2024 10:08

Merge remote-tracking branch 'elasticsearch/main' into esql_query_rewrite

070f2d6

Update scoring tests now that they are enabled

c5c0188

Drop the score column for now

1c2c2b5

ioanatia commented Dec 18, 2024


ChrisHegarty self-requested a review December 18, 2024 12:09

ChrisHegarty approved these changes Dec 18, 2024


ioanatia merged commit fb6d7db into elastic:main Dec 18, 2024

tteofili approved these changes Dec 18, 2024


ioanatia deleted the esql_query_rewrite branch December 18, 2024 12:11

This was referenced Dec 18, 2024

Prototype: Semantic search in ES|QL with query rewrite #118106

Closed

Support semantic_text in ES|QL #115103

Closed

This was referenced Dec 18, 2024

8.x - backport Semantic search with query builder rewrite (#118676) #118945

Merged

Add semantic_text to mapping_all_types and switch to TranslationAware in PushFiltersToSource #118982

Merged

ioanatia mentioned this pull request Jan 7, 2025

Enable query rewrites on the coordinator for ESQL #119667

Merged

jimczi mentioned this pull request Jan 8, 2025

Refactor QueryBuilderResolver Rewrite Logic #119740

Merged

jimczi added the v8.18.0 label Jan 8, 2025

ioanatia mentioned this pull request Aug 19, 2025

Refactor: Add RewriteableAware interface for functions that require r… #133075

Merged



7 participants
