Compute engine evaluation for full text functions by ioanatia · Pull Request #113938 · elastic/elasticsearch

ioanatia · 2024-10-02T12:25:52Z

Not something we are actively working on, but I thought it would be nice to show how we can use the Lucene expression evaluator that was added in #111157.

The LuceneQueryExpressionEvaluator had to be modified, so its tests will need to be rewritten.
Right now full text functions translate to query builders, which are then used to form the query that is pushed to Lucene.
The evaluator needs to transform the query builder into a query object, and to do so it needs access to the search execution context.
For that to happen we now pass the ShardContexts from EsPhysicalOperationProviders to the evaluator.
That led to further changes in EvaluatorMapper and ExpressionMapper.

Demo:

With this change we can lift some of the restrictions on full text functions that apply when a WHERE condition containing them cannot be pushed to Lucene.

Previously this query failed with an error; now it returns results:

from search-movies
| where qstr("harry") OR length(plot) < 100

length(plot) < 100 is executed at the compute engine level, and qstr previously had no compute engine implementation (it had to be pushed to Lucene as part of the first stage retrieval).
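To illustrate why the disjunction needs a per-row implementation, here is a minimal sketch in Java. Everything in it is a hypothetical stand-in (DisjunctionSketch, filter, and the qstrMatches set are assumptions for illustration, not Elasticsearch classes): the set plays the role of the docs that the Lucene query for qstr("harry") matched, and the list index plays the role of the doc id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in, not an Elasticsearch class.
final class DisjunctionSketch {

    // Keeps the doc ids for which qstr matched OR the plot is shorter than maxPlotLength.
    // plots.get(docId) is the loaded `plot` value; qstrMatches holds the doc ids that
    // the full text query matched (assumed to be available per row).
    static List<Integer> filter(List<String> plots, Set<Integer> qstrMatches, int maxPlotLength) {
        List<Integer> kept = new ArrayList<>();
        for (int docId = 0; docId < plots.size(); docId++) {
            boolean fullText = qstrMatches.contains(docId);                 // qstr("harry")
            boolean shortPlot = plots.get(docId).length() < maxPlotLength;  // length(plot) < 100
            if (fullText || shortPlot) {
                kept.add(docId);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> plots = List.of("x".repeat(150), "short plot", "y".repeat(150));
        // Doc 0 is kept by the full text branch, doc 1 by the length branch, doc 2 is dropped.
        System.out.println(filter(plots, Set.of(0), 100));
    }
}
```

If only the Lucene-matched docs were retrieved in the first stage, doc 1 would never reach the compute engine, which is why the full text branch has to be evaluated per row here.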

Future work

Just ideas - not final proposals.
And except for scoring, nothing here is blocking.

Scoring implementation

We need to figure out how this would work for scoring, which is why I am not in a hurry to push this forward and would like to get scoring in first.

Example:

from search-movies METADATA _score
| where qstr("harry potter")                       <- gives an initial score and populates _score
| where qstr("harry") OR length(plot) < 100        <- modifies _score by adding the score of qstr("harry")

When we evaluate qstr("harry") on the compute engine we also have to modify the value of _score.
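One possible reading of the example above, as a minimal sketch in Java. The names are hypothetical (ScoreSketch, updateScore, and the doc-id-to-score map are assumptions for illustration, not ES|QL or Lucene APIs), and the additive rule is only the behaviour described in the example, not a settled design.

```java
import java.util.Map;

// Hypothetical stand-in, not an Elasticsearch class.
final class ScoreSketch {

    // Returns the new _score for one row.
    // qstrScores maps doc id -> score of qstr("harry") for the docs it matched.
    // A row that only passed through the length(plot) branch keeps its current score.
    static double updateScore(double currentScore, int docId, Map<Integer, Double> qstrScores) {
        Double extra = qstrScores.get(docId); // null means qstr("harry") did not match this doc
        return extra == null ? currentScore : currentScore + extra;
    }

    public static void main(String[] args) {
        Map<Integer, Double> qstrScores = Map.of(7, 0.5);
        System.out.println(updateScore(1.5, 7, qstrScores)); // matched: score is increased
        System.out.println(updateScore(1.5, 8, qstrScores)); // not matched: score is unchanged
    }
}
```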

Use full text functions not just in `WHERE`

We could also look at lifting the restriction of using full text functions just in WHERE.
For example we can allow them in EVAL:

from search-movies METADATA _score
| eval result = qstr("harry")

In this case:

when using full text functions with EVAL we need to decide whether this should also modify the _score - I am leaning towards not modifying it 🤔
we need to figure out what the result type would be - my guess is boolean, because in WHERE full text functions return boolean values (even if they modify the _score). Boolean also makes sense because we can have EVAL result = qstr("harry") OR length(plot) < 100, in which case it is clear that we are dealing with a boolean value.
what if we just want the score? Could we have something like:

from search-movies
| eval result = score(qstr("harry"))

Allow full text functions to work on fields that are not mapped

This would require us to build some kind of Lucene index in memory.
Example:

from search-movies
| EVAL my_new_field = // something that is computed on every doc and depends on the doc fields
| WHERE match(my_new_field, "hello")

This goes together with the previous point of allowing full text functions not just in WHERE.
If they worked on non-mapped fields we could also use them after commands like STATS etc.

Allow the query string to be evaluated on every doc

An interesting side-effect of having a compute engine implementation is that we can evaluate the values that are being passed to full text functions on every doc.

So right now qstr and the match function require that the query argument is always a constant string, because we are pushing them as Lucene queries. Each doc is evaluated against the same query string.
We could lift this restriction and have:

from search-movies
| where match(plot, title)

This translates to "give me all the docs where the plot matches the title"; plot and title are both mapped fields.
More generally:

from search-movies
| eval my_query_string = // something that is computed on every doc and depends on the doc fields
| where match(my_field, my_query_string)
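To make the per-row idea concrete, here is a minimal sketch in Java where the query comes from the row itself instead of being a constant. It is not how Lucene analyzes or matches text: terms() is a crude, assumed stand-in for analysis (lowercase, split on non-alphanumerics), and PerRowQuerySketch and matches are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in, not an Elasticsearch or Lucene class.
final class PerRowQuerySketch {

    // Crude stand-in for text analysis: lowercase and split on anything that is not a letter or digit.
    static Set<String> terms(String text) {
        Set<String> out = new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")));
        out.remove(""); // split() can yield an empty leading token
        return out;
    }

    // True when at least one term of this row's query value appears in this row's field value.
    static boolean matches(String fieldValue, String queryValue) {
        Set<String> fieldTerms = terms(fieldValue);
        for (String term : terms(queryValue)) {
            if (fieldTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // where match(plot, title): each row supplies its own query string.
        System.out.println(matches("A young wizard named Harry", "Harry Potter")); // shares the term "harry"
        System.out.println(matches("A heist goes wrong", "Inception"));            // no shared term
    }
}
```

The point is only that the query is rebuilt for every row, which is not possible when a single constant query is pushed to Lucene.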

ioanatia · 2024-10-02T13:31:25Z

...ava/org/elasticsearch/xpack/esql/expression/function/fulltext/FullTextFunctionEvaluator.java

 * this evaluator is here to save the day.
 */
-public class LuceneQueryExpressionEvaluator implements EvalOperator.ExpressionEvaluator {
+public class FullTextFunctionEvaluator implements EvalOperator.ExpressionEvaluator {


I had some import problems while I was working on this and had to move it outside of compute - I think I can move it back now and keep the initial name 🤔

nik9000 · 2024-10-02T14:07:17Z

...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java

+        var shardContexts = (physicalOperationProviders instanceof EsPhysicalOperationProviders)
+            ? ((EsPhysicalOperationProviders) physicalOperationProviders).shardContexts()
+            : null;
+        return EvalMapper.toEvaluator(exp, layout, shardContexts);


I suppose I'd delegate this to EsPhysicalOperationProviders instead of instanceof it, but this is a proof of concept so it gets the job done.
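A minimal sketch of what delegating could look like, in Java with hypothetical stand-in types (OperationProviders, EsProviders, TestProviders and the String shard contexts are assumptions for illustration and only mirror the shape of the snippet above). Returning an empty list for providers without shard access is also an assumption; the snippet above uses null.

```java
import java.util.List;

// Hypothetical stand-ins, not the real planner classes.
final class DelegationSketch {

    interface OperationProviders {
        // Providers without shard access report an empty list, so callers need no type check.
        default List<String> shardContexts() {
            return List.of();
        }
    }

    static final class EsProviders implements OperationProviders {
        private final List<String> shards;

        EsProviders(List<String> shards) {
            this.shards = List.copyOf(shards);
        }

        @Override
        public List<String> shardContexts() {
            return shards;
        }
    }

    // For example a test-only provider that has no shards at all.
    static final class TestProviders implements OperationProviders {}

    // The caller asks the provider instead of using instanceof and a cast.
    static List<String> contextsFor(OperationProviders providers) {
        return providers.shardContexts();
    }

    public static void main(String[] args) {
        System.out.println(contextsFor(new EsProviders(List.of("shard-0"))));
        System.out.println(contextsFor(new TestProviders()));
    }
}
```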

nik9000 · 2024-10-02T14:08:41Z

...main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/QueryStringFunction.java

+    public EvalOperator.ExpressionEvaluator.Factory toEvaluator(
+        Function<Expression, EvalOperator.ExpressionEvaluator.Factory> toEvaluator,
+        List<EsPhysicalOperationProviders.ShardContext> shardContexts
+    ) {


I think we're better off changing the Function into an interface with both of these in it. But, again, proof of concept.
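Roughly what that could look like, as a minimal sketch in Java. All names are hypothetical (ToEvaluatorSketch, ToEvaluator, of), and a plain String stands in for both the expression and the evaluator factory so the example stays self-contained.

```java
import java.util.List;

// Hypothetical stand-ins, not the real ES|QL types.
final class ToEvaluatorSketch {

    // One object carrying both things the two-argument signature above passes separately.
    interface ToEvaluator {
        String apply(String expression);   // builds a (string-labelled) factory for a child expression
        List<String> shardContexts();      // shard access for functions that need Lucene
    }

    // Simple implementation used for the example.
    static ToEvaluator of(List<String> shards) {
        return new ToEvaluator() {
            @Override
            public String apply(String expression) {
                return "load(" + expression + ")";
            }

            @Override
            public List<String> shardContexts() {
                return shards;
            }
        };
    }

    // An ordinary function only uses apply(...).
    static String lengthToEvaluator(ToEvaluator toEvaluator, String field) {
        return "length(" + toEvaluator.apply(field) + ")";
    }

    // A full text function uses shardContexts() through the same single parameter.
    static String qstrToEvaluator(ToEvaluator toEvaluator, String query) {
        return "qstr(" + query + ") over " + toEvaluator.shardContexts().size() + " shard(s)";
    }

    public static void main(String[] args) {
        ToEvaluator toEvaluator = of(List.of("shard-0", "shard-1"));
        System.out.println(lengthToEvaluator(toEvaluator, "plot"));
        System.out.println(qstrToEvaluator(toEvaluator, "harry"));
    }
}
```

With this shape every function keeps a one-parameter toEvaluator method, and only the functions that need shard access look at shardContexts().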

ioanatia · 2025-01-28T08:57:25Z

closing in favour of #120291

Full text function lucene evaluator

919fd69

ioanatia added the Team:Analytics and :Analytics/ES|QL labels Oct 2, 2024

elasticsearchmachine added the v9.0.0 label Oct 2, 2024

ChrisHegarty mentioned this pull request Oct 23, 2024

Examine how to lift restrictions on ES|QL full text functions (match and qstr) #115364

Closed

carlosdelest mentioned this pull request Jan 16, 2025

ESQL - Allow full text functions disjunctions for non-full text functions #120291

Merged

1 task

ioanatia closed this Jan 28, 2025

ioanatia wants to merge 1 commit into elastic:main from ioanatia:full_text_functions_compute
