Compute engine evaluation for full text functions #113938

Closed
ioanatia wants to merge 1 commit into elastic:main from
ioanatia:full_text_functions_compute

@ioanatia
Member

@ioanatia ioanatia commented Oct 2, 2024

Not something we are actively working on, but I thought it would be nice to show how we can use the Lucene expression evaluator that was added in #111157.

The LuceneQueryExpressionEvaluator had to be modified - tests will need to be rewritten.
Right now full text functions translate to query builders that are then used to form the query that is pushed to Lucene.
The evaluator needs to transform the query builder into a query object, and in order to do so it needs access to the search execution context.
For that to happen we are now passing the ShardContexts from EsPhysicalOperationProviders to the evaluator.
That translated into further changes to EvaluatorMapper and ExpressionMapper.

Demo:

With this change we can lift some of the restrictions we have for full text functions when a where condition containing full text search functions cannot be pushed to Lucene.

Previously this query failed with an error; with this change it returns results:

from search-movies
| where qstr("harry") OR length(plot) < 100

length(plot) < 100 is executed at the compute engine level, and qstr previously had no compute engine implementation (it had to be pushed to Lucene as part of the first-stage retrieval).
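To make the restriction concrete, here is a minimal sketch of the pushdown decision for the query above. The `Expr`, `FullText`, `Condition` and `Or` types are hypothetical stand-ins and not the ES|QL classes; the only rule modelled is that a disjunction can go to Lucene only when both sides can.

```java
final class PushdownSketch {

    /** Hypothetical, simplified expression node; not the ES|QL Expression class. */
    interface Expr {
        boolean canPushToLucene();
    }

    /** A full text function such as qstr("harry"); it translates to a Lucene query. */
    static final class FullText implements Expr {
        final String query;
        FullText(String query) { this.query = query; }
        public boolean canPushToLucene() { return true; }
    }

    /** Any other condition; the flag says whether it has a Lucene translation. */
    static final class Condition implements Expr {
        final String text;
        final boolean pushable;
        Condition(String text, boolean pushable) {
            this.text = text;
            this.pushable = pushable;
        }
        public boolean canPushToLucene() { return pushable; }
    }

    /** A disjunction is pushed as a whole, so both sides must be pushable. */
    static final class Or implements Expr {
        final Expr left;
        final Expr right;
        Or(Expr left, Expr right) {
            this.left = left;
            this.right = right;
        }
        public boolean canPushToLucene() {
            return left.canPushToLucene() && right.canPushToLucene();
        }
    }

    /** Where the WHERE condition ends up being evaluated. */
    static String plan(Expr where) {
        return where.canPushToLucene() ? "lucene" : "compute";
    }

    public static void main(String[] args) {
        Expr where = new Or(new FullText("harry"), new Condition("length(plot) < 100", false));
        System.out.println(plan(where));
    }
}
```

In this sketch the "compute" outcome is only usable once qstr has a compute engine evaluator, which is what this PR adds.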

Future work

Just ideas - not final proposals.
And except for scoring, nothing here is blocking.

Scoring implementation

We need to figure out how this would work for scoring.
That is why I am not in a hurry to push this forward and would like to get scoring in first.

Example:

from search-movies METADATA _score
| where qstr("harry potter")                       <- gives an initial score and populates _score
| where qstr("harry") OR length(plot) < 100        <- modifies _score by adding the score of `qstr("harry")`

When we evaluate qstr("harry") on the compute engine we also have to modify the value of _score.
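A rough sketch of what that could look like for a single row, assuming the evaluator returns both a match flag and a score. `MatchResult`, `keep` and `updatedScore` are made-up names for illustration, and the "add the score only when the full text branch matches" rule is an assumption drawn from the example above, not a settled design.

```java
final class ScoreSketch {

    /** Hypothetical result of evaluating a full text function against one document. */
    static final class MatchResult {
        final boolean matched;
        final float score;
        MatchResult(boolean matched, float score) {
            this.matched = matched;
            this.score = score;
        }
    }

    /** WHERE qstr(...) OR other: the row is kept if either side holds. */
    static boolean keep(MatchResult fullText, boolean otherCondition) {
        return fullText.matched || otherCondition;
    }

    /** Assumption: a matching full text branch adds its score; otherwise _score is unchanged. */
    static float updatedScore(float currentScore, MatchResult fullText) {
        return fullText.matched ? currentScore + fullText.score : currentScore;
    }

    public static void main(String[] args) {
        MatchResult harry = new MatchResult(true, 0.5f);
        System.out.println(keep(harry, false));
        System.out.println(updatedScore(1.5f, harry));
    }
}
```

A row kept only because length(plot) < 100 holds would pass the filter with its _score untouched under this assumption.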

Use full text functions not just in WHERE

We could also look at lifting the restriction of using full text functions just in WHERE.
For example we can allow them in EVAL:

from search-movies METADATA _score
| eval result = qstr("harry")

In this case:

  • when using full text functions with EVAL we need to decide whether this should also modify the _score - I am leaning towards not modifying 🤔
  • we need to figure out what the result type would be - my guess is boolean, because for WHERE full text functions return boolean values (even if they modify the _score). Boolean also makes more sense because we can have EVAL result = qstr("harry") OR length(plot) < 100, in which case it is clear that we are dealing with a boolean value.
  • what if we just want the score? could we have something like:
from search-movies
| eval result = score(qstr("harry"))

Allow full text functions to work on fields that are not mapped

This would require us to build some kind of Lucene index in memory.
Example:

from search-movies
| EVAL my_new_field = // something that is computed on every doc and depends on the doc fields
| WHERE match(my_new_field, "hello")

This goes together with the previous point of allowing full text functions not just in WHERE.
If they worked on unmapped fields we could also use them after commands like STATS etc.

Allow the query string to be evaluated on every doc

An interesting side-effect of having a compute engine implementation is that we can evaluate the values that are being passed to full text functions on every doc.

So right now qstr and the match function require that the query argument is always a constant string, because we are pushing them as Lucene queries. Each doc is evaluated against the same query string.
We could lift this restriction and have:

from search-movies
| where match(plot, title)

This translates to "give me all the docs where the plot matches the title" - plot and title are both mapped fields.
More generally:

from search-movies
| eval my_query_string = // something that is computed on every doc and depends on the doc fields
| where match(my_field, my_query_string)
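As a toy illustration of the per-row idea, the sketch below rebuilds the query for every row from another column instead of using one constant string. The whitespace/lowercase term matching in `matches` is only a stand-in for real Lucene analysis and matching, and all names here are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class PerRowQuerySketch {

    /** Toy analyzer: lowercase and split on non-word characters. */
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** Toy match: true if the field shares at least one term with the query. */
    static boolean matches(String fieldValue, String query) {
        Set<String> fieldTerms = terms(fieldValue);
        for (String term : terms(query)) {
            if (fieldTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Each row is {plot, title}; the query string differs per row.
        String[][] rows = {
            { "A boy named Harry learns he is a wizard", "Harry Potter" },
            { "A crew travels through a wormhole", "Harry Potter" },
        };
        for (String[] row : rows) {
            System.out.println(matches(row[0], row[1]));
        }
    }
}
```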

@ioanatia ioanatia added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Oct 2, 2024
  * this evaluator is here to save the day.
  */
-public class LuceneQueryExpressionEvaluator implements EvalOperator.ExpressionEvaluator {
+public class FullTextFunctionEvaluator implements EvalOperator.ExpressionEvaluator {
Member Author

I had some import problems while I was working on this and had to move it outside of compute - I think I can move it back now and keep the initial name 🤔

var shardContexts = (physicalOperationProviders instanceof EsPhysicalOperationProviders)
    ? ((EsPhysicalOperationProviders) physicalOperationProviders).shardContexts()
    : null;
return EvalMapper.toEvaluator(exp, layout, shardContexts);
Member

I suppose I'd delegate this to EsPhysicalOperationProviders instead of instanceof it, but this is a proof of concept so it gets the job done.
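One possible reading of that suggestion, sketched with hypothetical types (these are not the real Elasticsearch interfaces): the provider abstraction exposes the shard contexts itself, so the caller never has to check the concrete class. Returning an empty list instead of null for non-ES providers is an assumption made here for simplicity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class DelegationSketch {

    /** Hypothetical stand-in for a per-shard search context. */
    interface ShardContext {
        int index();
    }

    /** Hypothetical provider abstraction; by default there are no shard contexts. */
    interface PhysicalOperationProviders {
        default List<ShardContext> shardContexts() {
            return Collections.emptyList();
        }
    }

    /** The ES-backed provider overrides the default with its real contexts. */
    static final class EsProviders implements PhysicalOperationProviders {
        private final List<ShardContext> contexts;
        EsProviders(List<ShardContext> contexts) {
            this.contexts = contexts;
        }
        @Override
        public List<ShardContext> shardContexts() {
            return contexts;
        }
    }

    /** A provider without shards, for example one used in tests. */
    static final class TestProviders implements PhysicalOperationProviders {}

    /** The caller no longer needs instanceof or a cast. */
    static int shardCount(PhysicalOperationProviders providers) {
        return providers.shardContexts().size();
    }

    public static void main(String[] args) {
        List<ShardContext> contexts = Arrays.<ShardContext>asList(() -> 0, () -> 1);
        System.out.println(shardCount(new EsProviders(contexts)));
        System.out.println(shardCount(new TestProviders()));
    }
}
```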

public EvalOperator.ExpressionEvaluator.Factory toEvaluator(
    Function<Expression, EvalOperator.ExpressionEvaluator.Factory> toEvaluator,
    List<EsPhysicalOperationProviders.ShardContext> shardContexts
) {
Member

I think we're better off changing the Function into an interface with both of these in it. But, again, proof of concept.
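A sketch of how that could look, again with hypothetical placeholder types rather than the real ES|QL ones: a single `ToEvaluator` interface carries both the child-expression conversion and the shard contexts, so mappers take one parameter instead of two.

```java
import java.util.Collections;
import java.util.List;

final class ToEvaluatorSketch {

    /** Hypothetical placeholders for the real expression, factory and context types. */
    interface Expression {}
    interface ShardContext {}
    interface EvaluatorFactory {
        String describe();
    }

    /** Replaces the Function plus the separate shard context list. */
    interface ToEvaluator {
        EvaluatorFactory apply(Expression expression);
        List<ShardContext> shardContexts();
    }

    /** A full text mapper needs the shard contexts but only takes one argument for them. */
    static EvaluatorFactory fullTextEvaluator(Expression query, ToEvaluator toEvaluator) {
        int shards = toEvaluator.shardContexts().size();
        return () -> "full text evaluator over " + shards + " shard(s)";
    }

    public static void main(String[] args) {
        ToEvaluator toEvaluator = new ToEvaluator() {
            @Override
            public EvaluatorFactory apply(Expression expression) {
                return () -> "child evaluator";
            }
            @Override
            public List<ShardContext> shardContexts() {
                return Collections.<ShardContext>singletonList(new ShardContext() {});
            }
        };
        System.out.println(fullTextEvaluator(new Expression() {}, toEvaluator).describe());
    }
}
```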

@ioanatia
Member Author

closing in favour of #120291

@ioanatia ioanatia closed this Jan 28, 2025
Labels

:Analytics/ES|QL AKA ESQL Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.0.0

3 participants