Compute engine evaluation for full text functions#113938
Closed
ioanatia wants to merge 1 commit intoelastic:mainfrom
Closed
Compute engine evaluation for full text functions#113938ioanatia wants to merge 1 commit intoelastic:mainfrom
ioanatia wants to merge 1 commit intoelastic:mainfrom
Conversation
ioanatia
commented
Oct 2, 2024
| * this evaluator is here to save the day. | ||
| */ | ||
| public class LuceneQueryExpressionEvaluator implements EvalOperator.ExpressionEvaluator { | ||
| public class FullTextFunctionEvaluator implements EvalOperator.ExpressionEvaluator { |
Member
Author
There was a problem hiding this comment.
I had some import problems while I was working at this and had to move it outside of compute - I think I can move it back now and keep the initial name 🤔
nik9000
reviewed
Oct 2, 2024
| var shardContexts = (physicalOperationProviders instanceof EsPhysicalOperationProviders) | ||
| ? ((EsPhysicalOperationProviders) physicalOperationProviders).shardContexts() | ||
| : null; | ||
| return EvalMapper.toEvaluator(exp, layout, shardContexts); |
Member
There was a problem hiding this comment.
I suppose I'd delegate this to EsPhysicalOperationProviders instead of instanceof it, but this is a proof of concept so it gets the job done.
| public EvalOperator.ExpressionEvaluator.Factory toEvaluator( | ||
| Function<Expression, EvalOperator.ExpressionEvaluator.Factory> toEvaluator, | ||
| List<EsPhysicalOperationProviders.ShardContext> shardContexts | ||
| ) { |
Member
There was a problem hiding this comment.
I think we're better off changing the Function into an interface with both of these in it. But, again, proof of concept.
1 task
Member
Author
|
closing in favour of #120291 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Not something we are actively working on, but I thought it would be nice to show how we can use the lucene expression evaluator that was added in #111157
The
LuceneQueryExpressionEvaluatorhad to be modified - tests will need to be rewritten.Right now full text functions translate to query builders that are then used to form the query that is pushed to Lucene.
The evaluator needed to transform the query builder to a query object and in order to do so in needs access to search execution context.
For that to happen we are now passing the
ShardContexts fromEsPhysicalOperationProvidersto the evaluator.That translated into further changes to
EvaluatorMapperandExpressionMapper.Demo:
With this change we can lift some of the restrictions we have for full text functions when a
wherecondition containing full text search functions cannot be pushed to Lucene.Previously this did not work, but now it does return results instead of an error:
length(plot) < 100is executed at the compute engine level andqstrpreviously did not have a compute engine implementation (it needed to be pushed to Lucene as part of the first stage retrieval)Future work
Just ideas - not final proposals.
And except for scoring, nothing here is blocking.
Scoring implementation
We need to figure out how this would work for scoring.
Which is why I am not in a hurry to push this forward and would like to get scoring in first.
Example:
When we evaluate the
qstr("harry")on the compute engine we also have to modify the value of_score.Use full text functions not just in
WHEREWe could also look at lifting the restriction of using full text functions just in
WHERE.For example we can allow them in
EVAL:In this case:
EVALwe need to decide whether this should also modify the_score- I am leaning towards not modifying 🤔WHEREfull text functions return boolean values (even if the modify the _score). Makes more sense for it to be boolean because we can also haveEVAL result = qstr("harry") OR length(plot) < 100in which case it is clear that we are dealing with a boolean value.Allow full text functions to work on fields that are not mapped
This would require us to build some kind of Lucene index in memory.
Example:
This goes together with the previous point of allowing full text functions not just in EVAL.
If they would work on non mapped fields we could also used them after commands like
STATSetc.Allow the query string to be evaluated on every doc
An interesting side-effect of having a compute engine implementation is that we can evaluate the values that are being passed to full text functions on every doc.
So right now
qstrand thematchfunction require that the query argument is always a constant string, because we are pushing them as Lucene queries. Each doc is evaluated against the same query string.We could lift this restriction and have:
this translates to "give me all the docs where the plot matches the title" - plot and title are both mapped fields.
more generally: