ESQL: Add ExpressionEvaluator for Lucene Query#111157
elasticsearchmachine merged 9 commits into elastic:main
I was talking with @ioanatia on Friday about building an `ExpressionEvaluator` that could run a Lucene `Query` during the compute engine's normal runtime. It sounded fun, so I took a crack at it. It's not finished or plugged in, but I think something like this would be useful to build on.

The idea here is that, for stuff like "this text field matches this string" AKA `WHERE title MATCH "harry potter"`, we push it to Lucene where possible, but we don't *have* to. With this handy tool! That lines up better with the way ESQL works in general. It makes planning simpler if you can fall back on "doing it at runtime".

Now, running a Lucene query at runtime isn't ideal. In the worst case we're running a `MatchAll` query to iterate everything and then running this query, block by block.
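To make that worst case concrete, here is a minimal Java sketch of the fallback shape: a match-all scan hands out doc ids a page at a time, and an evaluator decides, position by position, whether the query matches. Everything in it is hypothetical: `RuntimeFilterSketch`, `eval`, and `scanAndFilter` are illustrative names, plain `int[]` pages stand in for compute-engine pages, and an `IntPredicate` stands in for the Lucene `Query`. It is not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch of the "fall back to runtime" worst case: a match-all scan
 * produces every doc id in fixed-size pages, and a per-page evaluator decides,
 * doc by doc, whether the query matches. The IntPredicate stands in for a Lucene query.
 */
final class RuntimeFilterSketch {
    private RuntimeFilterSketch() {}

    /** Evaluates the query for one page, producing one boolean per position. */
    static boolean[] eval(int[] docs, IntPredicate query) {
        boolean[] result = new boolean[docs.length];
        for (int i = 0; i < docs.length; i++) {
            result[i] = query.test(docs[i]);
        }
        return result;
    }

    /** Emulates a match-all scan over [0, maxDoc) in pages, keeping only matching docs. */
    static List<Integer> scanAndFilter(int maxDoc, int pageSize, IntPredicate query) {
        List<Integer> kept = new ArrayList<>();
        for (int start = 0; start < maxDoc; start += pageSize) {
            int end = Math.min(start + pageSize, maxDoc);
            // Build the page the "match all" scan would have produced.
            int[] page = new int[end - start];
            for (int d = start; d < end; d++) {
                page[d - start] = d;
            }
            // Run the query against this block and keep the matching positions.
            boolean[] matches = eval(page, query);
            for (int i = 0; i < page.length; i++) {
                if (matches[i]) {
                    kept.add(page[i]);
                }
            }
        }
        return kept;
    }
}
```

For example, `scanAndFilter(10, 4, d -> d % 3 == 0)` reads all ten docs across three pages and keeps `[0, 3, 6, 9]`. Every doc is visited regardless of how selective the query is, which is why pushing the query down to Lucene stays preferable whenever the planner can.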
Pinging @elastic/es-analytical-engine (Team:Analytics)
```java
 * {@link LuceneSourceOperator} or the like, but sometimes this isn't possible. So
 * this evaluator is here to save the day.
 */
public class LuceneQueryExpressionEvaluator implements EvalOperator.ExpressionEvaluator {
```
I'm having this just check for matching rather than scoring. I think scoring is something we'd want to think about later. Probably similar to this code though.
Match or No Match is incredibly useful, as is. Scoring can come later and separately.
```java
/**
 * Collects matching information for a dense range of doc ids. This assumes that
 * doc ids are sent to {@link LeafCollector#collect(int)} in ascending order,
 * which isn't documented, but @jpountz swears is true.
```
... then it MUST be true :-)
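The ascending-order assumption matters when matches are written into an append-only builder: the gaps between consecutive matches can be filled with non-matches only if the next match never lands behind the current write position. Below is a hypothetical Java sketch of such a dense collector over the doc ids `[min, max]`. `DenseCollectorSketch`, `collect`, and `finish` are illustrative names and a `List<Boolean>` stands in for a boolean vector builder; it is not Lucene's `LeafCollector` API and not the PR's class.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: records match/no-match for every doc id in [min, max],
 * assuming matching docs arrive in ascending order.
 */
final class DenseCollectorSketch {
    private final int max;
    // Append-only, like a fixed-size vector builder: position i holds doc (min + i).
    private final List<Boolean> matches = new ArrayList<>();
    // The next doc id whose slot has not been written yet.
    private int next;

    DenseCollectorSketch(int min, int max) {
        if (max < min) {
            throw new IllegalArgumentException("empty range");
        }
        this.max = max;
        this.next = min;
    }

    /** Called once per matching doc; relies on ascending doc order. */
    void collect(int doc) {
        if (doc < next) {
            throw new IllegalStateException("doc " + doc + " is below the range or out of order");
        }
        if (doc > max) {
            throw new IllegalArgumentException("doc " + doc + " is above the range");
        }
        // Every doc skipped since the previous match did not match.
        while (next < doc) {
            matches.add(false);
            next++;
        }
        matches.add(true);
        next++;
    }

    /** Pads trailing non-matches so there is exactly one entry per doc in [min, max]. */
    List<Boolean> finish() {
        while (next <= max) {
            matches.add(false);
            next++;
        }
        return matches;
    }
}
```

Collecting docs 11 and 14 over `[10, 15]` yields `[false, true, false, false, true, false]`, one entry per doc in the range. A doc that arrives out of order fails fast instead of silently shifting every later position.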
```java
import static org.apache.lucene.tests.util.LuceneTestCase.random;

public class ShuffleDocsOperator extends AbstractPageMappingOperator {
```
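Our reading, which the PR excerpt doesn't state outright, is that `ShuffleDocsOperator` scrambles doc order in tests so the evaluator is also exercised on input that breaks the ascending-order assumption. A minimal sketch of that idea, assuming a page of doc references is three parallel columns (shard, segment, doc): apply one Fisher-Yates permutation to all three so every row still names the same document. `ShuffleDocsSketch` and `shuffle` are hypothetical names, and the `java.util.Random` parameter stands in for the `random()` source the test imports.

```java
import java.util.Random;

/** Hypothetical test helper that shuffles the rows of a page of doc references. */
final class ShuffleDocsSketch {
    private ShuffleDocsSketch() {}

    /**
     * Applies one Fisher-Yates permutation to the parallel shard/segment/doc
     * columns, so each row keeps pointing at the same document.
     */
    static void shuffle(int[] shards, int[] segments, int[] docs, Random random) {
        if (shards.length != docs.length || segments.length != docs.length) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        for (int i = docs.length - 1; i > 0; i--) {
            int j = random.nextInt(i + 1); // uniform in [0, i]
            swap(shards, i, j);
            swap(segments, i, j);
            swap(docs, i, j);
        }
    }

    private static void swap(int[] column, int i, int j) {
        int tmp = column[i];
        column[i] = column[j];
        column[j] = tmp;
    }
}
```

Permuting the columns together is the important part: shuffling only `docs` would silently pair doc ids with the wrong segments.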
Great work, you took a half-baked idea and made it into something concrete.
I'd like to get this in because my goal is to try and use this with the match operator once the match operator PR is merged.
Nothing is using `LuceneQueryExpressionEvaluator` at the moment, so I see no harm in merging this PR, especially since we plan to use it later on.
OK! I'll get CI happy and get this in today.
Enjoy! I don't envy the planning work on this one. I don't know precisely how to make it go, but it's got something to do with making sure we run this before the exchange. And before dropping `_doc`.