Skip to content

ESQL: Add ExpressionEvaluator for Lucene Query#111157

Merged
elasticsearchmachine merged 9 commits intoelastic:mainfrom
nik9000:esql_lucene_runtime
Jul 25, 2024
Merged

ESQL: Add ExpressionEvaluator for Lucene Query#111157
elasticsearchmachine merged 9 commits intoelastic:mainfrom
nik9000:esql_lucene_runtime

Conversation

@nik9000
Copy link
Copy Markdown
Member

@nik9000 nik9000 commented Jul 22, 2024

I was talking with @ioanatia on Friday about building an ExpressionEvaluator that could run a Lucene Query during the compute engine's normal runtime. It sounded fun so I took a crack at it. It's not finished or plugged in, but I think something like this would be useful to build on.

The idea here is that, for stuff like "this text field matches this string" AKA WHERE title MATCH "harry potter", we push it to Lucene where possible, but we don't have to. With this handy tool! That lines up better with the way ESQL works in general. It makes planning simpler if you can fall back on "doing it at runtime".

Now, running a lucene query at runtime isn't ideal. In the worst case we're running a MatchAll query to iterate everything and then running this query, block by block.

I was talking with @ioanatia on Friday about building an
`ExpressionEvaluator` that could run a Lucene `Query` during the
compute engine's normal runtime. It sounded fun so I took a crack at it.
It's not finished or plugged in, but I think something like this would
be useful to build on.

The idea here is that, for stuff like "this text field matches this
string" AKA `WHERE title MATCH "harry potter"`, we push it to Lucene
where possible, but we don't *have* to. With this handy tool! That lines
up better with the way ESQL works in general. It makes planning simpler
if you can fall back on "doing it at runtime".

Now, running a lucene query at runtime isn't ideal. In the worst case
we're running a `MatchAll` query to iterate everything and then running
this query, block by block.
@nik9000 nik9000 requested a review from ioanatia July 22, 2024 13:31
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 22, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

* {@link LuceneSourceOperator} or the like, but sometimes this isn't possible. So
* this evaluator is here to save the day.
*/
public class LuceneQueryExpressionEvaluator implements EvalOperator.ExpressionEvaluator {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having this just check for matching rather than scoring. I think scoring is something we'd want to think about later. Probably similar to this code though.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Match or No Match is incredibly useful, as is. Scoring can come later and separately.

Copy link
Copy Markdown
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

/**
* Collects matching information for dense range of doc ids. This assumes that
* doc ids are sent to {@link LeafCollector#collect(int)} in ascending order
* which isn't documented, but @jpountz swears is true.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... then it MUST be true :-)


import static org.apache.lucene.tests.util.LuceneTestCase.random;

public class ShuffleDocsOperator extends AbstractPageMappingOperator {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

* {@link LuceneSourceOperator} or the like, but sometimes this isn't possible. So
* this evaluator is here to save the day.
*/
public class LuceneQueryExpressionEvaluator implements EvalOperator.ExpressionEvaluator {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Match or No Match is incredibly useful, as is. Scoring can come later and separately.

Copy link
Copy Markdown
Member

@ioanatia ioanatia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work - you took a half baked idea and made it into something concrete.
I'd like to get this in because my goal is to try and use this with the match operator once we have the match operator PR merged.
Nothing at the moment is using LuceneQueryExpressionEvaluator - I see no harm in merging this PR, especially since we are planning to use it later on.

@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Jul 25, 2024

I'd like to get this in because my goal is to try and use this with the match operator once we have the match operator PR merged.

OK! I'll get CI happy and get this in today.

@nik9000 nik9000 added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jul 25, 2024
@elasticsearchmachine elasticsearchmachine merged commit 4c6f37d into elastic:main Jul 25, 2024
@nik9000 nik9000 deleted the esql_lucene_runtime branch July 25, 2024 13:47
@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Jul 25, 2024

OK! I'll get CI happy and get this in today.

Enjoy!

I don't envy the planning work on this one. I don't know precisely how to make it go, but it's got something to do with making sure we run this before the exchange. And before dropping _doc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants