Conversation
| @Override | ||
| protected PhysicalPlan rule(QueryStringFilterExec queryStringFilterExec, LocalPhysicalOptimizerContext ctx) { | ||
| PhysicalPlan plan = queryStringFilterExec; | ||
| if (queryStringFilterExec.child() instanceof EsQueryExec queryExec) { |
There was a problem hiding this comment.
Pushes down query string to ES query filters
| ); | ||
| } | ||
| } else if (p instanceof QueryStringFilterExec queryStringFilterExec) { | ||
| failures.add( |
There was a problem hiding this comment.
For now, there's no usage of MATCH directly on non-index based sources - we can implement that separately
| private PhysicalOperation planQueryStringFilter(QueryStringFilterExec queryStringFilter, LocalExecutionPlannerContext context) { | ||
| PhysicalOperation source = plan(queryStringFilter.child(), context); | ||
| // TODO: Do we need a QueryStringFilterOperator here? Can we apply query string filter directly on the Page? | ||
| return source; |
There was a problem hiding this comment.
For now we don't apply query string filters on Page directly
…query string applied
…lest/esql-match-command
| } else if (exp instanceof SpatialRelatesFunction bc) { | ||
| return bc.canPushToSource(LocalPhysicalPlanOptimizer::isAggregatable); | ||
| } else if (exp instanceof StringQueryPredicate) { | ||
| return true; |
There was a problem hiding this comment.
We need to parse the query string in order to retrieve the fields - can be done as a follow up once we agree on the approach
There was a problem hiding this comment.
Right. At the moment it's just an opaque string.
There was a problem hiding this comment.
You'd have to do that much earlier than the Local PhysicalPlanOptimizer I think.
There was a problem hiding this comment.
In ES|QL "retrieval" of fields and how they are resolved (what type they have, if the name is correct or not, if there are conflicts between data types or not) is done in a different layer.
There was a problem hiding this comment.
Given that we're still not retrieving the fields that are involved in the query string, is there something we need to do at this stage besides always trying to push it to source?
My thinking was to check that afterwards. using maybe the LogicalVerifier to double check that there were no Filter plans with unsupported Expression conditions.
Would that approach work, or should any other approach work better for an (as of now) opaque query string?
There was a problem hiding this comment.
LogicalVerifier has a different role: it checks that every logicalPlan has the correct set of inputs and outputs (that it doesn't create un-tracked columns or loses references along the way).
You need a check similar to what Ioana did with the match operator in the Verifier.
There was a problem hiding this comment.
Thanks for clarifying @astefan !
I don't think we can provide something similar until the fields from the query string are parsed in follow up iterations.
Do y'all think there's something specific we can do before that happens to prevent an invalid use of the command?
There was a problem hiding this comment.
Think about queries where match comes after some commands that you know for sure it cannot be ok. For example, after stats for sure it's not ok because stats aggregates the results; doing a match afterwards is pointless.
There was a problem hiding this comment.
Taking a closer look, I think there's only a couple of commands where it would be safe to use MATCH after (without field information): WHERE and SORT: ed1852e and 42751b7
We're very conservative there, but it's probably better that way until we can decide on optimizations and validations.
I've added some tests as well, they're incomplete - pending validation of this approach.
WDYAT?
ChrisHegarty
left a comment
There was a problem hiding this comment.
I agree with the direction here - to start off as a filter and use StringQueryPredicate to support Elasticsearch Query String syntax only. We can then built up from there.
| } else if (exp instanceof SpatialRelatesFunction bc) { | ||
| return bc.canPushToSource(LocalPhysicalPlanOptimizer::isAggregatable); | ||
| } else if (exp instanceof StringQueryPredicate) { | ||
| return true; |
There was a problem hiding this comment.
Right. At the moment it's just an opaque string.
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Sure. This comment needs to be addressed #111147 (comment). |
|
@elasticsearchmachine update branch |
astefan
left a comment
There was a problem hiding this comment.
I've tested this a bit. While the code here in this PR is ok and can be regarded as an acceptable version of a match command (while very primitive) there are some issues that sooner or later must be addressed and as I mentioned the work of these followups is an order of magnitude greater that the simple code here:
- a
matchcommand and amatchoperator is confusing - there are no specific optimizations
- here first they need to be identified and this is hard
- secondly, it can extend the list of supported queries which currently are not that many
- the
matchcommand accepting a string that is opaque is a problem- the field names inside are not under the control of the parser and any layer that deals with fields in ES|QL; this has a lot of implications
- one example of field names transparency issue has already been mentioned by @ioanatia
- another one, which is not obvious:
test1 index
"message": {
"type": "text"
}
test2 index
"message": {
"type": "integer"
}
and use this data
{"index": {"_index":"test1","_id":1}}
{"message": "FOO bar 123"}
{"index": {"_index":"test2","_id":8}}
{"whatever":100}
{"index": {"_index":"test2","_id":9}}
{"message": 123}
with query from test* metadata _index | match "123"
you get this information
message | whatever | _index
---------------+---------------+---------------
null |null |test2
null |null |test1
Meaning, something matched but users will not see why things matched.
Agreed that this is meant as a first step in that direction 👍
Totally agreed - we can add them as follow up and prioritise them
Good feedback. Names can still change as we're still in an early phase and this is not exposed to users.
There are no optimizations yet - I guess first we need to get fields information before applying them.
Can you please clarify what do you mean by this?
I agree. Bot issues are due to not parsing field information from the query, is that correct? We can start working on properly parsing the query string (and extract field information) if y'all think that's the next step to do. Some follow up work that we can discuss and prioritize:
Is there anything that should be addressed before merging this PR, given the intention of addressing the above? |
…ch-command # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
ioanatia
left a comment
There was a problem hiding this comment.
Good work!
The Verifier validations you have added have should address the comments from my initial PR review.
Queries like this one should not be supported:
FROM search-movies
| KEEP plot
| MATCH "title:harry*"
It also addresses this valid comment from Andrei about a fundamental aspect of ES|QL:
This breaks the definition of a command in ES|QL: every command acts on the outputs it receives as inputs from the previous command(s). ES|QL doesn't make such an exception anywhere else in its commands. Using #111147 (comment) is error prone, it's making the command confusing and will have an unexpected behavior.
There is one point from Andrei regarding matching on fields with conflicting types that's not been addressed. We should follow up on that.
There are a lot of comments on the PR so at a first glance it might seem that they are left unaddressed, but AFAICS they have been addressed.
I think we are at a state where we can merge this - the support for query string is just the initial step. This is also only available for snapshot builds - but having it merged makes it easier to test and get feedback.
|
Thanks @ioanatia !
I don't think there's a way of addressing it without field information. On #111487 we will retrieve fields information so we can unblock work on this. |
++ LGTM |
fang-xing-esql
left a comment
There was a problem hiding this comment.
It is a good start to open up more possibilities in ES|QL to support full text search. Thank you for all the effort @carlosdelest!
|
@nik9000 , you requested some changes but I think that was due to your unintentional approval. Can you please review so I can merge this? Thanks! 🙏 |
bpintea
left a comment
There was a problem hiding this comment.
Meaning, something matched but users will not see why things matched.
Noting another facet of the above and the (current) lack of query string analysis: a query like from hosts | match "_index : hosts" will match on a field not exposed / enabled in the source command.
secondly, it can extend the list of supported queries which currently are not that many
Can you please clarify what do you mean by this?
I supposed this has to do with the restrictions enforced in the Verifier, many (most?) of which could be lifted once the query string is parsed.
Has this been addressed? I think it would be good to have tests (integration and spec) that chain multiple MATCHes, possibly interweaved with other allowed commands. Possibly for a next iteration.
| // StringQueryPredicate | ||
| plan.forEachDown(LogicalPlan.class, lp -> { | ||
| if ((lp instanceof Filter || lp instanceof OrderBy || lp instanceof EsRelation) == false) { | ||
| failures.add(fail(plan, "MATCH cannot be used after {}", lp.sourceText().split(" ")[0].toUpperCase(Locale.ROOT))); |
There was a problem hiding this comment.
In case this check remains in place later as well: the user won't know what to do next, we usually try to provide guiding error messages (even if they're long[er]). In this case, we can add that this command can only be used past those other commands (WHERE, SORT or FROM).
Can be picked by next iteration, I understand that this is a small work fragment.
There was a problem hiding this comment.
Correct! This is a guardrail for now. We aim to provide better interoperability / error messages, plus docs guidance about limitations if needed.
| LIKE: 'like'; | ||
| LP : '('; | ||
| MATCH: 'match'; | ||
| MATCH_OPERATOR: 'match'; |
There was a problem hiding this comment.
Another very minor note for next work: from hosts | matc will error with: "reason": "line 7:15: mismatched input 'matc' expecting {'dissect', 'drop', 'enrich', 'eval', 'grok', 'inlinestats', 'keep', 'limit', 'lookup', 'mv_expand', 'rename', 'sort', 'stats', 'where', MATCH}",
Edit: location is skewed / to be ignored, focus is the way the MATCH command is listed / standing out.
| * LimitExec[1000[INTEGER]] | ||
| * \_ExchangeExec[[],false] | ||
| * \_ProjectExec[[_meta_field{f}#8, emp_no{f}#2, first_name{f}#3, gender{f}#4, job{f}#9, job.raw{f}#10, languages{f}#5, last_na | ||
| * me{f}#6, long_noidx{f}#11, salary{f}#7]] | ||
| * \_FieldExtractExec[_meta_field{f}#8, emp_no{f}#2, first_name{f}#3] | ||
| * \_EsQueryExec[test], indexMode[standard], query[{"query_string":{"query":"\"last_name: Smith\"" |
There was a problem hiding this comment.
😊 Yes! Thanks for the catch!
Yes - until we don't have query analysis we can't have that.
I did add a test in f1b8dc0. I didn't add a whole lot more as this is a restriction that I expect to have lifted soon - we can add more test cases once we approach the final version of the command. LMK if you think there's more test cases that should be added for this first iteration! |
What mean is smth that Ioana mentioned above as well: multiple MATCHes, not just multiple filters. But unless there will be any other changes to go in before merging, it doesn't have to be now. Note: the above won't work now as is, but once the limit is removed, it should. |
I see! Added a test in 52e8619 |
|
I'm merging this as it has approvals already. We can follow up on #111487 |
|
Hi @carlosdelest Just playing with this.... Your test snippet dataset is good but gives the wrong "impression" that this is a "starts with" text search. You should add a couple of messages with an example that shows the search matches mid string like |
Adds an initial
MATCHcommand to ES|QL, that uses a query string query:For now, the query string is opaque - we don't analyze it. A Filter is created from it using a
StringQueryPredicate.The command is only included in snapshot builds to allow follow up work.
Test snippet