Add ES|QL match operator (:) #114831
Conversation
There was a problem hiding this comment.
Thank you @carlosdelest! The grammar looks a lot cleaner, and I just have a couple of questions, otherwise it looks good to me. Don't forget to run test-release.
| * Full text function that performs a {@link QueryStringQuery} . | ||
| */ | ||
| public class Match extends FullTextFunction implements Validatable { | ||
| public class Match extends FullTextFunction implements Validatable, TwoOptionalArguments { |
There was a problem hiding this comment.
It seems like we don't need to implement TwoOptionalArguments for now.
| Expression boost = null; | ||
| Expression fuzziness = null; | ||
| if (in.getTransportVersion().onOrAfter(TransportVersions.MATCH_OPERATOR_COLON)) { | ||
| isOperator = in.readBoolean(); |
There was a problem hiding this comment.
Just a question to confirm that I understand it right - a TransportVersion is need here because of isOperator will be needed by the data nodes, is it right? Do the data node need to be aware whether it is an operator or function? If it is only needed by the coordinator node, perhaps a TransportVersion is not needed.
There was a problem hiding this comment.
I added a TransportVersion as we're changing the serialization / deserialization, adding a new class field.
As you're mentioning, the attribute is probably just needed on the coordinator node and thus there's no point in serializing it. But I think we should be consistent with the serialization semantics here unless we have a good reason not to. So I have a couple of questions:
- Is sending the plan to the data nodes the only way of triggering serialization and deserialization via transport in ES|QL?
- Are there other instances in ES|QL where we're avoiding serialization of attributes based on the knowledge that data nodes are not using that information?
For me, it would be simpler to add the serialization of the attribute just in case (we may add for example a check on the data nodes that needs to error out with this information), but I'm happy to follow standards.
Let me know and I can remove it, adding a transient keyword and a comment so people are aware of this attribute not being serialized.
There was a problem hiding this comment.
These are valid questions.
- Is sending the plan to the data nodes the only way of triggering serialization and deserialization via transport in ES|QL?
The serialization and deserialization are triggered by tests as well, and it is also
triggered by cross cluster search.
- Are there other instances in ES|QL where we're avoiding serialization of attributes based on the knowledge that data nodes are not using that information?
If the reading and working code coordinate correctly it will work, however it can get surprise. I'd suggest to avoid serializing unnecessary things when possible, as we came across issues(OOM) with serializing large plans before, isOperator is a tiny thing, however many tiny things add together could be big, and considering we are likely to add an optional parameter to the match function to support match options, having a TransportVersion for isOperator seems a bit heavy.
I wonder if it is a good idea to remove the isOperator from constructor, and differentiate match function from : operator from its source, so that Verifier can generate the correct message?
There was a problem hiding this comment.
I wonder if it is a good idea to remove the isOperator from constructor, and differentiate match function from : operator from its source, so that Verifier can generate the correct message?
I see, this is something I hadn't thought of. It makes sense as this can be derived from source.
I've given it a shot in a0bc3c6, WDYT?
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
…e it from source instead to avoid serialization
|
|
||
| private boolean isOperator() { | ||
| if (isOperator == null) { | ||
| isOperator = source().text().toUpperCase(Locale.ROOT).startsWith(super.functionName()) == false; |
There was a problem hiding this comment.
Shall we check the source text starts with match( and end with ), in case a field name starts with match?
There was a problem hiding this comment.
yes, that's a possibility. I've used a regex to address whitespaces between match and ( and added some tests to check messages are returned correctly in 9ef72e7
|
@elasticmachine run elasticsearch-ci/release-tests |
fang-xing-esql
left a comment
There was a problem hiding this comment.
LGTM, thank you Carlos!
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
(cherry picked from commit f88f68d)
Adds a
:match operator that provides support for the following expressions:This operator replaces the previous MATCH operator.
Fuzziness and boosting will be added in a follow-up PR.