Skip to content

Add ES|QL match operator (:) #114831

Merged
carlosdelest merged 148 commits intoelastic:mainfrom
carlosdelest:carlosdelest/esql-match-operator-colon
Nov 6, 2024
Merged

Add ES|QL match operator (:) #114831
carlosdelest merged 148 commits intoelastic:mainfrom
carlosdelest:carlosdelest/esql-match-operator-colon

Conversation

@carlosdelest
Copy link
Copy Markdown
Member

@carlosdelest carlosdelest commented Oct 15, 2024

Adds a : match operator that provides support for the following expressions:

  FROM search-movies
  | WHERE title:"Harry" and title:"Potter"

This operator replaces the previous MATCH operator.

Fuzziness and boosting will be added in a follow-up PR.

carlosdelest and others added 30 commits September 20, 2024 14:25
# Conflicts:
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
# Conflicts:
#	x-pack/plugin/build.gradle
Copy link
Copy Markdown
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @carlosdelest! The grammar looks a lot cleaner, and I just have a couple of questions, otherwise it looks good to me. Don't forget to run test-release.

* Full text function that performs a {@link QueryStringQuery} .
*/
public class Match extends FullTextFunction implements Validatable {
public class Match extends FullTextFunction implements Validatable, TwoOptionalArguments {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we don't need to implement TwoOptionalArguments for now.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch! Fixed in 597951b

Expression boost = null;
Expression fuzziness = null;
if (in.getTransportVersion().onOrAfter(TransportVersions.MATCH_OPERATOR_COLON)) {
isOperator = in.readBoolean();
Copy link
Copy Markdown
Member

@fang-xing-esql fang-xing-esql Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question to confirm that I understand it right - a TransportVersion is need here because of isOperator will be needed by the data nodes, is it right? Do the data node need to be aware whether it is an operator or function? If it is only needed by the coordinator node, perhaps a TransportVersion is not needed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a TransportVersion as we're changing the serialization / deserialization, adding a new class field.

As you're mentioning, the attribute is probably just needed on the coordinator node and thus there's no point in serializing it. But I think we should be consistent with the serialization semantics here unless we have a good reason not to. So I have a couple of questions:

  • Is sending the plan to the data nodes the only way of triggering serialization and deserialization via transport in ES|QL?
  • Are there other instances in ES|QL where we're avoiding serialization of attributes based on the knowledge that data nodes are not using that information?

For me, it would be simpler to add the serialization of the attribute just in case (we may add for example a check on the data nodes that needs to error out with this information), but I'm happy to follow standards.

Let me know and I can remove it, adding a transient keyword and a comment so people are aware of this attribute not being serialized.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are valid questions.

  • Is sending the plan to the data nodes the only way of triggering serialization and deserialization via transport in ES|QL?

The serialization and deserialization are triggered by tests as well, and it is also
triggered by cross cluster search.

  • Are there other instances in ES|QL where we're avoiding serialization of attributes based on the knowledge that data nodes are not using that information?

If the reading and working code coordinate correctly it will work, however it can get surprise. I'd suggest to avoid serializing unnecessary things when possible, as we came across issues(OOM) with serializing large plans before, isOperator is a tiny thing, however many tiny things add together could be big, and considering we are likely to add an optional parameter to the match function to support match options, having a TransportVersion for isOperator seems a bit heavy.
I wonder if it is a good idea to remove the isOperator from constructor, and differentiate match function from : operator from its source, so that Verifier can generate the correct message?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is a good idea to remove the isOperator from constructor, and differentiate match function from : operator from its source, so that Verifier can generate the correct message?

I see, this is something I hadn't thought of. It makes sense as this can be derived from source.

I've given it a shot in a0bc3c6, WDYT?

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
@carlosdelest carlosdelest added the test-release Trigger CI checks against release build label Nov 4, 2024

private boolean isOperator() {
if (isOperator == null) {
isOperator = source().text().toUpperCase(Locale.ROOT).startsWith(super.functionName()) == false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we check the source text starts with match( and end with ), in case a field name starts with match?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's a possibility. I've used a regex to address whitespaces between match and ( and added some tests to check messages are returned correctly in 9ef72e7

@carlosdelest
Copy link
Copy Markdown
Member Author

@elasticmachine run elasticsearch-ci/release-tests

Copy link
Copy Markdown
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you Carlos!

@carlosdelest
Copy link
Copy Markdown
Member Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

carlosdelest added a commit to carlosdelest/elasticsearch that referenced this pull request Nov 6, 2024
carlosdelest added a commit that referenced this pull request Nov 6, 2024
jozala pushed a commit that referenced this pull request Nov 13, 2024
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged ES|QL-ui Impacts ES|QL UI >non-issue :Search Relevance/Search Catch all for Search Relevance Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch test-release Trigger CI checks against release build v8.17.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants