Skip to content

Search in ES|QL: Add MATCH operator#110971

Merged
ioanatia merged 15 commits intoelastic:mainfrom
ioanatia:match_operator
Jul 26, 2024
Merged

Search in ES|QL: Add MATCH operator#110971
ioanatia merged 15 commits intoelastic:mainfrom
ioanatia:match_operator

Conversation

@ioanatia
Copy link
Copy Markdown
Member

@ioanatia ioanatia commented Jul 17, 2024

Part of #110131
Adds a MATCH operator, e.g.:

FROM search-movies
| WHERE title MATCH "harry potter"
| KEEP title

A few notes:

  • I did not add csv tests because the push down of the match predicate to Lucene is done with LocalPhysicalPlanOptimizer and csv tests do no execute the rules from LocalPhysicalPlanOptimizer. To test the push down of MATCH we used integration tests in MatchOperatorPushDownIT Andrei pointed out we have 2 different types of CSV tests - tracking this as a follow up in Full-text search in ES|QL #110131
  • I tried to find as many places where failures can occur when using MATCH and catch them early at the Verifier level so that we can return a VerificationException rather than have this fail at the compute engine level.
  • If there is a better place to push down the match predicates to the lucene query let me know, for now this happens in LocalPhysicalPlanOptimizer
  • I don't claim I patched all the ways this could fail deeper at the data notes level or compute engine - when I did not find an obvious fix, I added an integration test in MatchOperatorPushDownIT and marked with a TODO so we don't forget about it. Full-text search in ES|QL #110131 has a separate item to ensure that we have graceful failure modes for both the MATCH operator and MATCH command

@ioanatia ioanatia added :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team labels Jul 17, 2024
@ioanatia ioanatia mentioned this pull request Jul 17, 2024
14 tasks
@javanna javanna added :Search Relevance/Search Catch all for Search Relevance and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 17, 2024
@astefan astefan added the :Analytics/ES|QL AKA ESQL label Jul 18, 2024
@ioanatia ioanatia marked this pull request as ready for review July 18, 2024 14:19
@ioanatia ioanatia requested a review from astefan July 18, 2024 14:19
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Jul 18, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine removed the Team:Search Meta label for search team label Jul 18, 2024
""";

var error = expectThrows(VerificationException.class, () -> run(query));
assertThat(error.getMessage(), containsString("Unknown column [something]"));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel like and IT on the pushdown - more like on MATCH itself. I might rename the class.

If you want to test the pushdown I might try using the profile option on the request and looking for the LuceneSourceOperator's status. It should have a list of queries it run and that should tell you that you pushed down.

I think you could do all of this with an ESRestTestCase if you wanted which gives you the advantage of being able to run it against many versions later.

For some of these failures I think I'd prefer them in the yaml test cases. They run against multiple versions by default and they have to be pretty simple so it can be easier to read them. Just because you know they can't surprise you.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would, also, be ok with the tests in VerifierTests (the VerificationException ones).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added yaml tests as well - I liked these these tests because I can easily enable logging and debug in the IDE - I find it easier to do that here than in yaml or CSV specs.

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @ioanatia, I've created a changelog YAML for you.

Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it takes longer, I would like this feature to be merged with csv-spec tests as well. It's something we usually do with features, bug-fixes, enhancements in ESQL.

""";

var error = expectThrows(VerificationException.class, () -> run(query));
assertThat(error.getMessage(), containsString("Unknown column [something]"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would, also, be ok with the tests in VerifierTests (the VerificationException ones).

@ioanatia ioanatia requested review from a team as code owners July 22, 2024 14:31
@ioanatia ioanatia removed request for a team July 22, 2024 14:36
@nik9000
Copy link
Copy Markdown
Member

nik9000 commented Jul 22, 2024

Even if it takes longer, I would like this feature to be merged with csv-spec tests as well. It's something we usually do with features, bug-fixes, enhancements in ESQL.

I think we only have two options for csv-spec tests:

  1. Don't run these csv-spec tests in CsvTests via one of those assumeFalsees for a capability. So they'd only run in the big integration tests. Like with ./gradlew -p x-pack/plugin/esql/qa/server/single-node check.
  2. Convert the CsvTests thing to build a Lucene index. That seems like a lot. Maybe not a bad idea, but a lot.

I opened #111157 to see if we could run these at runtime, but that's going to take some time to review and integrate into this. Also, it still needs a lucene index. I think we probably should skip the csv-spec tests in CsvTests.

Copy link
Copy Markdown
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want CSV tests, I think this is the shortest route to them: https://gist.github.com/nik9000/38946d2ba2639db64fe5ee28d0b805a7

It adds a capability that's not supported in older versions and so the tests can be skipped.

- "No limit defined, adding default limit of \\[.*\\]"
esql.query:
body:
query: 'FROM test | WHERE content MATCH "fox" | KEEP id | SORT id'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to fail without a capability, I think.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait. You check the cluster feature already. I think we're better off checking capabilities. We're just adding too many things to do it all with features.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been updated to use a capability. 👍

@nik9000
Copy link
Copy Markdown
Member

nik9000 commented Jul 24, 2024

If we want CSV tests, I think this is the shortest route to them: https://gist.github.com/nik9000/38946d2ba2639db64fe5ee28d0b805a7

Wait. Ignore this. you did it.

- "No limit defined, adding default limit of \\[.*\\]"
esql.query:
body:
query: 'FROM test | WHERE content MATCH "fox" | KEEP id | SORT id'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait. You check the cluster feature already. I think we're better off checking capabilities. We're just adding too many things to do it all with features.


- match: { status: 400 }
- match: { error.type: verification_exception }
- match: { error.reason: "Found 1 problem\nline 1:19: MATCH requires a text or keyword field, but [id] has type [integer]" }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking my understanding - do we want MATCH to skip non-string fields in the future? Like, if I do WHERE int MATCH "1" should that fail? Or match nothing? Or match the literal 1? It's all good to match whatever makes sense in this PR though.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our aim is to add full text search capabilities and not necessarily have full parity with the match query - this is why I restricted this to text and keyword for now - if we want to add support for data, boolean and numeric fields we can definitely do that later. not in this PR though

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matching on keyword and text seems reasonable. It's not clear to me how much value there is matching on more field types, but I agree that that can be discussed separately.

Copy link
Copy Markdown
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding my approval to this PR. It LGTM, testing is comprehensive and all outstanding comments have been addressed.

While not strictly necessary, it would be good to have another approval, from someone who has commented on the PR.

@ChrisHegarty ChrisHegarty requested a review from nik9000 July 25, 2024 09:58
@ioanatia
Copy link
Copy Markdown
Member Author

leaving this open for a while longer unless there are any other comments - otherwise I will be merging this later today.

Copy link
Copy Markdown
Contributor

@ivancea ivancea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

…action/EsqlCapabilities.java

Co-authored-by: Iván Cea Fontenla <ivancea96@outlook.com>
@ioanatia ioanatia merged commit 8df1c50 into elastic:main Jul 26, 2024
@ChrisHegarty ChrisHegarty added the ES|QL-ui Impacts ES|QL UI label Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL ES|QL-ui Impacts ES|QL UI >feature :Search Relevance/Search Catch all for Search Relevance Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.16.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants