Skip to content

EQL: Convert wildcards to LIKE in analyzer#51901

Merged
rw-access merged 4 commits intoelastic:masterfrom
rw-access:eql/wildcards
Mar 6, 2020
Merged

EQL: Convert wildcards to LIKE in analyzer#51901
rw-access merged 4 commits intoelastic:masterfrom
rw-access:eql/wildcards

Conversation

@rw-access
Copy link
Copy Markdown
Contributor

@rw-access rw-access commented Feb 5, 2020

Addressing the comment thread from #51558 (comment).

Added ReplaceWildcards to the optimizer which detects the == "wild*card*" or != "wild*card*" patterns and replaces with LIKE.

This is branched from #51886, so only the last commit is relevant.

Update: Resolves #53104

@rw-access rw-access added the :Analytics/EQL EQL querying label Feb 5, 2020
@rw-access rw-access requested a review from costin February 5, 2020 00:36
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search (:Search/EQL)

@costin costin requested review from astefan and matriv February 5, 2020 10:14
Copy link
Copy Markdown
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass.
The PR will be easier to complete after #51929.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a wildcard appear in any type of string? e.g. some*glob?
I wonder whether the parser could detect it so instead of having Literal that might a string, it could have its own expression rule.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A potential improvement is to check whether an expression is foldable instead of being a literal.
Thus if string concatenation were to be added, the rule would still be applied:

if (e.foldable() && e.fold() instanceof String) {
    return e.fold().toString().contains("*");
}

which can be transformed into a one-liner:

return e.foldable() && e.fold() instanceof String && e.fold().toString().contains("*");

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about foo > "wild*card" or other value comparisons?
If that's valid grammar, the verifier should pick the pattern and fail the query.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it is valid grammar, but since it's not == or !=, this will just be a lexicographical comparison

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would mean the operators are not consistent since == would expand the wildcard while > & co would compare against it...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isWildcard already does the checks so simply do: eq.fold().toString()

Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment regarding tests.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be possible to add more tests that also look at scenarios involving escape characters and all types of string that eql supports?
Can the * be escaped? If so, we should have a test covering this case.

Copy link
Copy Markdown
Contributor Author

@rw-access rw-access Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I can add more tests. And there's no escape for *. But if you want to perform an exact (+/- case sensitivity) comparison, you can put the wildcard on the left (#51901 (comment)). Whether this functionality is good or not is a fair question, and I think it's fair to change this because I doubt any users are aware of the workaround.

field == "*wildcard*" <==> field LIKE "%wildcard%"
"*wildcard*" == field <==> field == '*wildcard*'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I find this more like a bug, rather than feature (that implies an workaround).
A == B and B == A should be equivalent imo. Equality is not a predicate (like LIKE) where a certain element sits on the right and another one sits on the left and they are not interchangeable.

@costin
Copy link
Copy Markdown
Member

costin commented Feb 7, 2020

I think it's worth discussing the feature and then figure out the implementation for it.

  1. if == and != are overloaded with the special string with wildcard, the behavior should be consistent with the existing rules.
  2. the overloading is dropped in favor of a dedicated construct (LIKE) which can introduce its own semantics and syntax. The downside here is backwards compatibility.

@rw-access
Copy link
Copy Markdown
Contributor Author

Agreed that wildcards should be made commutative. I don't think we should add a new construct like LIKE just yet. In the future, I think it could make sense, after we go through the usual deprecation steps.

I think the main question I have is -- should this be a separate rule in the grammar, or should it be handled in the analyzer?

@costin
Copy link
Copy Markdown
Member

costin commented Feb 10, 2020

If the wildcard can be picked up by the parser through a dedicated rule, I would opt for that.
However I don't see an easy way to do this in the grammar nor the parser itself and considering its commutative nature I would handle it in an optimizer rule.

@andrei how close are you to moving the optimizer rules in?

@astefan
Copy link
Copy Markdown
Contributor

astefan commented Feb 11, 2020

@andrei how close are you to moving the optimizer rules in?

@costin #52172

@rw-access rw-access requested a review from costin February 19, 2020 16:07
Copy link
Copy Markdown
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -33,6 +43,7 @@ protected Iterable<RuleExecutor<LogicalPlan>.Batch> batches() {
new BooleanSimplification(),
new BooleanLiteralsOnTheRight(),
// needs to occur before BinaryComparison combinations
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment seems incorrect as it referred to PropagateEquals

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where should ReplaceWildcards() be moved to?

@costin
Copy link
Copy Markdown
Member

costin commented Mar 4, 2020

elasticsearch-ci/2 is failing due to a checkstyle violation.

@rw-access rw-access merged commit 51adeaa into elastic:master Mar 6, 2020
rw-access added a commit that referenced this pull request Mar 6, 2020
* EQL: Convert wildcard comparisons to Like
* EQL: Simplify wildcard handling, update tests
* EQL: Lint fixes for Optimizer.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/EQL EQL querying

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EQL: Wildcard matching doesn't work

5 participants