Skip to content

EQL: implement between function#54277

Merged
aleksmaus merged 15 commits intoelastic:masterfrom
aleksmaus:feature/between
Apr 7, 2020
Merged

EQL: implement between function#54277
aleksmaus merged 15 commits intoelastic:masterfrom
aleksmaus:feature/between

Conversation

@aleksmaus
Copy link
Copy Markdown
Contributor

@aleksmaus aleksmaus commented Mar 26, 2020

Related to #54135

@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-ql (:Query Languages/EQL)

@aleksmaus aleksmaus removed the WIP label Mar 26, 2020
@aleksmaus aleksmaus marked this pull request as ready for review March 26, 2020 20:01
@aleksmaus aleksmaus requested a review from imotov March 28, 2020 19:05
Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left few comments.

* you may not use this file except in compliance with the Elastic License.
*/

package org.elasticsearch.xpack.eql.expression.function.scalar.between;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this package name? :-)
Shouldn't this be the same as for other string functions? org.elasticsearch.xpack.eql.expression.function.scalar.string

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Will move everything under scalar.string package.


public class BetweenFunctionProcessor implements Processor {

public static final String NAME = "sbetween";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sbtw is shorter and serves the same purpose.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

return (String) SubstringFunctionProcessor.doProcess(s, start, end);
}

public static String between(String s, String left, String right, Boolean greedy, Boolean caseSensitive) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this one before substring (to follow the alphabetical order), please?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# ASCII Functions
#
String substring(String, Number, Number)
String between(String, String, String, Boolean, Boolean)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here (alphabetical).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


import java.util.concurrent.Callable;

public abstract class BetweenBaseTestCase extends ESTestCase {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let other reviewers weigh in on the necessity of this approach here.

That isn't an usual approach with randomized testing. The Elasticsearch infra in CI does run the tests continuously and having a randomized value for parameters is usually enough.
A more common approach is to introduce randomness in tests with various random* methods and leave CI do its thing. A personal approach (that others I believe are following) is to stress test such a random test class with -Dtests.iters=1000 or such a large value before creating the PR. I'm not saying this is enough, but it gives one a greater degree of confidence over the committed code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be the approach utilized by serializations tests, it's just extracted into the base class there, which we could consolidate across functions tests. Easy to remove. Let me know.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the class and the multiple tests runs from randomized tests.


assertThat(BetweenUtils.between("System Idle Process", "s", "e", false, false),
equalTo("yst"));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No test with last parameter set to true.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks. Added more tests.



// Translation table for error messaging in the following function
private static final String[] NUM_NAMES = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go somewhere at the top of this class.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to the top of this class.

if (hasOptionalParams && (children.size() > NUM_TOTAL_PARAMS || children.size() < NUM_TOTAL_PARAMS - numOptionalParams)) {
throw new QlIllegalArgumentException("expects between " + NUM_NAMES[NUM_TOTAL_PARAMS - numOptionalParams]
+ " and " + NUM_NAMES[NUM_TOTAL_PARAMS] + " arguments");
} else if (!hasOptionalParams && children.size() != NUM_TOTAL_PARAMS) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasOptionalParams == false

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

throw new QlIllegalArgumentException("expects between " + NUM_NAMES[NUM_TOTAL_PARAMS - numOptionalParams]
+ " and " + NUM_NAMES[NUM_TOTAL_PARAMS] + " arguments");
} else if (!hasOptionalParams && children.size() != NUM_TOTAL_PARAMS) {
throw new QlIllegalArgumentException("expects exactly " + NUM_NAMES[NUM_TOTAL_PARAMS] + " arguments");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this wants to be a generic approach for dealing with a variable number of optional parameters, but:

  • this solution needs to be generic throughout this class: here it's another method accepting optional parameters.
  • I don't understand the purpose of this method... it tries to be generic, but NUM_TOTAL_PARAMS is always 5. So, this method is for functions that can have between 0 and 5 optional parameters? Apart from between is there any other such function?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach by the here link in you comment allows only one optional parameter.
Here I created a kind of more generic implementation allowing to specify the number of optional parameters.
In the case of between function we need to support up to two optional parameters.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's revisit this based on whether between actually needs case sensitivity or not - if it doesn't it will simplify things.
Same applies for greedy.

/cc @rw-access

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the greedy is not quite the same as case sensitivity, could be useful.

@@ -0,0 +1,11 @@
[[queries]]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just add these to test_queries_supported.toml?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure can rename the file and we could start using it for all the extra queries that are not included in the original test_queries.toml. Will do that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the file into test_queries_supported.toml so we can add more queries there to avoid touching the original EQL test_queries.toml file.

}

public static Object doProcess(Object source, Object left, Object right, Object greedy, Object caseSensitive) {
if (source == null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if left and right are null?

Suggested change
if (source == null) {
if (source == null || left == null || right == null) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Across functions that I was implementing so far I was assuming that we are returning null only if the source data (first param) is null.
The rest of params values should provided or there there will an exception, which is consistent with Java strings function overall that throw when the parameter is null.
Saw another approach was used by @astefan where was returning null if any of params are null.
Open for discussion, but would need to make consistent across of all of the functions.
Haven't heard a definite decision on this yet.

@aleksmaus aleksmaus requested review from astefan and rw-access April 6, 2020 12:54
Copy link
Copy Markdown
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments otherwise LGTM

}

private static void throwIfNotString(Object obj) {
if (!(obj instanceof String || obj instanceof Character)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth extracting these assertions into the Check class inside QL and reusing them through-out the code.
We've been going back and forth regarding this style - the if check is fairly simple but repeating the string is tedious (and typically tends to not be consistent).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I tried to avoid repetition of code and error strings. Moved these into QL Check class.



@Override
public boolean equals(Object obj) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit - the getNameWritable tends to be towards the top after writeTo and hashCode before equals.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like getWriteableName presently it is at the end of every FunctionProcessor so far.
Updated.


int idx = matchString.indexOf(left);
if (idx == -1) {
return "";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could return org.elasticsearch.xpack.ql.util.StringUtils.EMPTY instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


package org.elasticsearch.xpack.eql.expression.function.scalar.string;

import org.apache.directory.api.util.Strings;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong import.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thank you! Found another place in tests. Updated both places.

}

// Test from EQL doc https://eql.readthedocs.io/en/latest/query-guide/functions.html
public void testBetweenBasicEQLExamples() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

throw new QlIllegalArgumentException("expects between " + NUM_NAMES[NUM_TOTAL_PARAMS - numOptionalParams]
+ " and " + NUM_NAMES[NUM_TOTAL_PARAMS] + " arguments");
} else if (!hasOptionalParams && children.size() != NUM_TOTAL_PARAMS) {
throw new QlIllegalArgumentException("expects exactly " + NUM_NAMES[NUM_TOTAL_PARAMS] + " arguments");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's revisit this based on whether between actually needs case sensitivity or not - if it doesn't it will simplify things.
Same applies for greedy.

/cc @rw-access

}

private static Expression toDefault(Expression exp) {
return exp != null ? exp : Literal.FALSE;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to create a new exception preservation the source - helps with tracking down the location of the argument:
new Literal(exp.source(), Boolean.FALSE, DataTypes.Boolean)

Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@costin
Copy link
Copy Markdown
Member

costin commented Apr 6, 2020

Could you please incorporate #54795? It should remove the need of an optimizer rule and make the function class smaller.

Scratch that - I mixed up the issues.

compile project(':test:framework')
compile project(path: xpackModule('core'), configuration: 'default')
compile project(path: xpackModule('core'), configuration: 'testArtifacts')
compile project(path: xpackModule('ql'), configuration: 'default')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this import is just for CollectionUtils, I'd rather keep the projects separated and simply concatenate the lists through addAll.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Removed.

@aleksmaus aleksmaus merged commit b7f02d8 into elastic:master Apr 7, 2020
aleksmaus added a commit to aleksmaus/elasticsearch that referenced this pull request Apr 7, 2020
* EQL: implement between function

* Address WIP TODOs. Add more tests

* Fix linter complaint

* Add more query folder tests to cover invalid parameter types

* Update Between toDefault. Fix typo in BetweenFunctionProcessor. Add more tests.

* Remove unneeded null checks in BetweenFunctionProcessor::doProcess

* Address code review comments

* Address additional code review comments

* Remove dependency on QL from eql/qa project
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants