Skip to content

EQL: Add Substring function with Python semantics#53688

Merged
costin merged 8 commits intoelastic:masterfrom
costin:eql/functions-nointeg
Mar 19, 2020
Merged

EQL: Add Substring function with Python semantics#53688
costin merged 8 commits intoelastic:masterfrom
costin:eql/functions-nointeg

Conversation

@costin
Copy link
Copy Markdown
Member

@costin costin commented Mar 17, 2020

Initial function added to act as a template.
Does not reuse substring from SQL due to the difference in semantics and
the accepted arguments.
Currently it is missing any integration test as, the usage of scripting,
which might require an actual integration test against a proper cluster (and
likely its own QA project).

Initial function added to act as a template.
Does not reuse substring from SQL due to the difference in semantics and
the accepted arguments.
Currently it is missing any integration test as, the usage of scripting,
requires an actual integration test against a proper cluster (and likely
its own QA project).
@costin costin added the :Analytics/EQL EQL querying label Mar 17, 2020
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search (:Search/EQL)

Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice step forward. Left few comments.

while (end < 0) {
end += length;
}
int validEndIndex = length;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need validEndIndex, do you?

while (start < 0) {
start += length;
}
while (end < 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this while be replaced with this simple math?

if (end < 0) {
   end = end - end/length * length + (end % length != 0 ? length : 0);
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No so simple :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-) instead of subtracting length each time and then check if the value is still negative, the formula above subtracts the entire amount once by dividing the end value by length (to see how many lengths "fit" in "end") and taking the whole part only (not also the decimals), multiplying by length and adding a possible remainder (defined as end % length).

private final Expression source, start, end;

public Substring(Source source, Expression src, Expression start, Expression end) {
super(source, end != null ? Arrays.asList(src, start, end) : Arrays.asList(src, new Literal(source, 0, DataTypes.INTEGER), start));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no end, shouldn't end be treated as the end of the string itself, meaning source's length?

public Substring(Source source, Expression src, Expression start, Expression end) {
super(source, end != null ? Arrays.asList(src, start, end) : Arrays.asList(src, new Literal(source, 0, DataTypes.INTEGER), start));
this.source = src;
this.start = end == null ? new Literal(source, 0, DataTypes.INTEGER) : start;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here you mean to use start == null, no?

super(source, end != null ? Arrays.asList(src, start, end) : Arrays.asList(src, new Literal(source, 0, DataTypes.INTEGER), start));
this.source = src;
this.start = end == null ? new Literal(source, 0, DataTypes.INTEGER) : start;
this.end = end == null ? start : end;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If end == null shouldn't end be the length of the string itself?

if (start == null || end == null) {
return source;
}
if (!(start instanceof Number)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

== false instead of !?

if (!(end instanceof Number)) {
throw new EqlIllegalArgumentException("A number is required; received [{}]", end);
}
if (((Number) end).intValue() < 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought end can be negative?


InternalEqlScriptUtils() {}


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra empty line.

assertEquals("1:15: Unknown function [add]",
error("process where add(serial_event_id, 0) == 1"));
assertEquals("1:15: Unknown function [subtract]",
assertEquals("1:15: Unknown function [subtract], did you mean [substring]?",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic


import static org.elasticsearch.xpack.eql.expression.function.scalar.string.StringUtils.substringSlice;

public class StringUtilsTests extends ESTestCase {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tests with null values?

@costin
Copy link
Copy Markdown
Member Author

costin commented Mar 18, 2020

I've updated the algorithm since my understanding of sequence based on the docs was not correct; so I experimented with the Python REPL and came up with the above.

Copy link
Copy Markdown
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left also some comments mostly minor.

import java.util.Map;

public abstract class InternalQlScriptUtils {
public class InternalQlScriptUtils {
Copy link
Copy Markdown
Contributor

@matriv matriv Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be also final ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No because it is extended by the other Script classes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, sorry.

@Override
protected Collection<Class<? extends Plugin>> nodePlugins() {
return Collections.singletonList(LocalStateEqlXPackPlugin.class);
return Arrays.asList(LocalStateEqlXPackPlugin.class);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover.

if (end < 0) {
end = 0;
}
if (end > length) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check couldn't be included inside the

if (end < 0)

?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No because once checks the lower limit, whether end is still negative (after potentially adding length) while the other checks the upper limit.

return start;
}

public Pipe length() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here also, shouldn't be end?

assertEquals(str.substring(5, 9), substringSlice(str, -5, -1));
}

public void testSubstringSliceNegativeOverLength() {
Copy link
Copy Markdown
Contributor

@matriv matriv Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about both negative and both higher than length?
Also a test for both positive and both higher than end.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the test does - -15 and -11 are both negative and higher than length.
Am I missing something?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my bad about the negative.

assertEquals(str.substring(0, 10 - 1), substringSlice(str, -start, -1));
}

public void testEndHigherThanLenght() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void testEndHigherThanLenght() {
public void testEndHigherThanLength() {

@astefan astefan self-requested a review March 19, 2020 12:39
Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. As a side note (and a separate PR), I think we need to add NodeSubclassTests to EQL, similar to QL's and SQL's corresponding classes.

}

public void testNullValue() {
assertNull(substringSlice(null, 0, 0));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe random ints (negative too) for completeness?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are covered in testSubstringRandomSliceNegative

@costin costin merged commit f58680b into elastic:master Mar 19, 2020
@costin costin deleted the eql/functions-nointeg branch March 19, 2020 14:58
costin added a commit to costin/elasticsearch that referenced this pull request Mar 22, 2020
Improve separation of scripting between EQL and SQL by delegating common
methods to QL. The context detection is determined based on the package
to avoid having repetitive class hierarchies.

Relates elastic#53688
costin added a commit that referenced this pull request Mar 24, 2020
Improve separation of scripting between EQL and SQL by delegating common
methods to QL. The context detection is determined based on the package
to avoid having repetitive class hierarchies.
The Painless whitelists have been improved so that the declaring class is used 
instead of the inherited one.

Relates #53688
costin added a commit that referenced this pull request Mar 24, 2020
Improve separation of scripting between EQL and SQL by delegating common
methods to QL. The context detection is determined based on the package
to avoid having repetitive class hierarchies.
The Painless whitelists have been improved so that the declaring class
is used instead of the inherited one.

Relates #53688

(cherry picked from commit 6d46033)

EQL: Add Substring function with Python semantics (#53688)

Does not reuse substring from SQL due to the difference in semantics and
the accepted arguments.
Currently it is missing full integration tests as, due to the usage of
scripting, requires an actual integration test against a proper cluster
(and likely its own QA project).

(cherry picked from commit f58680b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/EQL EQL querying

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants