EQL: Add Substring function with Python semantics#53688
EQL: Add Substring function with Python semantics#53688costin merged 8 commits intoelastic:masterfrom
Conversation
Initial function added to act as a template. Does not reuse substring from SQL due to the difference in semantics and the accepted arguments. Currently it is missing any integration test as, the usage of scripting, requires an actual integration test against a proper cluster (and likely its own QA project).
|
Pinging @elastic/es-search (:Search/EQL) |
...src/main/java/org/elasticsearch/xpack/eql/expression/function/scalar/string/StringUtils.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/eql/expression/function/scalar/string/StringUtils.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/eql/expression/function/scalar/string/StringUtils.java
Show resolved
Hide resolved
astefan
left a comment
There was a problem hiding this comment.
Nice step forward. Left few comments.
| while (end < 0) { | ||
| end += length; | ||
| } | ||
| int validEndIndex = length; |
There was a problem hiding this comment.
I don't think you need validEndIndex, do you?
| while (start < 0) { | ||
| start += length; | ||
| } | ||
| while (end < 0) { |
There was a problem hiding this comment.
Could this while be replaced with this simple math?
if (end < 0) {
end = end - end/length * length + (end % length != 0 ? length : 0);
}
There was a problem hiding this comment.
:-) instead of subtracting length each time and then check if the value is still negative, the formula above subtracts the entire amount once by dividing the end value by length (to see how many lengths "fit" in "end") and taking the whole part only (not also the decimals), multiplying by length and adding a possible remainder (defined as end % length).
| private final Expression source, start, end; | ||
|
|
||
| public Substring(Source source, Expression src, Expression start, Expression end) { | ||
| super(source, end != null ? Arrays.asList(src, start, end) : Arrays.asList(src, new Literal(source, 0, DataTypes.INTEGER), start)); |
There was a problem hiding this comment.
If there is no end, shouldn't end be treated as the end of the string itself, meaning source's length?
| public Substring(Source source, Expression src, Expression start, Expression end) { | ||
| super(source, end != null ? Arrays.asList(src, start, end) : Arrays.asList(src, new Literal(source, 0, DataTypes.INTEGER), start)); | ||
| this.source = src; | ||
| this.start = end == null ? new Literal(source, 0, DataTypes.INTEGER) : start; |
There was a problem hiding this comment.
I think here you mean to use start == null, no?
| super(source, end != null ? Arrays.asList(src, start, end) : Arrays.asList(src, new Literal(source, 0, DataTypes.INTEGER), start)); | ||
| this.source = src; | ||
| this.start = end == null ? new Literal(source, 0, DataTypes.INTEGER) : start; | ||
| this.end = end == null ? start : end; |
There was a problem hiding this comment.
If end == null shouldn't end be the length of the string itself?
| if (start == null || end == null) { | ||
| return source; | ||
| } | ||
| if (!(start instanceof Number)) { |
| if (!(end instanceof Number)) { | ||
| throw new EqlIllegalArgumentException("A number is required; received [{}]", end); | ||
| } | ||
| if (((Number) end).intValue() < 0) { |
There was a problem hiding this comment.
I thought end can be negative?
|
|
||
| InternalEqlScriptUtils() {} | ||
|
|
||
|
|
| assertEquals("1:15: Unknown function [add]", | ||
| error("process where add(serial_event_id, 0) == 1")); | ||
| assertEquals("1:15: Unknown function [subtract]", | ||
| assertEquals("1:15: Unknown function [subtract], did you mean [substring]?", |
|
|
||
| import static org.elasticsearch.xpack.eql.expression.function.scalar.string.StringUtils.substringSlice; | ||
|
|
||
| public class StringUtilsTests extends ESTestCase { |
There was a problem hiding this comment.
No tests with null values?
|
I've updated the algorithm since my understanding of sequence based on the docs was not correct; so I experimented with the Python REPL and came up with the above. |
matriv
left a comment
There was a problem hiding this comment.
Left also some comments mostly minor.
| import java.util.Map; | ||
|
|
||
| public abstract class InternalQlScriptUtils { | ||
| public class InternalQlScriptUtils { |
There was a problem hiding this comment.
No because it is extended by the other Script classes.
| @Override | ||
| protected Collection<Class<? extends Plugin>> nodePlugins() { | ||
| return Collections.singletonList(LocalStateEqlXPackPlugin.class); | ||
| return Arrays.asList(LocalStateEqlXPackPlugin.class); |
| if (end < 0) { | ||
| end = 0; | ||
| } | ||
| if (end > length) { |
There was a problem hiding this comment.
This check couldn't be included inside the
if (end < 0)
?
There was a problem hiding this comment.
No because once checks the lower limit, whether end is still negative (after potentially adding length) while the other checks the upper limit.
| return start; | ||
| } | ||
|
|
||
| public Pipe length() { |
There was a problem hiding this comment.
Here also, shouldn't be end?
| assertEquals(str.substring(5, 9), substringSlice(str, -5, -1)); | ||
| } | ||
|
|
||
| public void testSubstringSliceNegativeOverLength() { |
There was a problem hiding this comment.
How about both negative and both higher than length?
Also a test for both positive and both higher than end.
There was a problem hiding this comment.
That's what the test does - -15 and -11 are both negative and higher than length.
Am I missing something?
There was a problem hiding this comment.
Sorry my bad about the negative.
| assertEquals(str.substring(0, 10 - 1), substringSlice(str, -start, -1)); | ||
| } | ||
|
|
||
| public void testEndHigherThanLenght() { |
There was a problem hiding this comment.
| public void testEndHigherThanLenght() { | |
| public void testEndHigherThanLength() { |
astefan
left a comment
There was a problem hiding this comment.
LGTM. As a side note (and a separate PR), I think we need to add NodeSubclassTests to EQL, similar to QL's and SQL's corresponding classes.
| } | ||
|
|
||
| public void testNullValue() { | ||
| assertNull(substringSlice(null, 0, 0)); |
There was a problem hiding this comment.
Maybe random ints (negative too) for completeness?
There was a problem hiding this comment.
Those are covered in testSubstringRandomSliceNegative
Improve separation of scripting between EQL and SQL by delegating common methods to QL. The context detection is determined based on the package to avoid having repetitive class hierarchies. Relates elastic#53688
Improve separation of scripting between EQL and SQL by delegating common methods to QL. The context detection is determined based on the package to avoid having repetitive class hierarchies. The Painless whitelists have been improved so that the declaring class is used instead of the inherited one. Relates #53688
Improve separation of scripting between EQL and SQL by delegating common methods to QL. The context detection is determined based on the package to avoid having repetitive class hierarchies. The Painless whitelists have been improved so that the declaring class is used instead of the inherited one. Relates #53688 (cherry picked from commit 6d46033) EQL: Add Substring function with Python semantics (#53688) Does not reuse substring from SQL due to the difference in semantics and the accepted arguments. Currently it is missing full integration tests as, due to the usage of scripting, requires an actual integration test against a proper cluster (and likely its own QA project). (cherry picked from commit f58680b)
Initial function added to act as a template.
Does not reuse substring from SQL due to the difference in semantics and
the accepted arguments.
Currently it is missing any integration test as, the usage of scripting,
which might require an actual integration test against a proper cluster (and
likely its own QA project).