ESQL: LEAST and GREATEST functions#98630
Conversation
Adds `LEAST` and `GREATEST` functions to find the min or max of the values in many colunms.
|
Documentation preview: |
|
Hi @nik9000, I've created a changelog YAML for you. |
|
Pinging @elastic/es-ql (Team:QL) |
|
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL) |
astefan
left a comment
There was a problem hiding this comment.
LGTM, but I believe more CSV tests are needed, especially for real (non-row command) data in employees table.
For example, from employees | eval min = least(still_hired), max = greatest(still_hired) | sort emp_no | limit 10 | keep still_hired, min, max fails with
{
"error": {
"root_cause": [
{
"type": "unsupported_operation_exception",
"reason": null
}
],
"type": "unsupported_operation_exception",
"reason": null
},
"status": 500
}
Wondering if a nicer error message isn't more appropriate. If I replace least and greatest with min and max I get something more informative:
{
"error": {
"root_cause": [
{
"type": "verification_exception",
"reason": "Found 2 problems\nline 1:29: argument of [min(still_hired)] must be [numeric or datetime], found value [min(still_hired)] type [boolean]\nline 1:53: argument of [max(still_hired)] must be [numeric or datetime], found value [max(still_hired)] type [boolean]"
}
],
"type": "verification_exception",
"reason": "Found 2 problems\nline 1:29: argument of [min(still_hired)] must be [numeric or datetime], found value [min(still_hired)] type [boolean]\nline 1:53: argument of [max(still_hired)] must be [numeric or datetime], found value [max(still_hired)] type [boolean]"
},
"status": 400
}
I think it would be useful to test the null default handling for multi-values. For example from employees | eval min = least(job_positions), max = greatest(job_positions) | sort emp_no | limit 10 | keep job_positions, min, max.
|
|
||
| least | ||
| // tag::least[] | ||
| ROW a = 10, b=20 |
There was a problem hiding this comment.
Whitespacing is inconsistent.
|
|
||
| greatest | ||
| // tag::greatest[] | ||
| ROW a = 10, b=20 |
There was a problem hiding this comment.
Same here about inconsistency.
| include::{esql-specs}/math.csv-spec[tag=greatest-result] | ||
| |=== | ||
|
|
||
| NOTE: When run on `keyword` or `text` fields, this'll return the last string |
There was a problem hiding this comment.
Is this about multi-values support?
There was a problem hiding this comment.
They are like most of our other functions at this point and don't support multivalued fields. This is a case where it'd make a lot of sense to take, like, mv_min for least.
There was a problem hiding this comment.
It's about what LEAST and GREATEST mean when you use them with string style fields. Let me see about clarifying.
alex-spies
left a comment
There was a problem hiding this comment.
Basically LGTM, I only wonder about handling/testing of some edge cases like empty varargs or multi-value fields.
| [[esql-greatest]] | ||
| === `GREATEST` | ||
|
|
||
| Returns the maximum value from many columns. |
There was a problem hiding this comment.
nit/question: does it make sense to mention behavior for multi-value columns? I'd be tempted to do something like
row a=[1,2,3], b = 0 | eval greatest(a,b)
and would expect to receive either 0 or an error, or null with a warning.
| static final Map<String, BiFunction<Source, List<Expression>, ScalarFunction>> VARARG_CTORS = Map.ofEntries( | ||
| entry(name(Case.class), Case::new), | ||
| entry(name(Coalesce.class), Coalesce::new), | ||
| entry(name(Greatest.class), Greatest::new), | ||
| entry(name(Least.class), Least::new) | ||
| ); |
There was a problem hiding this comment.
question (I'm not fully familiar with the code base yet, so this is going to sound naive): It seems like instead of a map, one could also define an enum representing the vararg functions, or maybe have them all implement an interface that is used by vararg functions. Why do we go with a map?
There was a problem hiding this comment.
They could all implement an interface, namely Writeable. Then we wouldn't have any of these. But we ended up going this way to prevent us from having to implement that interface on QL code. To be honest, I wonder if, now that we've merged to main, we should back that out.
There was a problem hiding this comment.
Implementing Writable means touching all the QL code (expression, attribute, etc..) instead of having one class that handles all the messy details of writing and another for reading.
| /** | ||
| * Builds test cases for variable argument functions. | ||
| */ | ||
| public class VaragsTestCases { |
There was a problem hiding this comment.
naming bikeshed (might be inconsistent with other names, is meant absolutely optional): Wouldn't VarargsTestCaseBuilder be more precise? At first glance, I thought that an instance of this represents a fixed test case, rather than supplying an arbitrary number of test cases.
There was a problem hiding this comment.
VarargsTestCaseBuilder is more precise I think.
...l/src/test/java/org/elasticsearch/xpack/esql/expression/function/scalar/VaragsTestCases.java
Outdated
Show resolved
Hide resolved
|
|
||
| import static org.hamcrest.Matchers.equalTo; | ||
|
|
||
| public class GreatestTests extends AbstractFunctionTestCase { |
There was a problem hiding this comment.
nit: maybe a test case would be nice that documents behavior unter multi-value fields, even if it just documents that multi-value fueilds results in an error/warning.
...sql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/math/Greatest.java
Show resolved
Hide resolved
|
|
||
| @Evaluator(extraName = "Int") | ||
| static int process(int[] values) { | ||
| int max = values[0]; |
There was a problem hiding this comment.
nit: can't this be out of bounds? Not sure the case is handled when the number of varargs is zero.
There was a problem hiding this comment.
The function requires at least one argument. Or, at least, it should. Let me make sure there's a test for that case.
There was a problem hiding this comment.
Looks like I've made this error a few times!
|
|
||
| g:integer | ||
| 1234 | ||
| ; |
There was a problem hiding this comment.
question/nit: do we allow varargs functions to take 0 arguments? IMHO, a test case for this would be good, either as int test (if it's allowed) or as unit test (if it's supposed to error out).
There was a problem hiding this comment.
I'm adding some more tests asserting that we don't support 0 arguments. Actually, none of our varargs functions should allow 0 arguments. That requires a bit of fiddling on the constructors which I'll do.
|
I'd also add one more test that's a bit more complex: |
nik9000
left a comment
There was a problem hiding this comment.
@alex-spies and @astefan this is ready for another round I think! I've added multivalue fields to the VarargsTestCaseBuilder. I've added a little bit to the docs. But just a little. Let me know what you think!
alex-spies
left a comment
There was a problem hiding this comment.
LGTM! Only left remarks/questions that are all meant to be highly optional.
| import java.util.function.Function; | ||
| import java.util.function.Supplier; | ||
| import java.util.stream.IntStream; | ||
| import java.util.stream.Stream; |
There was a problem hiding this comment.
praise: yay for complie-time enforcement of at least 1 argument.
| ); | ||
| } | ||
| throw new UnsupportedOperationException(); | ||
| throw new QlIllegalArgumentException("unsupported type [" + dataType + "]"); |
There was a problem hiding this comment.
nit: you can use EsqlIllegalArgumentException::illegalDataType, although this code should likely never be reached, anyway.
There was a problem hiding this comment.
I didn't know this was there. Happy to use it.
There was a problem hiding this comment.
@alex-spies Better to move that utility method from the exception class to an util class specialized for the topic - see TypeResolutions/ SqlTypeResolution(we would have to make on for ESQL) orEsqlDataTypes`
...rc/test/java/org/elasticsearch/xpack/esql/expression/function/scalar/math/GreatestTests.java
Show resolved
Hide resolved
| @Override | ||
| protected Greatest build(Source source, List<Expression> args) { | ||
| return new Greatest(Source.EMPTY, args.stream().toList()); | ||
| return new Greatest(Source.EMPTY, args.get(0), args.subList(1, args.size())); |
There was a problem hiding this comment.
nit: maybe worth adding a comment why the source argument is unused/ignored.
There was a problem hiding this comment.
We shouldn't ignore it. It doesn't really add much to the test, but if we have it we should pass it.
...c/test/java/org/elasticsearch/xpack/esql/expression/function/scalar/nulls/CoalesceTests.java
Show resolved
Hide resolved
| return this; | ||
| } | ||
|
|
||
| public VaragsTestCaseBuilder expectBoolean(Function<Stream<boolean[]>, Optional<boolean[]>> expectedBoolean) { |
There was a problem hiding this comment.
thought: I wonder if we could reduce the code duplication in this file via generics, although that may be more pain than is worth it given that we probably need unboxed primitives.
| |=== | ||
|
|
||
| NOTE: When run on `keyword` or `text` fields, this'll return the first string | ||
| in alphabetical order. When run on `boolean` columns this will return |
There was a problem hiding this comment.
I think this needs to be "... this will return false if any values are false".
| 10 | ||
| ; | ||
|
|
||
| leastGreatestMany |
| * 2.0. | ||
| */ | ||
|
|
||
| package org.elasticsearch.xpack.esql.expression.function.scalar.math; |
There was a problem hiding this comment.
Doesn't really matter, but the package this is in is kind of an unusual choice. Is not really a great option, but if I would have to choose I'd go with org.elasticsearch.xpack.esql.expression.function.scalar.conditional instead.
There was a problem hiding this comment.
I think it is like max, I can move it.
|
Hi @nik9000, I've created a changelog YAML for you. |
Adds
LEASTandGREATESTfunctions to find the min or max of the values in many columns.