SQL: Implement scripting inside aggs#55241
Conversation
|
Pinging @elastic/es-ql (:Query Languages/SQL) |
There was a problem hiding this comment.
So this code becomes:
Object argument = asFieldOrLiteralOrScript(c.field());
if (c.distinct()) {
return new CardinalityAgg(id, argument);
} else {
return new FilterExistsAgg(id, argument);
}1ae462d to
8bbfaa1
Compare
Implement the use of scalar functions inside aggregate functions. This allows for complex expressions inside aggregations, with or without GROUBY as well as with or without a HAVING clause. e.g.: ``` SELECT MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) AS max, b FROM test GROUP BY b HAVING MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) > 5 ``` Scalar functions are still not allowed for `KURTOSIS` and `SKEWNESS` as this is currently not implemented on the ElasticSearch side. Fixes: elastic#29980 Fixes: elastic#36865 Fixes: elastic#37271
costin
left a comment
There was a problem hiding this comment.
Looks good.
Left some comments regarding testing and potentially improvements in the agg hierarchy- if they are unclear, we can address that through a different PR.
| DataTypes.BOOLEAN); | ||
| } | ||
|
|
||
| public static ScriptTemplate isNotNullCardinality(ScriptTemplate script) { |
There was a problem hiding this comment.
This method seems to be used in one place as such extracting it in a different class doesn't add much value since there's no reuse.
The name also indicates that - a method specific to cardinality/count.
|
|
||
| // aggregates with scalars | ||
| aggregateFunctionsWithScalars | ||
| SELECT MAX(CASE WHEN (salary - 10) > 70000 THEN (salary + 12345) * 1.2 ELSE (salary - 12345) * 2.7 END) AS "max", |
|
|
||
|
|
||
| // aggregates with scalars | ||
| aggregateFunctionsWithScalars |
There was a problem hiding this comment.
Can you please add a couple of tests where the aggs with scalars are inside having?
Even better have scalars over aggs over scalars inside Having.
Also another that sorts by an aggregation with an internal scalar.
There was a problem hiding this comment.
Can you please add a couple of tests where the aggs with scalars are inside having?
There is already:
Even better have scalars over aggs over scalars inside Having.
I will add one with scalars also on top of the agg function.
Also another that sorts by an aggregation with an internal scalar.
I have already: https://github.com/elastic/elasticsearch/pull/55241/files/b8bc29f350f24b22bde8b9096a95d4273dd822fe#diff-002044c6b4e1de5a6960847971f85532R960
There was a problem hiding this comment.
I have already:
I was thinking of a test with the agg declared just inside the order and not referencing an already declared aggregation.
| static String field(AggregateFunction af) { | ||
| Expression arg = af.field(); | ||
| if (arg.foldable()) { | ||
| return String.valueOf(((Literal) arg).value()); |
There was a problem hiding this comment.
String.valueOf(arg.fold())
There's rarely a reason to cast to a Literal
| } | ||
|
|
||
| private static Object getFieldOrLiteralOrScript(AggregateFunction af) { | ||
| if (isFieldOrLiteral(af)) { |
There was a problem hiding this comment.
return isFieldOrLiteral(af) ? field(af) : ((ScalarFunction) af.field()).asScript()
There was a problem hiding this comment.
The logic from topAggsFieldOrScript and this method is similar and should be reused - I like the instanceof check for Scalar instead of the direct cast; a bit verbose but safe.
| return af.field().foldable() || af.field() instanceof FieldAttribute; | ||
| } | ||
|
|
||
| private static Object getFieldOrLiteralOrScript(AggregateFunction af) { |
There was a problem hiding this comment.
get prefix is for getter - use something like toField.. or asField..
| } | ||
|
|
||
| private static String topAggsField(AggregateFunction af, Expression e) { | ||
| private static Object topAggsFieldOrScript(AggregateFunction af, Expression e) { |
There was a problem hiding this comment.
This can be moved under the TopHit agg class
There was a problem hiding this comment.
Consolidated into one method.
| private final ScriptTemplate scriptTemplate; | ||
|
|
||
| Agg(String id, String fieldName) { | ||
| Agg(String id, Object fieldOrScript) { |
There was a problem hiding this comment.
Thinking a bit more about this, a better solution might be creating a simple DTO specific for Agg that can hold either a String or a ScriptTemplate.
That one would handle the hashCode, equals, etc.
Further more maybe extend it to accept behavior by accepting a ValuesSourceAggregationBuilder and internally call either field or script based on its content.
| Agg(String id, Object fieldOrScript) { | |
| Agg(String id, AggTarget target) { | |
| this.target = target; | |
| } | |
| ... | |
| class AggTarget { | |
| private final String fieldName; | |
| private final ScriptTemplate script; | |
| ... | |
| ValuesSourceAggregationBuilder with(ValuesSourceAggregationBuilder aggBuilder) { | |
| if (field != null) { | |
| aggBuilder.field(field); | |
| } | |
| else { | |
| aggBuilder.script(scriptTemplate.asScript()); | |
| } | |
| } | |
| } |
This would improve LeafAgg which would improve toBuilder and call AggTarget so that subclasses would only have to create the ValueSourceAggregationBuilder. In fact we could just declare them as a method reference for a string:
public abstract class LeafAgg extends Agg {
LeafAgg(String id, AggTarget target) {
super(id, target);
}
ValuesSourceAggregationBuilder builder() {
return aggBuilder.apply(id()).with(aggTarget);
}
protected abstract Function<String, ValuesSourceAggregationBuilder> aggBuilder();so that subclasses would now look like:
public class CardinalityAgg extends LeafAgg {
public CardinalityAgg(String id, AggTarget aggTarget) {
super(id, aggTarget);
}
@Override
ValuesSourceAggregationBuilder aggBuilder() {
return AggregationBuilders::cardinality;
}
}|
|
||
|
|
||
| // aggregates with scalars | ||
| aggregateFunctionsWithScalars |
There was a problem hiding this comment.
I have already:
I was thinking of a test with the agg declared just inside the order and not referencing an already declared aggregation.
| @Override | ||
| public int hashCode() { | ||
| return Objects.hash(id, fieldName, scriptTemplate); | ||
| return Objects.hash(id) + target.hashCode(); |
costin
left a comment
There was a problem hiding this comment.
AggSource conveys better the intent than AggTarget.
| abstract AggregationBuilder toBuilder(); | ||
|
|
||
| @SuppressWarnings("rawtypes") | ||
| protected ValuesSourceAggregationBuilder addFieldOrScript(ValuesSourceAggregationBuilder builder) { |
There was a problem hiding this comment.
The name is leaking implementation details - why not addTarget()?
There was a problem hiding this comment.
Yep, forgot to rename that, thx.
astefan
left a comment
There was a problem hiding this comment.
LGTM. This PR will make a lot of users happy.
Left few minor comments.
| not allowed and an error is returned: | ||
| [source, sql] | ||
| --------------------------------------- | ||
| SELECT KURTOSIS(salary), gender FROM emp GROUP BY gender |
There was a problem hiding this comment.
Reading the NOTE I would have expected to see "KURTOSIS ... on top of scalar functions" but I'm seeing KURTOSIS applied on a field.
|
|
||
| aggregatesWithScalarsAndGroupByOrderByAggWithoutProjection | ||
| schema::gender:s | ||
| SELECT gender FROM test_emp GROUP BY gender ORDER BY MAX(salary % 100) DESC; |
There was a problem hiding this comment.
Please, add MAX(salary % 100) to the result.
There was a problem hiding this comment.
This is a special case, requested by @costin, ORDER BY without having it in the projection.
| 18601.965319409886|3.4461553130896095E10|3.460331137445282E8 |null | ||
| 17811.071545718776|1.2151168881502939E11|3.1723426960671306E8|F | ||
| 15904.093950318531|1.699198993070239E11 |2.529402043805585E8 |M | ||
| ; |
There was a problem hiding this comment.
I like the complexities of these tests, but the queries are too complex for me to assess that the example is relevant :-). I trust that you checked the numbers against the data and made sure everything is ok.
There was a problem hiding this comment.
I've tested all the queries against PostgreSQL and the results are not always the same but very similar (double-rounding diffs).
| return new MatrixStatsAgg(id, singletonList(field(m, m.field()))); | ||
| } | ||
| throw new SqlIllegalArgumentException( | ||
| "Cannot use scalar functions or operators: [{}] in matrix stats " + "aggregate functions [KURTOSIS] and [SKEWNESS]", |
There was a problem hiding this comment.
Can this error message be improved?
- why is it mentioning matrix stats, isn't this an implementation detail?
- not a biggie and can be left as is, but can the message be more targeted to the exact aggregate function? (mention only KURTOSIS or SKEWNESS)
There was a problem hiding this comment.
- why is it mentioning matrix stats, isn't this an implementation detail?
Yep, I can remove that.
- not a biggie and can be left as is, but can the message be more targeted to the exact aggregate function? (mention only KURTOSIS or SKEWNESS)
Not really, I've tried it, because we don't have the original function anymore here, only MatrixStats. I was considering checking that earlier on the functions themselves, but then resolveType() didn't seem to me a good place to add such a verification, since it's regarding the DataType and not if the arg is field or function.
There was a problem hiding this comment.
This is better handled in the verifier by checking that Kurtosis & co (namely MatrixStatsEnclosed) don't have a scalar expression as argument.
Further more I would add an issue to extend the aggregation to support scripts not just fields.
| Agg(String id, AggTarget target) { | ||
| this.id = id; | ||
| this.fieldName = fieldName; | ||
| Objects.requireNonNull(target, "AggTarget must not be null"); |
There was a problem hiding this comment.
You could move this call as the first in the constructor, no?
| @@ -397,6 +397,16 @@ https://en.wikipedia.org/wiki/Kurtosis[Quantify] the shape of the distribution o | |||
| include-tagged::{sql-specs}/docs/docs.csv-spec[aggKurtosis] | |||
There was a problem hiding this comment.
I think it's worth adding examples in the docs (either a dedicated one or for most aggregations) that shows aggregations alongside a scalar function as it might not be obvious, especially for existing users.
This would also address @astefan point of having a test that can be easily be verified.
Avoid repetition of the aggregation builder setup Relates elastic#55241
Implement the use of scalar functions inside aggregate functions. This allows for complex expressions inside aggregations, with or without GROUBY as well as with or without a HAVING clause. e.g.: ``` SELECT MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) AS max, b FROM test GROUP BY b HAVING MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) > 5 ``` Scalar functions are still not allowed for `KURTOSIS` and `SKEWNESS` as this is currently not implemented on the ElasticSearch side. Fixes: #29980 Fixes: #36865 Fixes: #37271 (cherry picked from commit 506d1be)
Avoid repetition of the aggregation builder setup Relates #55241
Implement the use of scalar functions inside aggregate functions.
This allows for complex expressions inside aggregations, with or without
GROUBY as well as with or without a HAVING clause. e.g.:
Scalar functions are still not allowed for
KURTOSISandSKEWNESSasthis is currently not implemented on the Elasticsearch side.
Fixes: #29980
Fixes: #36865
Fixes: #37271