Skip to content

Commit 506d1be

Browse files
authored
SQL: Implement scripting inside aggs (elastic#55241)
Implement the use of scalar functions inside aggregate functions. This allows for complex expressions inside aggregations, with or without GROUBY as well as with or without a HAVING clause. e.g.: ``` SELECT MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) AS max, b FROM test GROUP BY b HAVING MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) > 5 ``` Scalar functions are still not allowed for `KURTOSIS` and `SKEWNESS` as this is currently not implemented on the ElasticSearch side. Fixes: elastic#29980 Fixes: elastic#36865 Fixes: elastic#37271
1 parent feeb0c6 commit 506d1be

30 files changed

Lines changed: 904 additions & 187 deletions

docs/reference/sql/functions/aggs.asciidoc

Lines changed: 85 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,11 @@ AVG(numeric_field) <1>
3232
include-tagged::{sql-specs}/docs/docs.csv-spec[aggAvg]
3333
--------------------------------------------------
3434

35+
["source","sql",subs="attributes,macros"]
36+
--------------------------------------------------
37+
include-tagged::{sql-specs}/docs/docs.csv-spec[aggAvgScalars]
38+
--------------------------------------------------
39+
3540
[[sql-functions-aggs-count]]
3641
==== `COUNT`
3742

@@ -82,6 +87,10 @@ COUNT(ALL field_name) <1>
8287
include-tagged::{sql-specs}/docs/docs.csv-spec[aggCountAll]
8388
--------------------------------------------------
8489

90+
["source","sql",subs="attributes,macros"]
91+
--------------------------------------------------
92+
include-tagged::{sql-specs}/docs/docs.csv-spec[aggCountAllScalars]
93+
--------------------------------------------------
8594

8695
[[sql-functions-aggs-count-distinct]]
8796
==== `COUNT(DISTINCT)`
@@ -105,6 +114,11 @@ COUNT(DISTINCT field_name) <1>
105114
include-tagged::{sql-specs}/docs/docs.csv-spec[aggCountDistinct]
106115
--------------------------------------------------
107116

117+
["source","sql",subs="attributes,macros"]
118+
--------------------------------------------------
119+
include-tagged::{sql-specs}/docs/docs.csv-spec[aggCountDistinctScalars]
120+
--------------------------------------------------
121+
108122
[[sql-functions-aggs-first]]
109123
==== `FIRST/FIRST_VALUE`
110124

@@ -194,6 +208,11 @@ include-tagged::{sql-specs}/docs/docs.csv-spec[firstWithTwoArgsAndGroupBy]
194208
include-tagged::{sql-specs}/docs/docs.csv-spec[firstValueWithTwoArgsAndGroupBy]
195209
--------------------------------------------------------------------------
196210

211+
["source","sql",subs="attributes,macros"]
212+
--------------------------------------------------------------------------
213+
include-tagged::{sql-specs}/docs/docs.csv-spec[firstValueWithTwoArgsAndGroupByScalars]
214+
--------------------------------------------------------------------------
215+
197216
[NOTE]
198217
`FIRST` cannot be used in a HAVING clause.
199218
[NOTE]
@@ -289,6 +308,11 @@ include-tagged::{sql-specs}/docs/docs.csv-spec[lastWithTwoArgsAndGroupBy]
289308
include-tagged::{sql-specs}/docs/docs.csv-spec[lastValueWithTwoArgsAndGroupBy]
290309
-------------------------------------------------------------------------
291310

311+
["source","sql",subs="attributes,macros"]
312+
-------------------------------------------------------------------------
313+
include-tagged::{sql-specs}/docs/docs.csv-spec[lastValueWithTwoArgsAndGroupByScalars]
314+
-------------------------------------------------------------------------
315+
292316
[NOTE]
293317
`LAST` cannot be used in `HAVING` clause.
294318
[NOTE]
@@ -317,6 +341,11 @@ MAX(field_name) <1>
317341
include-tagged::{sql-specs}/docs/docs.csv-spec[aggMax]
318342
--------------------------------------------------
319343

344+
["source","sql",subs="attributes,macros"]
345+
--------------------------------------------------
346+
include-tagged::{sql-specs}/docs/docs.csv-spec[aggMaxScalars]
347+
--------------------------------------------------
348+
320349
[NOTE]
321350
`MAX` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into
322351
<<sql-functions-aggs-last>> and therefore, it cannot be used in `HAVING` clause.
@@ -369,6 +398,11 @@ SUM(field_name) <1>
369398
include-tagged::{sql-specs}/docs/docs.csv-spec[aggSum]
370399
--------------------------------------------------
371400

401+
["source","sql",subs="attributes,macros"]
402+
--------------------------------------------------
403+
include-tagged::{sql-specs}/docs/docs.csv-spec[aggSumScalars]
404+
--------------------------------------------------
405+
372406
[[sql-functions-aggs-statistics]]
373407
[float]
374408
=== Statistics
@@ -397,6 +431,16 @@ https://en.wikipedia.org/wiki/Kurtosis[Quantify] the shape of the distribution o
397431
include-tagged::{sql-specs}/docs/docs.csv-spec[aggKurtosis]
398432
--------------------------------------------------
399433

434+
[NOTE]
435+
====
436+
`KURTOSIS` cannot be used on top of scalar functions or operators but only directly on a field. So, for example,
437+
the following is not allowed and an error is returned:
438+
[source, sql]
439+
---------------------------------------
440+
SELECT KURTOSIS(salary / 12.0), gender FROM emp GROUP BY gender
441+
---------------------------------------
442+
====
443+
400444
[[sql-functions-aggs-mad]]
401445
==== `MAD`
402446

@@ -421,6 +465,11 @@ https://en.wikipedia.org/wiki/Median_absolute_deviation[Measure] the variability
421465
include-tagged::{sql-specs}/docs/docs.csv-spec[aggMad]
422466
--------------------------------------------------
423467

468+
["source","sql",subs="attributes,macros"]
469+
--------------------------------------------------
470+
include-tagged::{sql-specs}/docs/docs.csv-spec[aggMadScalars]
471+
--------------------------------------------------
472+
424473
[[sql-functions-aggs-percentile]]
425474
==== `PERCENTILE`
426475

@@ -449,6 +498,11 @@ of input values in the field `field_name`.
449498
include-tagged::{sql-specs}/docs/docs.csv-spec[aggPercentile]
450499
--------------------------------------------------
451500

501+
["source","sql",subs="attributes,macros"]
502+
--------------------------------------------------
503+
include-tagged::{sql-specs}/docs/docs.csv-spec[aggPercentileScalars]
504+
--------------------------------------------------
505+
452506
[[sql-functions-aggs-percentile-rank]]
453507
==== `PERCENTILE_RANK`
454508

@@ -477,6 +531,11 @@ of input values in the field `field_name`.
477531
include-tagged::{sql-specs}/docs/docs.csv-spec[aggPercentileRank]
478532
--------------------------------------------------
479533

534+
["source","sql",subs="attributes,macros"]
535+
--------------------------------------------------
536+
include-tagged::{sql-specs}/docs/docs.csv-spec[aggPercentileRankScalars]
537+
--------------------------------------------------
538+
480539
[[sql-functions-aggs-skewness]]
481540
==== `SKEWNESS`
482541

@@ -501,6 +560,16 @@ https://en.wikipedia.org/wiki/Skewness[Quantify] the asymmetric distribution of
501560
include-tagged::{sql-specs}/docs/docs.csv-spec[aggSkewness]
502561
--------------------------------------------------
503562

563+
[NOTE]
564+
====
565+
`SKEWNESS` cannot be used on top of scalar functions but only directly on a field. So, for example, the following is
566+
not allowed and an error is returned:
567+
[source, sql]
568+
---------------------------------------
569+
SELECT SKEWNESS(ROUND(salary / 12.0, 2), gender FROM emp GROUP BY gender
570+
---------------------------------------
571+
====
572+
504573
[[sql-functions-aggs-stddev-pop]]
505574
==== `STDDEV_POP`
506575

@@ -525,6 +594,11 @@ Returns the https://en.wikipedia.org/wiki/Standard_deviations[population standar
525594
include-tagged::{sql-specs}/docs/docs.csv-spec[aggStddevPop]
526595
--------------------------------------------------
527596

597+
["source","sql",subs="attributes,macros"]
598+
--------------------------------------------------
599+
include-tagged::{sql-specs}/docs/docs.csv-spec[aggStddevPopScalars]
600+
--------------------------------------------------
601+
528602
[[sql-functions-aggs-sum-squares]]
529603
==== `SUM_OF_SQUARES`
530604

@@ -549,6 +623,11 @@ Returns the https://en.wikipedia.org/wiki/Total_sum_of_squares[sum of squares] o
549623
include-tagged::{sql-specs}/docs/docs.csv-spec[aggSumOfSquares]
550624
--------------------------------------------------
551625

626+
["source","sql",subs="attributes,macros"]
627+
--------------------------------------------------
628+
include-tagged::{sql-specs}/docs/docs.csv-spec[aggSumOfSquaresScalars]
629+
--------------------------------------------------
630+
552631
[[sql-functions-aggs-var-pop]]
553632
==== `VAR_POP`
554633

@@ -572,3 +651,9 @@ Returns the https://en.wikipedia.org/wiki/Variance[population variance] of input
572651
--------------------------------------------------
573652
include-tagged::{sql-specs}/docs/docs.csv-spec[aggVarPop]
574653
--------------------------------------------------
654+
655+
656+
["source","sql",subs="attributes,macros"]
657+
--------------------------------------------------
658+
include-tagged::{sql-specs}/docs/docs.csv-spec[aggVarPopScalars]
659+
--------------------------------------------------

docs/reference/sql/limitations.asciidoc

Lines changed: 0 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -129,12 +129,6 @@ SELECT age, ROUND(AVG(salary)) AS avg FROM test GROUP BY age ORDER BY avg;
129129
SELECT age, MAX(salary) - MIN(salary) AS diff FROM test GROUP BY age ORDER BY diff;
130130
--------------------------------------------------
131131

132-
[float]
133-
=== Using aggregation functions on top of scalar functions
134-
135-
Aggregation functions like <<sql-functions-aggs-min,`MIN`>>, <<sql-functions-aggs-max,`MAX`>>, etc. can only be used
136-
directly on fields, and so queries like `SELECT MAX(abs(age)) FROM test` are not possible.
137-
138132
[float]
139133
=== Using a sub-select
140134

x-pack/plugin/sql/qa/src/main/resources/agg.csv-spec

Lines changed: 193 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -910,3 +910,196 @@ SELECT gender, MAD(salary) AS mad FROM test_emp GROUP BY gender HAVING mad > 100
910910
null |10789.0
911911
F |12719.0
912912
;
913+
914+
915+
// aggregates with scalars
916+
aggregateFunctionsWithScalars
917+
SELECT MAX(CASE WHEN (salary - 10) > 70000 THEN (salary + 12345) * 1.2 ELSE (salary - 12345) * 2.7 END) AS "max",
918+
MIN(CASE WHEN (salary - 20) > 50000 THEN (salary * 1.2) - 1234 ELSE (salary - 20) * 0.93 END) AS "min",
919+
AVG(cos(salary * 1.2) + 100 * (salary / 5)) AS "avg",
920+
SUM(-salary / 0.765 + sin((salary + 12345) / 12)) AS "sum",
921+
MAD(abs(salary / -0.813) / 2 + (12345 * (salary % 10))) AS "mad"
922+
FROM test_emp;
923+
924+
max | min | avg | sum | mad
925+
------------------+---------------+-----------------+------------------+-----------------
926+
155409.30000000002|23532.72 |964937.9295477575|-6307004.517507723|30811.76199261993
927+
;
928+
929+
countWithScalars
930+
schema::cnt1:l|cnt2:l
931+
SELECT count(DISTINCT CASE WHEN (languages - 1) > 3 THEN (languages + 3) * 1.2 ELSE (languages - 1) * 2.7 END) AS "cnt1",
932+
count(CASE WHEN (languages - 2) > 2 THEN (languages + 5) * 1.2 ELSE ((languages / 0.87) - 11) * 2.7 END) AS "cnt2"
933+
FROM test_emp;
934+
935+
cnt1 | cnt2
936+
----------+-------
937+
5 | 90
938+
;
939+
940+
aggregateFunctionsWithScalarsAndGroupBy
941+
schema::max:d|min:d|avg:d|sum:d|mad:d|gender:s
942+
SELECT MAX(CASE WHEN (salary - 10) > 70000 THEN (salary + 12345) * 1.2 ELSE (salary - 12345) * 2.7 END) AS "max",
943+
MIN(CASE WHEN (salary - 20) > 50000 THEN (salary * 1.2) - 1234 ELSE (salary - 20) * 0.93 END) AS "min",
944+
AVG(cos(salary * 1.2) + 100 * (salary / 5)) AS "avg",
945+
SUM(-salary / 0.765 + sin((salary + 12345) / 12)) AS "sum",
946+
MAD(abs(salary / -0.813) / 2 + (12345 * (salary % 10))) AS "mad",
947+
gender
948+
FROM test_emp GROUP BY gender ORDER BY gender;
949+
950+
max | min | avg | sum | mad | gender
951+
------------------+---------------+------------------+-------------------+-----------------+---------------
952+
132335.1 |23532.72 |975179.5463883684 |-637388.2516376646 |33398.4963099631 |null
953+
155409.30000000002|24139.08 |1009778.6217005679|-2178038.0602625553|24031.90651906518|F
954+
151745.40000000002|24110.25 |937180.7539433916 |-3491578.2056075027|32956.9126691267 |M
955+
;
956+
957+
countWithScalarsAndGroupBy
958+
schema::cnt1:l|cnt2:l|gender:s
959+
SELECT count(DISTINCT CASE WHEN (languages - 1) > 3 THEN (languages + 3) * 1.2 ELSE (languages - 1) * 2.7 END) AS "cnt1",
960+
count(CASE WHEN (languages - 2) > 2 THEN (languages + 5) * 1.2 ELSE ((languages / 0.87) - 11) * 2.7 END) AS "cnt2",
961+
gender
962+
FROM test_emp GROUP BY gender ORDER BY gender;
963+
964+
cnt1 | cnt2 | gender
965+
---------------+---------------+---------------
966+
4 |10 |null
967+
5 |30 |F
968+
5 |50 |M
969+
;
970+
971+
aggregatesWithScalarsAndGroupByOrderByAgg
972+
schema::max:d|gender:s
973+
SELECT MAX(CASE WHEN (salary - 10) > 70000 THEN (salary + 12345) * 1.2 ELSE (salary - 12345) * 2.7 END) AS "max",
974+
gender
975+
FROM test_emp GROUP BY gender ORDER BY max DESC;
976+
977+
max | gender
978+
------------------+---------------
979+
155409.30000000002|F
980+
151745.40000000002|M
981+
132335.1 |null
982+
;
983+
984+
aggregatesWithScalarsAndGroupByOrderByAggWithoutProjection
985+
schema::gender:s
986+
SELECT gender FROM test_emp GROUP BY gender ORDER BY MAX(salary % 100) DESC;
987+
988+
gender
989+
---------------
990+
M
991+
null
992+
F
993+
;
994+
995+
topHitsWithScalars
996+
schema::first:s|last:s|gender:s
997+
SELECT FIRST(concat('aa_', substring(first_name, 3, 10)), birth_date) AS first,
998+
LAST(concat('bb_', substring(last_name, 4, 8)), birth_date) AS last,
999+
gender
1000+
FROM test_emp GROUP BY gender ORDER By gender;
1001+
1002+
first | last | gender
1003+
---------------+---------------+---------------
1004+
aa_llian |bb_kki |null
1005+
aa_mant |bb_zuma |F
1006+
aa_mzi |bb_ton |M
1007+
;
1008+
1009+
aggregateFunctionsWithScalarsAndGroupByAndHaving
1010+
schema::max:d|min:d|gender:s
1011+
SELECT MAX(CASE WHEN (salary - 10) > 70000 THEN (salary + 12345) * 1.2 ELSE (salary - 12345) * 2.7 END) AS "max",
1012+
MIN(CASE WHEN (salary - 20) > 50000 THEN (salary * 1.2) - 1234 ELSE (salary - 20) * 0.93 END) AS "min",
1013+
gender FROM test_emp
1014+
GROUP BY gender HAVING max > 152000 or min > 24000 ORDER BY gender;
1015+
1016+
max | min | gender
1017+
------------------+---------------+---------------
1018+
155409.30000000002|24139.08 |F
1019+
151745.40000000002|24110.25 |M
1020+
;
1021+
1022+
aggregateFunctionsWithScalarsAndGroupByAndHaving_ComplexExpressions
1023+
schema::max:d|min:d|gender:s
1024+
SELECT ABS((MAX(CASE WHEN (salary - 10) > 70000 THEN (salary + 12345) * 1.2 ELSE (salary - 12345) * 2.7 END) + 123) / -100) AS "max",
1025+
cos(MIN(CASE WHEN (salary - 20) > 50000 THEN (salary * 1.2) - 1234 ELSE (salary - 20) * 0.93 END) % 100) AS "min",
1026+
gender
1027+
FROM test_emp
1028+
GROUP BY gender HAVING (max / 10) + 10 > 165 OR ABS(min * -100) > 60 ORDER BY gender;
1029+
1030+
max | min | gender
1031+
------------------+-------------------+---------------
1032+
1555.323 |0.1887687166044111 |F
1033+
1518.6840000000002|-0.6783938504738453|M
1034+
;
1035+
1036+
aggregateFunctionsWithScalarsAndGroupByAndHaving_CombinedFields
1037+
schema::min:d|max:d|gender:s
1038+
SELECT MIN(ABS(salary * (languages / - 20.0))) AS "min",
1039+
MAX(salary / ((languages / 3.0) + 1)) AS "max",
1040+
gender
1041+
FROM test_emp
1042+
GROUP BY gender HAVING (min::long) / 120 > 12 OR ROUND(max) / 10 > 5200 ORDER BY gender;
1043+
1044+
min | max | gender
1045+
---------------+---------------+---------------
1046+
2436.75 |55287.75 |null
1047+
1401.75 |52508.25 |M
1048+
;
1049+
1050+
aggregateFunctionsWithScalarsAndGroupByAndHavingConvertedToStats
1051+
schema::max:d|min:d|gender:s
1052+
SELECT MAX(CASE WHEN (salary - 10) > 70000 THEN (salary + 12345) * 1.2 ELSE (salary - 12345) * 2.7 END) AS "max",
1053+
MIN(CASE WHEN (salary - 10) > 70000 THEN (salary + 12345) * 1.2 ELSE (salary - 12345) * 2.7 END) AS "min",
1054+
gender FROM test_emp
1055+
GROUP BY gender HAVING max > 155000 or min > 36000 ORDER BY gender;
1056+
1057+
max | min | gender
1058+
------------------+------------------+---------------
1059+
155409.30000000002|36803.700000000004|F
1060+
151745.40000000002|36720.0 |M
1061+
;
1062+
1063+
percentileAggregateFunctionsWithScalars
1064+
schema::percentile:d|percentile_rank:d|gender:s
1065+
SELECT PERCENTILE(CASE WHEN (salary / 2) > 10000 THEN (salary + 12345) * 1.2 ELSE (salary - 12345) * 2.7 END, 80) AS "percentile",
1066+
PERCENTILE_RANK(CASE WHEN (salary - 20) > 50000 THEN (salary * 1.2) - 1234 ELSE (salary - 20) * 0.93 END, 40000) AS "percentile_rank",
1067+
gender FROM test_emp
1068+
GROUP BY gender ORDER BY gender;
1069+
1070+
percentile | percentile_rank | gender
1071+
-----------------+------------------+---------------
1072+
86857.79999999999|32.69659025378865 |null
1073+
94042.92000000001|37.03569653103581 |F
1074+
87348.36 |44.337514210592246|M
1075+
;
1076+
1077+
extendedStatsAggregateFunctionsWithScalars
1078+
schema::stddev_pop:d|sum_of_squares:d|var_pop:d|gender:s
1079+
SELECT STDDEV_POP(CASE WHEN (salary / 2) > 10000 THEN (salary + 12345) * 1.2 ELSE (salary - 12345) * 2.7 END) AS "stddev_pop",
1080+
SUM_OF_SQUARES(CASE WHEN (salary - 20) > 50000 THEN (salary * 1.2) - 1234 ELSE (salary - 20) * 0.93 END) AS "sum_of_squares",
1081+
VAR_POP(CASE WHEN (salary - 20) % 1000 > 200 THEN (salary * 1.2) - 1234 ELSE (salary - 20) * 0.93 END) AS "var_pop",
1082+
gender FROM test_emp
1083+
GROUP BY gender ORDER BY gender;
1084+
1085+
stddev_pop | sum_of_squares | var_pop | gender
1086+
------------------+---------------------+--------------------+---------------
1087+
16752.73244172422 |3.06310583829007E10 |3.460331137445282E8 |null
1088+
17427.462400181845|1.148127725047658E11 |3.1723426960671306E8|F
1089+
15702.798665784752|1.5882243113919238E11|2.529402043805585E8 |M
1090+
;
1091+
1092+
extendedStatsAggregateFunctionsWithScalarAndSameArg
1093+
schema::stddev_pop:d|sum_of_squares:d|var_pop:d|gender:s
1094+
SELECT STDDEV_POP(CASE WHEN (salary - 20) % 1000 > 200 THEN (salary * 1.2) - 1234 ELSE (salary - 20) * 0.93 END) AS "stddev_pop",
1095+
SUM_OF_SQUARES(CASE WHEN (salary - 20) % 1000 > 200 THEN (salary * 1.2) - 1234 ELSE (salary - 20) * 0.93 END) AS "sum_of_squares",
1096+
VAR_POP(CASE WHEN (salary - 20) % 1000 > 200 THEN (salary * 1.2) - 1234 ELSE (salary - 20) * 0.93 END) AS "var_pop",
1097+
gender FROM test_emp
1098+
GROUP BY gender ORDER BY gender;
1099+
1100+
stddev_pop | sum_of_squares | var_pop | gender
1101+
------------------+---------------------+--------------------+---------------
1102+
18601.965319409886|3.4461553130896095E10|3.460331137445282E8 |null
1103+
17811.071545718776|1.2151168881502939E11|3.1723426960671306E8|F
1104+
15904.093950318531|1.699198993070239E11 |2.529402043805585E8 |M
1105+
;

0 commit comments

Comments
 (0)