Skip to content

Commit a212420

Browse files
authored
ESQL: Document BUCKET as a grouping function (#107864)
This adds the documentation for BUCKET as a grouping function and the addition of the "direct" invocation mode providing a span (in addition to the auto mode).
1 parent 4e7a833 commit a212420

9 files changed

Lines changed: 145 additions & 45 deletions

File tree

docs/reference/esql/esql-functions-operators.asciidoc

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,12 @@ The reference documentation is divided into the following categories:
1616
include::functions/aggregation-functions.asciidoc[tag=agg_list]
1717
====
1818

19+
.*Grouping functions*
20+
[%collapsible]
21+
====
22+
include::functions/grouping-functions.asciidoc[tag=group_list]
23+
====
24+
1925
.*Math functions*
2026
[%collapsible]
2127
====
@@ -68,6 +74,7 @@ include::functions/operators.asciidoc[tag=op_list]
6874
====
6975

7076
include::functions/aggregation-functions.asciidoc[]
77+
include::functions/grouping-functions.asciidoc[]
7178
include::functions/math-functions.asciidoc[]
7279
include::functions/string-functions.asciidoc[]
7380
include::functions/date-time-functions.asciidoc[]

docs/reference/esql/esql-get-started.asciidoc

Lines changed: 0 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -244,13 +244,6 @@ To track statistics over time, {esql} enables you to create histograms using the
244244
and returns a value for each row that corresponds to the resulting bucket the
245245
row falls into.
246246

247-
For example, to create hourly buckets for the data on October 23rd:
248-
249-
[source,esql]
250-
----
251-
include::{esql-specs}/bucket.csv-spec[tag=gs-bucket]
252-
----
253-
254247
Combine `BUCKET` with <<esql-stats-by>> to create a histogram. For example,
255248
to count the number of events per hour:
256249

docs/reference/esql/functions/aggregation-functions.asciidoc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
<titleabbrev>Aggregate functions</titleabbrev>
66
++++
77

8-
The <<esql-stats-by>> function supports these aggregate functions:
8+
The <<esql-stats-by>> command supports these aggregate functions:
99

1010
// tag::agg_list[]
1111
* <<esql-agg-avg>>

docs/reference/esql/functions/date-time-functions.asciidoc

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,6 @@
88
{esql} supports these date-time functions:
99

1010
// tag::date_list[]
11-
* <<esql-bucket>>
1211
* <<esql-date_diff>>
1312
* <<esql-date_extract>>
1413
* <<esql-date_format>>
@@ -17,7 +16,6 @@
1716
* <<esql-now>>
1817
// end::date_list[]
1918

20-
include::layout/bucket.asciidoc[]
2119
include::layout/date_diff.asciidoc[]
2220
include::layout/date_extract.asciidoc[]
2321
include::layout/date_format.asciidoc[]

docs/reference/esql/functions/examples/bucket.asciidoc

Lines changed: 46 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,10 @@
22

33
*Examples*
44

5+
`BUCKET` can work in two modes: one in which the size of the bucket is computed
6+
based on a buckets count recommendation (four parameters) and a range, and
7+
another in which the bucket size is provided directly (two parameters).
8+
59
Using a target number of buckets, a start of a range, and an end of a range,
610
`BUCKET` picks an appropriate bucket size to generate the target number of buckets or fewer.
711
For example, asking for at most 20 buckets over a year results in monthly buckets:
@@ -17,7 +21,7 @@ include::{esql-specs}/bucket.csv-spec[tag=docsBucketMonth-result]
1721
The goal isn't to provide *exactly* the target number of buckets,
1822
it's to pick a range that people are comfortable with that provides at most the target number of buckets.
1923

20-
Combine `BUCKET` with <<esql-stats-by>> to create a histogram:
24+
Combine `BUCKET` with an <<esql-agg-functions,aggregation>> to create a histogram:
2125
[source.merge.styled,esql]
2226
----
2327
include::{esql-specs}/bucket.csv-spec[tag=docsBucketMonthlyHistogram]
@@ -28,7 +32,7 @@ include::{esql-specs}/bucket.csv-spec[tag=docsBucketMonthlyHistogram-result]
2832
|===
2933

3034
NOTE: `BUCKET` does not create buckets that don't match any documents.
31-
+ "That's why this example is missing `1985-03-01` and other dates.
35+
That's why this example is missing `1985-03-01` and other dates.
3236

3337
Asking for more buckets can result in a smaller range.
3438
For example, asking for at most 100 buckets in a year results in weekly buckets:
@@ -45,6 +49,20 @@ NOTE: `BUCKET` does not filter any rows. It only uses the provided range to pick
4549
For rows with a value outside of the range, it returns a bucket value that corresponds to a bucket outside the range.
4650
Combine`BUCKET` with <<esql-where>> to filter rows.
4751

52+
If the desired bucket size is known in advance, simply provide it as the second
53+
argument, leaving the range out:
54+
[source.merge.styled,esql]
55+
----
56+
include::{esql-specs}/bucket.csv-spec[tag=docsBucketWeeklyHistogramWithSpan]
57+
----
58+
[%header.monospaced.styled,format=dsv,separator=|]
59+
|===
60+
include::{esql-specs}/bucket.csv-spec[tag=docsBucketWeeklyHistogramWithSpan-result]
61+
|===
62+
63+
NOTE: When providing the bucket size as the second parameter, it must be a time
64+
duration or date period.
65+
4866
`BUCKET` can also operate on numeric fields. For example, to create a salary histogram:
4967
[source.merge.styled,esql]
5068
----
@@ -58,6 +76,20 @@ include::{esql-specs}/bucket.csv-spec[tag=docsBucketNumeric-result]
5876
Unlike the earlier example that intentionally filters on a date range, you rarely want to filter on a numeric range.
5977
You have to find the `min` and `max` separately. {esql} doesn't yet have an easy way to do that automatically.
6078

79+
The range can be omitted if the desired bucket size is known in advance. Simply
80+
provide it as the second argument:
81+
[source.merge.styled,esql]
82+
----
83+
include::{esql-specs}/bucket.csv-spec[tag=docsBucketNumericWithSpan]
84+
----
85+
[%header.monospaced.styled,format=dsv,separator=|]
86+
|===
87+
include::{esql-specs}/bucket.csv-spec[tag=docsBucketNumericWithSpan-result]
88+
|===
89+
90+
NOTE: When providing the bucket size as the second parameter, it must be
91+
of a floating point type.
92+
6193
Create hourly buckets for the last 24 hours, and calculate the number of events per hour:
6294
[source.merge.styled,esql]
6395
----
@@ -77,3 +109,15 @@ include::{esql-specs}/bucket.csv-spec[tag=bucket_in_agg]
77109
include::{esql-specs}/bucket.csv-spec[tag=bucket_in_agg-result]
78110
|===
79111

112+
`BUCKET` may be used in both the aggregating and grouping part of the
113+
<<esql-stats-by, STATS ... BY ...>> command provided that in the aggregating
114+
part the function is referenced by an alias defined in the
115+
grouping part, or that it is invoked with the exact same expression:
116+
[source.merge.styled,esql]
117+
----
118+
include::{esql-specs}/bucket.csv-spec[tag=reuseGroupingFunctionWithExpression]
119+
----
120+
[%header.monospaced.styled,format=dsv,separator=|]
121+
|===
122+
include::{esql-specs}/bucket.csv-spec[tag=reuseGroupingFunctionWithExpression-result]
123+
|===
Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
[[esql-group-functions]]
2+
==== {esql} grouping functions
3+
4+
++++
5+
<titleabbrev>Grouping functions</titleabbrev>
6+
++++
7+
8+
The <<esql-stats-by>> command supports these grouping functions:
9+
10+
// tag::group_list[]
11+
* <<esql-bucket>>
12+
// end::group_list[]
13+
14+
include::layout/bucket.asciidoc[]

docs/reference/esql/functions/kibana/definition/bucket.json

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -939,7 +939,9 @@
939939
"FROM employees\n| WHERE hire_date >= \"1985-01-01T00:00:00Z\" AND hire_date < \"1986-01-01T00:00:00Z\"\n| STATS hire_date = MV_SORT(VALUES(hire_date)) BY month = BUCKET(hire_date, 20, \"1985-01-01T00:00:00Z\", \"1986-01-01T00:00:00Z\")\n| SORT hire_date",
940940
"FROM employees\n| WHERE hire_date >= \"1985-01-01T00:00:00Z\" AND hire_date < \"1986-01-01T00:00:00Z\"\n| STATS hires_per_month = COUNT(*) BY month = BUCKET(hire_date, 20, \"1985-01-01T00:00:00Z\", \"1986-01-01T00:00:00Z\")\n| SORT month",
941941
"FROM employees\n| WHERE hire_date >= \"1985-01-01T00:00:00Z\" AND hire_date < \"1986-01-01T00:00:00Z\"\n| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 100, \"1985-01-01T00:00:00Z\", \"1986-01-01T00:00:00Z\")\n| SORT week",
942+
"FROM employees\n| WHERE hire_date >= \"1985-01-01T00:00:00Z\" AND hire_date < \"1986-01-01T00:00:00Z\"\n| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 1 week)\n| SORT week",
942943
"FROM employees\n| STATS COUNT(*) by bs = BUCKET(salary, 20, 25324, 74999)\n| SORT bs",
944+
"FROM employees\n| WHERE hire_date >= \"1985-01-01T00:00:00Z\" AND hire_date < \"1986-01-01T00:00:00Z\"\n| STATS c = COUNT(1) BY b = BUCKET(salary, 5000.)\n| SORT b",
943945
"FROM sample_data \n| WHERE @timestamp >= NOW() - 1 day and @timestamp < NOW()\n| STATS COUNT(*) BY bucket = BUCKET(@timestamp, 25, NOW() - 1 day, NOW())",
944946
"FROM employees\n| WHERE hire_date >= \"1985-01-01T00:00:00Z\" AND hire_date < \"1986-01-01T00:00:00Z\"\n| STATS AVG(salary) BY bucket = BUCKET(hire_date, 20, \"1985-01-01T00:00:00Z\", \"1986-01-01T00:00:00Z\")\n| SORT bucket"
945947
]

x-pack/plugin/esql/qa/testFixtures/src/main/resources/bucket.csv-spec

Lines changed: 59 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -175,13 +175,13 @@ FROM employees
175175
;
176176

177177
//tag::docsBucketMonthlyHistogram-result[]
178-
hires_per_month:long | month:date
179-
2 |1985-02-01T00:00:00.000Z
180-
1 |1985-05-01T00:00:00.000Z
181-
1 |1985-07-01T00:00:00.000Z
182-
1 |1985-09-01T00:00:00.000Z
183-
2 |1985-10-01T00:00:00.000Z
184-
4 |1985-11-01T00:00:00.000Z
178+
hires_per_month:long | month:date
179+
2 |1985-02-01T00:00:00.000Z
180+
1 |1985-05-01T00:00:00.000Z
181+
1 |1985-07-01T00:00:00.000Z
182+
1 |1985-09-01T00:00:00.000Z
183+
2 |1985-10-01T00:00:00.000Z
184+
4 |1985-11-01T00:00:00.000Z
185185
//end::docsBucketMonthlyHistogram-result[]
186186
;
187187

@@ -196,15 +196,36 @@ FROM employees
196196

197197
//tag::docsBucketWeeklyHistogram-result[]
198198
hires_per_week:long | week:date
199-
2 |1985-02-18T00:00:00.000Z
200-
1 |1985-05-13T00:00:00.000Z
201-
1 |1985-07-08T00:00:00.000Z
202-
1 |1985-09-16T00:00:00.000Z
203-
2 |1985-10-14T00:00:00.000Z
204-
4 |1985-11-18T00:00:00.000Z
199+
2 |1985-02-18T00:00:00.000Z
200+
1 |1985-05-13T00:00:00.000Z
201+
1 |1985-07-08T00:00:00.000Z
202+
1 |1985-09-16T00:00:00.000Z
203+
2 |1985-10-14T00:00:00.000Z
204+
4 |1985-11-18T00:00:00.000Z
205205
//end::docsBucketWeeklyHistogram-result[]
206206
;
207207

208+
// bucketing in span mode (identical results to above)
209+
docsBucketWeeklyHistogramWithSpan#[skip:-8.13.99, reason:BUCKET renamed in 8.14]
210+
//tag::docsBucketWeeklyHistogramWithSpan[]
211+
FROM employees
212+
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
213+
| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 1 week)
214+
| SORT week
215+
//end::docsBucketWeeklyHistogramWithSpan[]
216+
;
217+
218+
//tag::docsBucketWeeklyHistogramWithSpan-result[]
219+
hires_per_week:long | week:date
220+
2 |1985-02-18T00:00:00.000Z
221+
1 |1985-05-13T00:00:00.000Z
222+
1 |1985-07-08T00:00:00.000Z
223+
1 |1985-09-16T00:00:00.000Z
224+
2 |1985-10-14T00:00:00.000Z
225+
4 |1985-11-18T00:00:00.000Z
226+
//end::docsBucketWeeklyHistogramWithSpan-result[]
227+
;
228+
208229
docsBucketLast24hr#[skip:-8.13.99, reason:BUCKET renamed in 8.14]
209230
//tag::docsBucketLast24hr[]
210231
FROM sample_data
@@ -218,17 +239,6 @@ FROM sample_data
218239
//end::docsBucketLast24hr-result[]
219240
;
220241

221-
docsGettingStartedBucket#[skip:-8.13.99, reason:BUCKET renamed in 8.14]
222-
// tag::gs-bucket[]
223-
FROM sample_data
224-
| STATS BY bucket = BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", NOW())
225-
// end::gs-bucket[]
226-
| LIMIT 0
227-
;
228-
229-
bucket:date
230-
;
231-
232242
docsGettingStartedBucketStatsBy#[skip:-8.13.99, reason:BUCKET renamed in 8.14]
233243
// tag::gs-bucket-stats-by[]
234244
FROM sample_data
@@ -352,12 +362,15 @@ FROM employees
352362

353363
// bucketing in span mode (identical results to above)
354364
bucketNumericWithSpan#[skip:-8.13.99, reason:BUCKET extended in 8.14]
365+
//tag::docsBucketNumericWithSpan[]
355366
FROM employees
356367
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
357368
| STATS c = COUNT(1) BY b = BUCKET(salary, 5000.)
358369
| SORT b
370+
//end::docsBucketNumericWithSpan[]
359371
;
360372

373+
//tag::docsBucketNumericWithSpan-result[]
361374
c:long | b:double
362375
1 |25000.0
363376
1 |30000.0
@@ -368,6 +381,7 @@ FROM employees
368381
1 |60000.0
369382
1 |65000.0
370383
1 |70000.0
384+
//end::docsBucketNumericWithSpan-result[]
371385
;
372386

373387
bucketNumericMixedTypes#[skip:-8.13.99, reason:BUCKET extended in 8.14]
@@ -439,14 +453,28 @@ FROM employees
439453
;
440454

441455
reuseGroupingFunctionWithExpression#[skip:-8.13.99, reason:BUCKET renamed in 8.14]
456+
//tag::reuseGroupingFunctionWithExpression[]
442457
FROM employees
443-
| STATS sum = BUCKET(salary % 2 + 13, 1.) + 1 BY bucket = BUCKET(salary % 2 + 13, 1.)
444-
| SORT sum
445-
;
446-
447-
sum:double | bucket:double
448-
14.0 |13.0
449-
15.0 |14.0
458+
| STATS s1 = b1 + 1, s2 = BUCKET(salary / 1000 + 999, 50.) + 2 BY b1 = BUCKET(salary / 100 + 99, 50.), b2 = BUCKET(salary / 1000 + 999, 50.)
459+
| SORT b1, b2
460+
| KEEP s1, b1, s2, b2
461+
//end::reuseGroupingFunctionWithExpression[]
462+
;
463+
464+
//tag::reuseGroupingFunctionWithExpression-result[]
465+
s1:double | b1:double | s2:double | b2:double
466+
351.0 |350.0 |1002.0 |1000.0
467+
401.0 |400.0 |1002.0 |1000.0
468+
451.0 |450.0 |1002.0 |1000.0
469+
501.0 |500.0 |1002.0 |1000.0
470+
551.0 |550.0 |1002.0 |1000.0
471+
601.0 |600.0 |1002.0 |1000.0
472+
601.0 |600.0 |1052.0 |1050.0
473+
651.0 |650.0 |1052.0 |1050.0
474+
701.0 |700.0 |1052.0 |1050.0
475+
751.0 |750.0 |1052.0 |1050.0
476+
801.0 |800.0 |1052.0 |1050.0
477+
//end::reuseGroupingFunctionWithExpression-result[]
450478
;
451479

452480
reuseGroupingFunctionWithinAggs#[skip:-8.13.99, reason:BUCKET renamed in 8.14]

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/grouping/Bucket.java

Lines changed: 16 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -92,6 +92,10 @@ public class Bucket extends GroupingFunction implements Validatable, TwoOptional
9292
examples = {
9393
@Example(
9494
description = """
95+
`BUCKET` can work in two modes: one in which the size of the bucket is computed
96+
based on a buckets count recommendation (four parameters) and a range and
97+
another in which the bucket size is provided directly (two parameters).
98+
9599
Using a target number of buckets, a start of a range, and an end of a range,
96100
`BUCKET` picks an appropriate bucket size to generate the target number of buckets or fewer.
97101
For example, asking for at most 20 buckets over a year results in monthly buckets:""",
@@ -102,12 +106,12 @@ public class Bucket extends GroupingFunction implements Validatable, TwoOptional
102106
it's to pick a range that people are comfortable with that provides at most the target number of buckets."""
103107
),
104108
@Example(
105-
description = "Combine `BUCKET` with <<esql-stats-by>> to create a histogram:",
109+
description = "Combine `BUCKET` with an <<esql-agg-functions,aggregation>> to create a histogram:",
106110
file = "bucket",
107111
tag = "docsBucketMonthlyHistogram",
108112
explanation = """
109113
NOTE: `BUCKET` does not create buckets that don't match any documents.
110-
+ "That's why this example is missing `1985-03-01` and other dates."""
114+
That's why this example is missing `1985-03-01` and other dates."""
111115
),
112116
@Example(
113117
description = """
@@ -120,6 +124,11 @@ public class Bucket extends GroupingFunction implements Validatable, TwoOptional
120124
For rows with a value outside of the range, it returns a bucket value that corresponds to a bucket outside the range.
121125
Combine`BUCKET` with <<esql-where>> to filter rows."""
122126
),
127+
@Example(description = """
128+
If the desired bucket size is known in advance, simply provide it as the second
129+
argument, leaving the range out:""", file = "bucket", tag = "docsBucketWeeklyHistogramWithSpan", explanation = """
130+
NOTE: When providing the bucket size as the second parameter, its type must be
131+
of a time duration or date period type."""),
123132
@Example(
124133
description = "`BUCKET` can also operate on numeric fields. For example, to create a salary histogram:",
125134
file = "bucket",
@@ -128,6 +137,11 @@ public class Bucket extends GroupingFunction implements Validatable, TwoOptional
128137
Unlike the earlier example that intentionally filters on a date range, you rarely want to filter on a numeric range.
129138
You have to find the `min` and `max` separately. {esql} doesn't yet have an easy way to do that automatically."""
130139
),
140+
@Example(description = """
141+
If the desired bucket size is known in advance, simply provide it as the second
142+
argument, leaving the range out:""", file = "bucket", tag = "docsBucketNumericWithSpan", explanation = """
143+
NOTE: When providing the bucket size as the second parameter, its type must be
144+
of a floating type."""),
131145
@Example(
132146
description = "Create hourly buckets for the last 24 hours, and calculate the number of events per hour:",
133147
file = "bucket",

0 commit comments

Comments
 (0)